22 Jul, 2013

1 commit

  • The glibc calloc() function has an optimization that skips the explicit
    memset() for very large calloc() allocations that just came from mmap(),
    because they are known to be zero.

    This could result in the perf memcpy benchmark reading only from
    the zero page, which gives unrealistic results.

    Always call memset explicitly on the source area to avoid this problem.

    Signed-off-by: Andi Kleen
    Cc: Hitoshi Mitake
    Cc: Kirill A. Shutemov
    Link: http://lkml.kernel.org/n/tip-pzz2qrdq9eymxda0y8yxdn33@git.kernel.org
    Signed-off-by: Arnaldo Carvalho de Melo

    Andi Kleen
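
The fix described in the commit above can be sketched as follows. This is a minimal illustration, not the actual perf code: the helper name alloc_src() and the non-zero fill byte are assumptions of the sketch. Writing to every byte of the source buffer forces real pages to be mapped, so a later memcpy() does not read only from the shared zero page.

```c
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical helper, not the actual perf code: allocate a source
 * buffer for a copy benchmark and touch every byte of it.
 */
void *alloc_src(size_t len)
{
	void *src = calloc(len, 1);

	if (!src)
		return NULL;

	/*
	 * Explicit memset(): writing to the buffer makes the kernel back it
	 * with real pages instead of the shared zero page. The non-zero fill
	 * byte is this sketch's choice, so that the compiler cannot fold the
	 * memset() into the preceding calloc().
	 */
	memset(src, 0x5a, len);
	return src;
}
```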
     

09 Jul, 2013

1 commit

  • The addresses of the allocated memory areas are saved to '*src' and
    '*dst', so we need to check those for NULL, not 'src' and 'dst'.

    Signed-off-by: Kirill A. Shutemov
    Acked-by: Hitoshi Mitake
    Cc: Hitoshi Mitake
    Cc: Ingo Molnar
    Cc: Paul Mackerras
    Cc: Peter Zijlstra
    Link: http://lkml.kernel.org/r/1370518503-4230-1-git-send-email-kirill.shutemov@linux.intel.com
    Signed-off-by: Arnaldo Carvalho de Melo

    Kirill A. Shutemov
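
A minimal sketch of the distinction the commit above describes, assuming a hypothetical allocation helper named alloc_mem() that returns the buffers through out-parameters (the real perf function may look different):

```c
#include <stdlib.h>

/*
 * Hypothetical helper, not the actual perf code: the buffer addresses are
 * returned through the out-parameters 'src' and 'dst'.
 * Returns 0 on success, -1 if either allocation failed.
 */
int alloc_mem(void **src, void **dst, size_t len)
{
	*src = calloc(len, 1);
	*dst = calloc(len, 1);

	/*
	 * 'src' and 'dst' point at the caller's variables and are never NULL
	 * here; the allocation results live in '*src' and '*dst'.
	 */
	if (!*src || !*dst) {
		free(*src);
		free(*dst);
		*src = *dst = NULL;
		return -1;
	}
	return 0;
}
```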
     

14 Mar, 2013

1 commit

  • The tokens MADV_HUGEPAGE and MADV_NOHUGEPAGE are not available with
    glibc 2.12 and older. Define these tokens if they are not already
    defined.

    This patch fixes these build errors with older versions of glibc.

    CC bench/numa.o
    bench/numa.c: In function ‘alloc_data’:
    bench/numa.c:334: error: ‘MADV_HUGEPAGE’ undeclared (first use in this function)
    bench/numa.c:334: error: (Each undeclared identifier is reported only once
    bench/numa.c:334: error: for each function it appears in.)
    bench/numa.c:341: error: ‘MADV_NOHUGEPAGE’ undeclared (first use in this function)
    make: *** [bench/numa.o] Error 1

    Signed-off-by: Vinson Lee
    Acked-by: Ingo Molnar
    Cc: Ingo Molnar
    Cc: Irina Tirdea
    Cc: Paul Mackerras
    Cc: Pekka Enberg
    Cc: Peter Zijlstra
    Link: http://lkml.kernel.org/r/1363214064-4671-2-git-send-email-vlee@twitter.com
    Signed-off-by: Arnaldo Carvalho de Melo

    Vinson Lee
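
The fallback described in the commit above can be sketched like this. The values 14 and 15 are the ones used by the generic Linux ABI and are an assumption of this sketch (some architectures may differ); thp_advice() is a hypothetical helper added only to show the tokens in use.

```c
#include <sys/mman.h>

/*
 * Fallback definitions for C libraries whose headers predate these tokens.
 * The values 14 and 15 are assumed from the generic Linux ABI; check the
 * headers of your architecture.
 */
#ifndef MADV_HUGEPAGE
#define MADV_HUGEPAGE 14
#endif
#ifndef MADV_NOHUGEPAGE
#define MADV_NOHUGEPAGE 15
#endif

/* Hypothetical helper: pick the madvise() advice for a THP setting. */
int thp_advice(int enable)
{
	return enable ? MADV_HUGEPAGE : MADV_NOHUGEPAGE;
}
```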
     

30 Jan, 2013

1 commit

  • Add a suite of NUMA performance benchmarks.

    The goal was to simulate the behavior and access patterns of real NUMA
    workloads, via a wide range of parameters, so this tool goes well
    beyond simple bzero() measurements that most NUMA micro-benchmarks use:

    - It processes the data and creates a chain of data dependencies,
    like a real workload would. Neither the compiler, nor the
    kernel (via KSM and other optimizations) nor the CPU can
    eliminate parts of the workload.

    - It randomizes the initial state and also randomizes the target
    addresses of the processing - it's not a simple forward scan
    of addresses.

    - It provides flexible options to set process, thread and memory
    relationship information: -G sets "global" memory shared between
    all test processes, -P sets "process" memory shared by all
    threads of a process and -T sets "thread" private memory.

    - There's a NUMA convergence monitoring and convergence latency
    measurement option via -c and -m.

    - Micro-sleeps and synchronization can be injected to provoke lock
    contention and scheduling, via the -u and -S options. This simulates
    IO and contention.

    - The -x option instructs the workload to 'perturb' itself artificially
    every N seconds, by moving to the first and last CPU of the system
    periodically. This way the stability of convergence equilibrium and
    the number of steps taken for the scheduler to reach equilibrium again
    can be measured.

    - The amount of work can be specified via the -l loop count, and/or
    via a -s seconds-timeout value.

    - CPU and node memory binding options, to test hard binding scenarios.
    THP can be turned on and off via madvise() calls.

    - Live reporting of convergence progress in an 'at a glance' output format.
    Printing of convergence and deconvergence events.

    The 'perf bench numa mem -a' option will start an array of about 30
    individual tests that will each output such measurements:

    # Running 5x5-bw-thread, "perf bench numa mem -p 5 -t 5 -P 512 -s 20 -zZ0q --thp 1"
    5x5-bw-thread, 20.276, secs, runtime-max/thread
    5x5-bw-thread, 20.004, secs, runtime-min/thread
    5x5-bw-thread, 20.155, secs, runtime-avg/thread
    5x5-bw-thread, 0.671, %, spread-runtime/thread
    5x5-bw-thread, 21.153, GB, data/thread
    5x5-bw-thread, 528.818, GB, data-total
    5x5-bw-thread, 0.959, nsecs, runtime/byte/thread
    5x5-bw-thread, 1.043, GB/sec, thread-speed
    5x5-bw-thread, 26.081, GB/sec, total-speed

    See the help text and the code for more details.

    Cc: Peter Zijlstra
    Cc: Arnaldo Carvalho de Melo
    Cc: Frederic Weisbecker
    Cc: Mike Galbraith
    Cc: Steven Rostedt
    Cc: Linus Torvalds
    Cc: Andrew Morton
    Cc: Peter Zijlstra
    Cc: Andrea Arcangeli
    Cc: Rik van Riel
    Cc: Mel Gorman
    Cc: Hugh Dickins
    Signed-off-by: Ingo Molnar

    Ingo Molnar
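
The derived figures in the sample output of the commit above appear to follow from the raw ones by simple division. The sketch below reproduces them under that assumption; the formulas are inferred from the printed numbers, not taken from bench/numa.c, and the helper names are hypothetical.

```c
/* Hypothetical helpers; the formulas are inferred from the sample output. */

/* GB processed per second, e.g. data/thread over runtime-max/thread. */
double speed_gb_per_sec(double data_gb, double runtime_secs)
{
	return data_gb / runtime_secs;
}

/* Nanoseconds spent per byte; the 1e9 factors of GB and nsecs cancel. */
double nsecs_per_byte(double data_gb, double runtime_secs)
{
	return runtime_secs / data_gb;
}
```

With the values above, 21.153 GB over 20.276 secs gives roughly 1.043 GB/sec per thread, 528.818 GB over 20.276 secs gives roughly 26.081 GB/sec in total, and 20.276 / 21.153 gives roughly 0.959 nsecs per byte.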
     

11 Sep, 2012

1 commit

  • perf defines both __used and __unused variables to use for marking
    unused variables. The variable __used is defined to
    __attribute__((__unused__)), which contradicts the kernel definition to
    __attribute__((__used__)) for new gcc versions. On Android, __used is
    also defined in system headers and this leads to warnings like: warning:
    '__used__' attribute ignored

    __unused is not defined in the kernel and is not a standard definition.
    If __unused is included everywhere instead of __used, this leads to
    conflicts with glibc headers, since glibc has variables with this name
    in its headers.

    The best approach is to use __maybe_unused, the definition used in the
    kernel for __attribute__((unused)). In this way there is only one
    definition in perf sources (instead of 2 definitions that point to the
    same thing: __used and __unused) and it works on both Linux and Android.
    This patch simply replaces all instances of __used and __unused with
    __maybe_unused.

    Signed-off-by: Irina Tirdea
    Acked-by: Pekka Enberg
    Cc: David Ahern
    Cc: Ingo Molnar
    Cc: Namhyung Kim
    Cc: Paul Mackerras
    Cc: Peter Zijlstra
    Cc: Steven Rostedt
    Link: http://lkml.kernel.org/r/1347315303-29906-7-git-send-email-irina.tirdea@intel.com
    [ committer note: fixed up conflict with a116e05 in builtin-sched.c ]
    Signed-off-by: Arnaldo Carvalho de Melo

    Irina Tirdea
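
A minimal sketch of the single annotation described in the commit above, for GCC-compatible compilers. The #ifndef guard and the function bench_noop() are additions of this sketch, not part of the patch.

```c
/*
 * Single annotation for possibly-unused parameters and variables, as
 * described above. The #ifndef guard is an addition of this sketch.
 */
#ifndef __maybe_unused
#define __maybe_unused __attribute__((unused))
#endif

/* Hypothetical benchmark entry point that ignores its arguments. */
int bench_noop(int argc __maybe_unused, const char **argv __maybe_unused)
{
	return 0;
}
```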
     

09 Sep, 2012

1 commit

  • When NDEBUG is defined, the assert macro will be expanded to nothing.
    Some assert calls used in perf are also including some functionality
    (e.g. system calls), not only validity checks. Therefore, if NDEBUG is
    defined, this functionality will be removed along with the assert. Perf
    also defines BUG_ON based on assert, so it has the same problem.

    Define BUG_ON so that the condition will be executed when NDEBUG is
    defined. Replace the assert statements that have these side effects
    with BUG_ON.

    For defining BUG_ON, use "if (cond) {}" instead of "if (cond) ;" because
    in the latter case build fails with "error: suggest braces around empty
    body in an ‘if’ statement [-Werror=empty-body]"

    Suggested-by: Peter Zijlstra
    Signed-off-by: Irina Tirdea
    Reviewed-by: Namhyung Kim
    Reviewed-by: Pekka Enberg
    Cc: David Ahern
    Cc: Ingo Molnar
    Cc: Namhyung Kim
    Cc: Paul Mackerras
    Cc: Peter Zijlstra
    Cc: Steven Rostedt
    Link: http://lkml.kernel.org/r/1347082551-2394-1-git-send-email-irina.tirdea@intel.com
    Signed-off-by: Arnaldo Carvalho de Melo

    Irina Tirdea
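
The idea in the commit above can be sketched as follows: the condition passed to BUG_ON() is evaluated in both build modes, and only the check disappears under NDEBUG. The functions do_work() and run_once() and the counter are hypothetical stand-ins for code with side effects, such as a system call.

```c
#include <assert.h>

/*
 * Sketch of a BUG_ON() whose condition is still evaluated when NDEBUG
 * turns assert() into nothing.
 */
#ifdef NDEBUG
#define BUG_ON(cond) do { if (cond) {} } while (0)
#else
#define BUG_ON(cond) assert(!(cond))
#endif

static int calls;	/* hypothetical counter standing in for a side effect */

/* Stand-in for a system call: has a side effect and returns 0 on success. */
int do_work(void)
{
	calls++;
	return 0;
}

/* Returns how many times do_work() has run so far. */
int run_once(void)
{
	/* do_work() runs in both build modes; only the check is dropped. */
	BUG_ON(do_work() != 0);
	return calls;
}
```

Had run_once() used assert(do_work() == 0) instead, an NDEBUG build would never call do_work() at all.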
     

03 Jul, 2012

1 commit

  • As Namhyung Kim pointed out, the words "cycle" and "clock" are used in
    confusing names and descriptions in mem-memset.c and mem-memcpy.c.

    With the option "-c" (or "--clock", now renamed to "--cycle"), the mem
    subsystem measures the cost of memset() and memcpy() with the cpu-cycles
    event.

    But the current mem subsystem source code contains many confusing variable
    names and descriptions that use "clock" (e.g. the variable use_clock). This
    is bad style because there is another software event named "cpu-clock".
    This patch replaces the wrong uses of "clock" with "cycle".

    v2: modified Documentation/perf-bench.txt for the descriptions of
    --cycle option

    Signed-off-by: Hitoshi Mitake
    Cc: Ingo Molnar
    Cc: Namhyung Kim
    Link: http://lkml.kernel.org/r/1341236777-18457-1-git-send-email-h.mitake@gmail.com
    Signed-off-by: Arnaldo Carvalho de Melo

    Hitoshi Mitake
     

28 Jun, 2012

1 commit

  • The current perf-bench documentation has a couple of typos and lacks
    any description of the mem subsystem. Fix it.

    Reported-by: Ingo Molnar
    Signed-off-by: Namhyung Kim
    Acked-by: Hitoshi Mitake
    Cc: Hitoshi Mitake
    Cc: Ingo Molnar
    Cc: Paul Mackerras
    Cc: Peter Zijlstra
    Link: http://lkml.kernel.org/r/1340172486-17805-1-git-send-email-namhyung@kernel.org
    Signed-off-by: Arnaldo Carvalho de Melo

    Namhyung Kim
     

07 Feb, 2012

3 commits

  • By adding the following objects:
    bench/mem-memset-x86-64-asm.o
    bench/mem-memcpy-x86-64-asm.o
    the x86_64 perf binary ended up with an executable stack.

    The reason was that the above objects are built from assembler sources and
    are missing the GNU-stack note section. In such a case the linker assumes
    that the final binary should not be restricted at all and marks the stack
    as RWX.

    Add a ".note.GNU-stack" section definition to the mentioned objects, with
    all flags disabled, thus omitting those objects from the linker's stack
    flags decision.

    Reported-at: https://bugzilla.redhat.com/show_bug.cgi?id=783570
    Reported-by: Clark Williams
    Acked-by: Eric Dumazet
    Cc: Corey Ashford
    Cc: Ingo Molnar
    Cc: Paul Mackerras
    Cc: Peter Zijlstra
    Link: http://lkml.kernel.org/r/1328100848-5630-1-git-send-email-jolsa@redhat.com
    Signed-off-by: Jiri Olsa
    [ committer note: Remaining bits after what was already added to perf/urgent ]
    Signed-off-by: Arnaldo Carvalho de Melo

    Jiri Olsa
     
  • So that we can get the perf bench exec stack fixes and then apply the
    remaining fix for the files added after what is in perf/urgent.

    Signed-off-by: Arnaldo Carvalho de Melo

    Arnaldo Carvalho de Melo
     
  • By adding the following object:
    bench/mem-memcpy-x86-64-asm.o
    the x86_64 perf binary ended up with an executable stack.

    The reason was that the above object is built from assembler source and is
    missing the GNU-stack note section. In such a case the linker assumes that
    the final binary should not be restricted at all and marks the stack as RWX.

    Add a ".note.GNU-stack" section definition to the mentioned object, with
    all flags disabled, thus omitting this object from the linker's stack flags
    decision.

    Problem introduced in:

    $ git describe ea7872b
    v2.6.37-rc2-19-gea7872b

    Reported-at: https://bugzilla.redhat.com/show_bug.cgi?id=783570
    Reported-by: Clark Williams
    Acked-by: Eric Dumazet
    Cc: Corey Ashford
    Cc: Ingo Molnar
    Cc: Paul Mackerras
    Cc: Peter Zijlstra
    Cc: stable@kernel.org
    Link: http://lkml.kernel.org/r/1328100848-5630-1-git-send-email-jolsa@redhat.com
    Signed-off-by: Jiri Olsa
    [ committer note: Backported fix to perf/urgent (3.3-rc2+) ]
    Signed-off-by: Arnaldo Carvalho de Melo

    Jiri Olsa
     

31 Jan, 2012

1 commit

  • There are unnecessary #include <ctype.h> lines out there, and they might
    cause a nasty build failure in some environments. As we already have most
    of the ctype macros in util.h, just get rid of them.

    The exceptions are util/symbol.c, which needs the isupper() macro that
    util.h doesn't provide, and the perl scripting support code, which
    includes ctype.h internally.

    Suggested-by: Ingo Molnar
    Cc: Ingo Molnar
    Cc: Paul Mackerras
    Cc: Peter Zijlstra
    Link: http://lkml.kernel.org/r/1327827356-8786-4-git-send-email-namhyung@gmail.com
    Signed-off-by: Namhyung Kim
    Signed-off-by: Arnaldo Carvalho de Melo

    Namhyung Kim
     

25 Jan, 2012

4 commits

  • "perf stat ... perf bench mem mem..." is pretty meaningless when using
    small block sizes (as the overhead of the invocation of each test run
    basically hides the actual test result in the noise). Repeating the
    actually interesting function's invocation a number of times allows the
    results to become meaningful.

    Cc: Ingo Molnar
    Cc: Paul Mackerras
    Cc: Peter Zijlstra
    Link: http://lkml.kernel.org/r/4F16D767020000780006D738@nat28.tlf.novell.com
    Signed-off-by: Jan Beulich
    Signed-off-by: Arnaldo Carvalho de Melo

    Jan Beulich
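
The repetition described in the commit above can be sketched as a simple loop around the routine under test. The helper name repeat_copy() and its signature are assumptions of this sketch, not the actual perf code.

```c
#include <stddef.h>
#include <string.h>

typedef void *(*memcpy_t)(void *dst, const void *src, size_t len);

/*
 * Hypothetical helper: invoke the routine under test 'iterations' times so
 * that its cost dominates the fixed overhead of one benchmark run.
 * Returns the number of bytes copied in total.
 */
size_t repeat_copy(memcpy_t fn, void *dst, const void *src, size_t len,
		   int iterations)
{
	size_t total = 0;
	int i;

	for (i = 0; i < iterations; ++i) {
		fn(dst, src, len);
		total += len;
	}
	return total;
}
```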
     
  • This simply clones the respective memcpy() implementation.

    Cc: Ingo Molnar
    Cc: Paul Mackerras
    Cc: Peter Zijlstra
    Cc: Stephane Eranian
    Link: http://lkml.kernel.org/r/4F16D743020000780006D735@nat28.tlf.novell.com
    Signed-off-by: Jan Beulich
    Signed-off-by: Arnaldo Carvalho de Melo

    Jan Beulich
     
  • Intended to be able to support the current selection of the preferred
    memcpy() implementation, this patch adds the ability to also measure the
    two alternative implementations, again by way of using some
    pre-processor replacement.

    While on my Westmere system this proves that the movsb based variant is
    worse than the movsq based one (since the ERMS feature isn't there), it
    also shows that here for the default as well as small sizes the unrolled
    variant outperforms the movsq one.

    Cc: Ingo Molnar
    Cc: Paul Mackerras
    Cc: Peter Zijlstra
    Link: http://lkml.kernel.org/r/4F16D728020000780006D732@nat28.tlf.novell.com
    Signed-off-by: Jan Beulich
    Signed-off-by: Arnaldo Carvalho de Melo

    Jan Beulich
     
  • Since arch/x86/lib/memcpy_64.S implements not only __memcpy, but also
    memcpy, without further precautions this function will get chosen by the
    static linker for resolving all references, and hence the "default"
    measurement didn't really measure anything else than the
    "x86-64-unrolled" one.

    Fix this by renaming (through the pre-processor) the conflicting symbol.

    On my Westmere system, the glibc variant turns out to require about 4%
    fewer instructions, but 15% more cycles for the default 1MB block size
    measured.

    Cc: Ingo Molnar
    Cc: Paul Mackerras
    Cc: Peter Zijlstra
    Link: http://lkml.kernel.org/r/4F16D6FD020000780006D72F@nat28.tlf.novell.com
    Signed-off-by: Jan Beulich
    Signed-off-by: Arnaldo Carvalho de Melo

    Jan Beulich
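
The pre-processor renaming mentioned in the commit above can be illustrated in C. The patch applies the technique to an assembler source; this sketch only shows the idea, and the name bench_memcpy_bytewise and the byte-wise loop are hypothetical.

```c
#include <stddef.h>
#include <string.h>

/*
 * Give the local routine a distinct symbol name via the pre-processor, so
 * that it cannot take over references to the C library's memcpy().
 * 'bench_memcpy_bytewise' is a hypothetical name.
 */
#undef memcpy
#define memcpy bench_memcpy_bytewise

void *memcpy(void *dst, const void *src, size_t len)
{
	unsigned char *d = dst;
	const unsigned char *s = src;

	while (len--)
		*d++ = *s++;
	return dst;
}

#undef memcpy	/* from here on, memcpy() is the C library's again */
```

After the final #undef, both routines can be called and measured side by side under their own names.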
     

07 Feb, 2011

1 commit

  • GCC 4.6.0 in Fedora rawhide turned up some compile errors in tools/perf
    due to the -Werror=unused-but-set-variable flag.

    I've gone through and annotated some of the assignments that had side
    effects (ie: return value from a function) with the __used annotation,
    and in some cases, just removed unused code.

    In a few cases, we were assigning something useful, but not using it in
    later parts of the function.

    kyle@dreadnought:~/src% gcc --version
    gcc (GCC) 4.6.0 20110122 (Red Hat 4.6.0-0.3)

    Cc: Ingo Molnar
    LKML-Reference:
    Signed-off-by: Kyle McMartin
    [ committer note: Fixed up the annotation fixes, as that code moved recently ]
    Signed-off-by: Arnaldo Carvalho de Melo

    Kyle McMartin
     

26 Nov, 2010

2 commits

  • …_64.S memcpy routines via 'perf bench mem'

    This patch ports arch/x86/lib/memcpy_64.S to perf bench mem
    memcpy for benchmarking memcpy() in userland in a tricky and
    dirty way.

    util/include/asm/cpufeature.h, util/include/asm/dwarf2.h, and
    util/include/linux/linkage.h are mostly dummy files with small
    wrappers, so that we are able to include memcpy_64.S
    unmodified.

    Signed-off-by: Hitoshi Mitake <mitake@dcl.info.waseda.ac.jp>
    Cc: h.mitake@gmail.com
    Cc: Miao Xie <miaox@cn.fujitsu.com>
    Cc: Ma Ling <ling.ma@intel.com>
    Cc: Zhao Yakui <yakui.zhao@intel.com>
    Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
    Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
    Cc: Paul Mackerras <paulus@samba.org>
    Cc: Frederic Weisbecker <fweisbec@gmail.com>
    Cc: Steven Rostedt <rostedt@goodmis.org>
    Cc: Andi Kleen <andi@firstfloor.org>
    LKML-Reference: <1290668693-27068-2-git-send-email-mitake@dcl.info.waseda.ac.jp>
    Signed-off-by: Ingo Molnar <mingo@elte.hu>

    Hitoshi Mitake
     
  • After applying this patch, perf bench mem memcpy prints both the
    prefaulted and the non-prefaulted score of memcpy().

    New options --no-prefault and --only-prefault are added
    to print a single result, mainly for scripting usage.

    Usage example:

    | mitake@X201i:~/linux/.../tools/perf% ./perf bench mem memcpy -l 500MB
    | # Running mem/memcpy benchmark...
    | # Copying 500MB Bytes ...
    |
    | 634.969014 MB/Sec
    | 4.828062 GB/Sec (with prefault)
    | mitake@X201i:~/linux/.../tools/perf% ./perf bench mem memcpy -l 500MB --only-prefault
    | # Running mem/memcpy benchmark...
    | # Copying 500MB Bytes ...
    |
    | 4.705192 GB/Sec (with prefault)
    | mitake@X201i:~/linux/.../tools/perf% ./perf bench mem memcpy -l 500MB --no-prefault
    | # Running mem/memcpy benchmark...
    | # Copying 500MB Bytes ...
    |
    | 642.725568 MB/Sec

    Signed-off-by: Hitoshi Mitake
    Cc: h.mitake@gmail.com
    Cc: Miao Xie
    Cc: Ma Ling
    Cc: Zhao Yakui
    Cc: Peter Zijlstra
    Cc: Arnaldo Carvalho de Melo
    Cc: Paul Mackerras
    Cc: Frederic Weisbecker
    Cc: Steven Rostedt
    Cc: Andi Kleen
    LKML-Reference:
    Signed-off-by: Ingo Molnar

    Hitoshi Mitake
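
The difference between the two scores in the commit above comes from whether the page faults on the buffers are taken inside the timed region. A rough sketch under stated assumptions: timed_memcpy() is a hypothetical helper, and clock() is used only to keep the example portable, not because perf measures time that way.

```c
#include <stdlib.h>
#include <string.h>
#include <time.h>

static volatile char sink;	/* keeps the timed copy from being optimized out */

/*
 * Hypothetical sketch: time one memcpy() of 'len' bytes, optionally after a
 * prefaulting copy that is not timed. Returns 0 and stores the elapsed CPU
 * seconds in '*secs', or returns -1 on a zero length or failed allocation.
 */
int timed_memcpy(size_t len, int prefault, double *secs)
{
	clock_t start, end;
	char *src = malloc(len);
	char *dst = malloc(len);

	if (!len || !src || !dst) {
		free(src);
		free(dst);
		return -1;
	}
	memset(src, 1, len);

	/* Untimed first copy: takes the page faults on 'dst' up front. */
	if (prefault)
		memcpy(dst, src, len);

	start = clock();
	memcpy(dst, src, len);
	end = clock();
	sink = dst[len - 1];

	*secs = (double)(end - start) / CLOCKS_PER_SEC;
	free(src);
	free(dst);
	return 0;
}
```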
     

18 May, 2010

1 commit

  • To avoid problems like the one fixed by Stephane Eranian in 3de29ca, we'll
    now get this instead:

    bench/sched-messaging.c:259: error: negative width in bit-field ‘’
    bench/sched-messaging.c:261: error: negative width in bit-field ‘’

    Which is rather cryptic, but is how BUILD_BUG_ON_ZERO works, so kernel
    hackers should be already used to this.

    With it in place some problems were found; they were fixed by changing the
    affected variables to sensible types or by changing some OPT_INTEGER to
    OPT_UINTEGER.

    Next csets will go thru converting each of the remaining OPT_ so that
    review can be made easier by grouping changes per type per patch.

    Cc: Frédéric Weisbecker
    Cc: Mike Galbraith
    Cc: Paul Mackerras
    Cc: Peter Zijlstra
    Cc: Stephane Eranian
    Cc: Tom Zanussi
    LKML-Reference:
    Signed-off-by: Arnaldo Carvalho de Melo

    Arnaldo Carvalho de Melo
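
The "negative width in bit-field" error in the commit above comes from the way BUILD_BUG_ON_ZERO turns a true condition into an invalid declaration. A minimal sketch, relying on GCC/Clang extensions; the check_vtype() wrapper, nr_loops and loops_option() are written here as illustrations and are not quoted from the patch.

```c
/*
 * Evaluates to 0 when 'e' is false and breaks the build when 'e' is true,
 * because a bit-field may not have a negative width. Relies on GCC/Clang
 * extensions.
 */
#define BUILD_BUG_ON_ZERO(e) (sizeof(struct { int:-!!(e); }))

/* Illustrative wrapper: yields 'v', but only compiles if it has type 'type'. */
#define check_vtype(v, type) \
	(BUILD_BUG_ON_ZERO(!__builtin_types_compatible_p(__typeof__(v), type)) + (v))

unsigned int nr_loops = 100;	/* hypothetical option variable */

unsigned int *loops_option(void)
{
	/* Changing 'unsigned int *' to 'int *' here would fail to compile. */
	return check_vtype(&nr_loops, unsigned int *);
}
```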
     

14 Apr, 2010

1 commit

  • Parsing an option from the command line with OPT_BOOLEAN on a
    bool data type would not work on a big-endian machine due to the
    manner in which the boolean was being cast into an int and
    incremented. For example, running 'perf probe --list' on a
    PowerPC machine would fail to properly set the list_events bool
    and would therefore print out the usage information and
    terminate.

    This patch makes OPT_BOOLEAN work as expected with a bool
    datatype. For cases where the original OPT_BOOLEAN was
    intentionally being used to increment an int each time it was
    passed in on the command line, this patch introduces OPT_INCR
    with the old behaviour of OPT_BOOLEAN (the verbose variable is
    currently the only such example of this).

    I have reviewed every use of OPT_BOOLEAN to verify that a true
    C99 bool was passed. Where integers were used, I verified that
    they were only being used for boolean logic and changed them to
    bools to ensure that they would not be mistakenly used as ints.
    The major exception was the verbose variable which now uses
    OPT_INCR instead of OPT_BOOLEAN.

    Signed-off-by: Ian Munsie
    Acked-by: David S. Miller
    Cc: # NOTE: wont apply to .3[34].x cleanly, please backport
    Cc: Git development list
    Cc: Ian Munsie
    Cc: Peter Zijlstra
    Cc: Paul Mackerras
    Cc: Arnaldo Carvalho de Melo
    Cc: KOSAKI Motohiro
    Cc: Hitoshi Mitake
    Cc: Rusty Russell
    Cc: Frederic Weisbecker
    Cc: Eric B Munson
    Cc: Valdis.Kletnieks@vt.edu
    Cc: WANG Cong
    Cc: Thiago Farina
    Cc: Masami Hiramatsu
    Cc: Xiao Guangrong
    Cc: Jaswinder Singh Rajput
    Cc: Arjan van de Ven
    Cc: OGAWA Hirofumi
    Cc: Mike Galbraith
    Cc: Tom Zanussi
    Cc: Anton Blanchard
    Cc: John Kacur
    Cc: Li Zefan
    Cc: Steven Rostedt
    LKML-Reference:
    Signed-off-by: Ingo Molnar

    Ian Munsie
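
Why the old behaviour in the commit above broke on big-endian machines can be shown without invoking undefined behaviour. The sketch below copies out the first byte of an incremented int, which is the byte a bool at the same address would occupy; both function names are hypothetical.

```c
#include <stdbool.h>
#include <string.h>

/*
 * Emulates what a bool sharing the first byte of an int would see after the
 * int is incremented: 1 on little-endian machines, 0 on big-endian ones.
 */
int first_byte_after_int_increment(void)
{
	int as_int = 0;
	unsigned char first;

	as_int++;
	memcpy(&first, &as_int, 1);
	return first;
}

/* Hypothetical fixed handler: store through a pointer of the real type. */
void set_bool_option(bool *value, bool unset)
{
	*value = !unset;
}
```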
     

08 Apr, 2010

1 commit


03 Apr, 2010

1 commit


14 Dec, 2009

2 commits

  • Here, tvec->tv_usec is "unsigned int" not "unsigned long".

    Since the type is different on every platform, it's probably
    best to just use long printf formats and cast.

    Signed-off-by: David S. Miller
    Cc: Peter Zijlstra
    Cc: Mike Galbraith
    Cc: Paul Mackerras
    Cc: Arnaldo Carvalho de Melo
    Cc: Frederic Weisbecker
    LKML-Reference:
    Signed-off-by: Ingo Molnar

    David Miller
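
A minimal sketch of the approach in the commit above: use long printf formats and cast the struct timeval fields, so the format matches on every platform. The helper format_timeval() is hypothetical.

```c
#include <stdio.h>
#include <sys/time.h>

/*
 * Hypothetical helper: format a struct timeval as "sec.usec". The casts
 * make the arguments match the "%lu" conversions whatever the platform's
 * underlying types for tv_sec and tv_usec are.
 */
int format_timeval(char *buf, size_t size, const struct timeval *tv)
{
	return snprintf(buf, size, "%lu.%06lu",
			(unsigned long)tv->tv_sec,
			(unsigned long)tv->tv_usec);
}
```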
     
  • This patch adds a new "all" pseudo subsystem and an "all" pseudo
    suite. These are for testing all suites of all subsystems, or
    all suites of one subsystem.

    (This patch also contains a few trivial comment fixes for
    bench/* and output style fixes. I judged that there is no
    need to split them into individual patches.)

    Example of use:

    | % ./perf bench sched all # Test all suites of sched subsystem
    | # Running sched/messaging benchmark...
    | # 20 sender and receiver processes per group
    | # 10 groups == 400 processes run
    |
    | Total time: 0.414 [sec]
    |
    | # Running sched/pipe benchmark...
    | # Extecuted 1000000 pipe operations between two tasks
    |
    | Total time: 10.999 [sec]
    |
    | 10.999317 usecs/op
    | 90914 ops/sec
    |
    | % ./perf bench all # Test all suites of all subsystems
    | # Running sched/messaging benchmark...
    | # 20 sender and receiver processes per group
    | # 10 groups == 400 processes run
    |
    | Total time: 0.420 [sec]
    |
    | # Running sched/pipe benchmark...
    | # Extecuted 1000000 pipe operations between two tasks
    |
    | Total time: 11.741 [sec]
    |
    | 11.741346 usecs/op
    | 85169 ops/sec
    |
    | # Running mem/memcpy benchmark...
    | # Copying 1MB Bytes from 0x7ff33e920010 to 0x7ff3401ae010 ...
    |
    | 808.407437 MB/Sec

    Signed-off-by: Hitoshi Mitake
    Cc: Peter Zijlstra
    Cc: Paul Mackerras
    Cc: Frederic Weisbecker
    LKML-Reference:
    Signed-off-by: Ingo Molnar

    Hitoshi Mitake
     

24 Nov, 2009

1 commit


22 Nov, 2009

1 commit

  • mem-memcpy.c uses perf event system calls to obtain CPU clocks.
    It suddenly dies with BUG_ON() when it runs on a Linux kernel
    that doesn't support perf events.

    calloc() can also fail easily when too large a length is
    passed. A calloc() failure causes sudden death with assert().

    These behaviours are not friendly, so I fixed the error
    handling.

    Signed-off-by: Hitoshi Mitake
    Cc: Peter Zijlstra
    Cc: Paul Mackerras
    Cc: Frederic Weisbecker
    LKML-Reference:
    [ v2: improved a few small details ]
    Signed-off-by: Ingo Molnar

    Hitoshi Mitake
     

19 Nov, 2009

1 commit

  • 'perf bench mem memcpy' is a benchmark suite for measuring memcpy()
    performance.

    Example on a Intel(R) Core(TM)2 Duo CPU E6850 @ 3.00GHz:

    | % perf bench mem memcpy -l 1GB
    | # Running mem/memcpy benchmark...
    | # Copying 1MB Bytes from 0xb7d98008 to 0xb7e99008 ...
    |
    | 726.216412 MB/Sec

    Signed-off-by: Hitoshi Mitake
    Cc: Peter Zijlstra
    Cc: Paul Mackerras
    Cc: Frederic Weisbecker
    LKML-Reference:
    [ v2: updated changelog, clarified history of builtin-bench.c ]
    Signed-off-by: Ingo Molnar

    Hitoshi Mitake
     

11 Nov, 2009

2 commits

  • This patch improves sched-message.c with more readable output.

    The changes are comment-style descriptions and the
    formatting of numerical values and their units.

    Example:

    | % perf bench sched messaging
    | # Running sched/messaging benchmark...
    | # 20 sender and receiver processes per group
    | # 10 groups == 400 processes run
    |
    | Total time: 1.490 [sec]

    Signed-off-by: Hitoshi Mitake
    Cc: Peter Zijlstra
    Cc: Paul Mackerras
    LKML-Reference:
    Signed-off-by: Ingo Molnar
    Cc: Peter Zijlstra
    Cc: Paul Mackerras

    Hitoshi Mitake
     
  • This patch improves sched-pipe.c with more readable output.

    The changes are comment-style descriptions and the
    formatting of numerical values and their units.

    Example:

    | % ./perf bench sched pipe
    | # Running sched/pipe benchmark...
    | # Extecuted 1000000 pipe operations between two tasks
    |
    | Total time:5.822 [sec]
    |
    | 5.822553 usecs/op
    | 171745 ops/sec

    Signed-off-by: Hitoshi Mitake
    Cc: Peter Zijlstra
    Cc: Paul Mackerras
    LKML-Reference:
    Signed-off-by: Ingo Molnar

    Hitoshi Mitake
     

10 Nov, 2009

4 commits

  • Clean up initializers in bench.h:

    - No need to break the line for function prototypes, they are more
    readable in a single line. (even if checkpatch complains about it)

    - We try to align definitions / structure fields vertically,
    to make it all a bit more readable.

    Signed-off-by: Ingo Molnar
    Signed-off-by: Hitoshi Mitake
    Cc: Peter Zijlstra
    Cc: Paul Mackerras
    LKML-Reference:

    Ingo Molnar
     
  • This patch modifies builtin-pipe.c for processing common
    options. The first option added is "--format".
    Users of perf bench will be able to specify output style by
    --format.

    Usage example:

    % ./perf bench sched pipe # with no style specify
    (executing 1000000 pipe operations between two tasks)

    Total time:5.855 sec
    5.855061 usecs/op
    170792 ops/sec

    % ./perf bench --format=simple sched pipe # specified simple
    5.988

    Signed-off-by: Hitoshi Mitake
    Cc: Peter Zijlstra
    Cc: Paul Mackerras
    LKML-Reference:
    Signed-off-by: Ingo Molnar
    Cc: Peter Zijlstra
    Cc: Paul Mackerras

    Hitoshi Mitake
     
  • This patch modifies bench/bench-messaging.c to adopt
    unified output formatting: --format option.

    Usage example:

    % ./perf bench sched messaging # with no style specify
    (20 sender and receiver processes per group)
    (10 groups == 400 processes run)

    Total time:1.431 sec

    % ./perf bench --format=simple sched messaging # specified simple
    1.431

    Signed-off-by: Hitoshi Mitake
    Cc: Peter Zijlstra
    Cc: Paul Mackerras
    LKML-Reference:
    Signed-off-by: Ingo Molnar

    Hitoshi Mitake
     
  • This patch adds some constants and extern declaration to
    bench.h. These are used for unified output formatting
    of 'perf bench'.

    Signed-off-by: Hitoshi Mitake
    Cc: Peter Zijlstra
    Cc: Paul Mackerras
    LKML-Reference:
    Signed-off-by: Ingo Molnar

    Hitoshi Mitake
     

09 Nov, 2009

1 commit

  • Ingo reported this small 'perf bench sched pipe' output problem:

    | $ ./perf bench sched pipe
    | (executing 1000000 pipe operations between two tasks)
    |
    | Total time:4.898 sec
    | $ 4.898586 usecs/op
    | 204140 ops/sec
    |
    | the shell prompt came back before the usecs/op and ops/sec line
    | was printed. Process teardown race, lack of wait() or so?

    This was caused by the parent process not calling waitpid(),
    so I added the call.

    Signed-off-by: Hitoshi Mitake
    Cc: Rusty Russell
    Cc: Peter Zijlstra
    Cc: Paul Mackerras
    Cc: Mike Galbraith
    Cc: Arnaldo Carvalho de Melo
    Cc: Frederic Weisbecker
    Cc: Jiri Kosina
    LKML-Reference:
    Signed-off-by: Ingo Molnar

    Hitoshi Mitake
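
The fix in the commit above amounts to the parent waiting for the child before it returns to the shell. A minimal sketch, with a hypothetical function name and a placeholder for the child's work:

```c
#include <stdio.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Hypothetical sketch: the child does the work and prints its result; the
 * parent waits for it, so nothing is printed after the parent returns.
 * Returns the child's exit status, or -1 on error.
 */
int run_child_and_wait(void)
{
	int status;
	pid_t pid;

	fflush(stdout);		/* avoid duplicating buffered output in the child */
	pid = fork();
	if (pid < 0)
		return -1;

	if (pid == 0) {
		printf("child: work done\n");
		fflush(stdout);
		_exit(0);
	}

	/* Without this call the parent could exit before the child prints. */
	if (waitpid(pid, &status, 0) < 0 || !WIFEXITED(status))
		return -1;
	return WEXITSTATUS(status);
}
```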
     

08 Nov, 2009

3 commits

  • This patch adds bench/sched-pipe.c.

    bench/sched-pipe.c is a benchmark program
    to measure performance of pipe() system call.
    This benchmark is based on pipe-test-1m.c by Ingo Molnar:

    http://people.redhat.com/mingo/cfs-scheduler/tools/pipe-test-1m.c

    Example of use:

    % perf bench sched pipe
    (executing 1000000 pipe operations between two tasks)

    Total time:4.499 sec
    4.499179 usecs/op
    222262 ops/sec

    % perf bench sched pipe -s -l 1000
    0.015

    Signed-off-by: Hitoshi Mitake
    Cc: Rusty Russell
    Cc: Peter Zijlstra
    Cc: Mike Galbraith
    Cc: Arnaldo Carvalho de Melo
    Cc: fweisbec@gmail.com
    Cc: Jiri Kosina
    LKML-Reference:
    Signed-off-by: Ingo Molnar

    Hitoshi Mitake
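
The two derived lines in the sample output of the commit above follow from the total time and the operation count. The sketch below is inferred from the printed numbers rather than taken from sched-pipe.c, and the helper names are hypothetical.

```c
/* Hypothetical helpers; the formulas are inferred from the sample output. */

/* Average cost of one pipe operation in microseconds. */
double usecs_per_op(double total_secs, unsigned long ops)
{
	return total_secs * 1e6 / ops;
}

/* Truncated to a whole number, which matches the figures printed above. */
unsigned long ops_per_sec(double total_secs, unsigned long ops)
{
	return (unsigned long)(ops / total_secs);
}
```

For the run above, 4.499179 seconds for 1000000 operations gives 4.499179 usecs/op and 222262 ops/sec.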
     
  • This patch adds bench/sched-messaging.c.

    This benchmark measures performance of scheduler and IPC
    mechanisms, and is based on hackbench by Rusty Russell.

    Example of usage:

    % perf bench sched messaging -g 20 -l 1000 -s
    5.432 # in sec

    % perf bench sched messaging # run with default options
    (20 sender and receiver processes per group)
    (10 groups == 400 processes run)

    Total time:0.308 sec

    % perf bench sched messaging -t -g 20 # be multi-thread, with 20 groups
    (20 sender and receiver threads per group)
    (20 groups == 800 threads run)

    Total time:0.582 sec

    ( Rusty is the original author of hackbench.c and he said the code is
    and was under the GPLv2 so fine to be merged. )

    Signed-off-by: Hitoshi Mitake
    Acked-by: Rusty Russell
    Cc: Peter Zijlstra
    Cc: Mike Galbraith
    Cc: Arnaldo Carvalho de Melo
    Cc: fweisbec@gmail.com
    Cc: Jiri Kosina
    LKML-Reference:
    Signed-off-by: Ingo Molnar

    Hitoshi Mitake
     
  • This patch adds bench/ directory and bench/bench.h.

    bench/ directory will contain modules for bench subcommand.
    bench/bench.h is for listing prototypes of module functions.

    Signed-off-by: Hitoshi Mitake
    Cc: Rusty Russell
    Cc: Peter Zijlstra
    Cc: Mike Galbraith
    Cc: Arnaldo Carvalho de Melo
    Cc: fweisbec@gmail.com
    Cc: Jiri Kosina
    LKML-Reference:
    Signed-off-by: Ingo Molnar

    Hitoshi Mitake