24 Mar, 2012

1 commit

  • This renames for_each_set_bit_cont() to for_each_set_bit_from() because
    it is analogous to list_for_each_entry_from() in list.h rather than
    list_for_each_entry_continue().

    This doesn't remove for_each_set_bit_cont() for now.

    Signed-off-by: Akinobu Mita
    Cc: Robert Richter
    Cc: Thomas Gleixner
    Cc: Ingo Molnar
    Cc: "H. Peter Anvin"
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Akinobu Mita
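
The "from" semantics described above can be illustrated with a minimal userspace sketch. The names `find_next_bit_sketch`, `for_each_set_bit_from_sketch` and `count_set_bits_from` are hypothetical stand-ins written for this illustration, not the kernel's implementation; the point is only that iteration starts at the current value of `bit` (inclusive) instead of resetting it to zero.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

#define BITS_PER_LONG_SKETCH (sizeof(unsigned long) * CHAR_BIT)

/* Simplified stand-in for find_next_bit(): returns the index of the first
 * set bit at or after 'offset', or 'size' if there is none. */
static size_t find_next_bit_sketch(const unsigned long *addr, size_t size,
				   size_t offset)
{
	for (; offset < size; offset++)
		if (addr[offset / BITS_PER_LONG_SKETCH] &
		    (1UL << (offset % BITS_PER_LONG_SKETCH)))
			return offset;
	return size;
}

/* 'bit' is used as given: the bit it currently names is visited too. */
#define for_each_set_bit_from_sketch(bit, addr, size)			\
	for ((bit) = find_next_bit_sketch((addr), (size), (bit));	\
	     (bit) < (size);						\
	     (bit) = find_next_bit_sketch((addr), (size), (bit) + 1))

/* Counts the set bits at or after 'start'. */
static size_t count_set_bits_from(const unsigned long *addr, size_t size,
				  size_t start)
{
	size_t bit = start, n = 0;

	for_each_set_bit_from_sketch(bit, addr, size)
		n++;
	return n;
}
```

With a bitmap whose bits 0, 2 and 4 are set, starting at bit 2 visits bits 2 and 4, which mirrors how list_for_each_entry_from() includes the current entry.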
     

21 Mar, 2012

1 commit

  • Pull perf events changes for v3.4 from Ingo Molnar:

    - New "hardware based branch profiling" feature both on the kernel and
    the tooling side, on CPUs that support it. (modern x86 Intel CPUs
    with the 'LBR' hardware feature currently.)

    This new feature is basically a sophisticated 'magnifying glass' for
    branch execution - something that is pretty difficult to extract from
    regular, function histogram centric profiles.

    The simplest mode is activated via 'perf record -b', and the result
    looks like this in perf report:

    $ perf record -b any_call,u -e cycles:u branchy

    $ perf report -b --sort=symbol
    52.34% [.] main [.] f1
    24.04% [.] f1 [.] f3
    23.60% [.] f1 [.] f2
    0.01% [k] _IO_new_file_xsputn [k] _IO_file_overflow
    0.01% [k] _IO_vfprintf_internal [k] _IO_new_file_xsputn
    0.01% [k] _IO_vfprintf_internal [k] strchrnul
    0.01% [k] __printf [k] _IO_vfprintf_internal
    0.01% [k] main [k] __printf

    This output shows from/to branch columns and shows the highest
    percentage (from,to) jump combinations - i.e. the most likely taken
    branches in the system. "branches" can also include function calls
    and any other synchronous and asynchronous transitions of the
    instruction pointer that are not 'next instruction' - such as system
    calls, traps, interrupts, etc.

    This feature comes with (hopefully intuitive) flat ascii and TUI
    support in perf report.

    - Various 'perf annotate' visual improvements for us assembly junkies.
    It will now recognize function calls in the TUI and by hitting enter
    you can follow the call (recursively) and back, amongst other
    improvements.

    - Multiple threads/processes recording support in perf record, perf
    stat, perf top - which is activated via a comma-list of PIDs:

    perf top -p 21483,21485
    perf stat -p 21483,21485 -ddd
    perf record -p 21483,21485

    - Support for per UID views, via the --uid parameter to perf top, perf
    report, etc. For example 'perf top --uid mingo' will only show the
    tasks that I am running, excluding other users, root, etc.

    - Jump label restructurings and improvements - this includes the
    factoring out of the (hopefully much clearer) include/linux/static_key.h
    generic facility:

    struct static_key key = STATIC_KEY_INIT_FALSE;

    ...

    if (static_key_false(&key))
            do unlikely code
    else
            do likely code

    ...
    static_key_slow_inc(&key);
    ...
    static_key_slow_dec(&key);
    ...

    The static_key_false() branch will be generated into the code with as
    little impact to the likely code path as possible. The
    static_key_slow_*() APIs flip the branch via live kernel code patching.

    This facility can now be used more widely within the kernel to
    micro-optimize hot branches whose likelihood matches the static-key
    usage and fast/slow cost patterns.

    - SW function tracer improvements: perf support and filtering support.

    - Various hardenings of the perf.data ABI, to make older perf.data's
    smoother on newer tool versions, to make new features integrate more
    smoothly, to support cross-endian recording/analyzing workflows
    better, etc.

    - Restructuring of the kprobes code, the splitting out of 'optprobes',
    and a corner case bugfix.

    - Allow the tracing of kernel console output (printk).

    - Improvements/fixes to user-space RDPMC support, allowing user-space
    self-profiling code to extract PMU counts without performing any
    system calls, while playing nice with the kernel side.

    - 'perf bench' improvements

    - ... and lots of internal restructurings, cleanups and fixes that made
    these features possible. And, as usual this list is incomplete as
    there were also lots of other improvements

    * 'perf-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (120 commits)
    perf report: Fix annotate double quit issue in branch view mode
    perf report: Remove duplicate annotate choice in branch view mode
    perf/x86: Prettify pmu config literals
    perf report: Enable TUI in branch view mode
    perf report: Auto-detect branch stack sampling mode
    perf record: Add HEADER_BRANCH_STACK tag
    perf record: Provide default branch stack sampling mode option
    perf tools: Make perf able to read files from older ABIs
    perf tools: Fix ABI compatibility bug in print_event_desc()
    perf tools: Enable reading of perf.data files from different ABI rev
    perf: Add ABI reference sizes
    perf report: Add support for taken branch sampling
    perf record: Add support for sampling taken branch
    perf tools: Add code to support PERF_SAMPLE_BRANCH_STACK
    x86/kprobes: Split out optprobe related code to kprobes-opt.c
    x86/kprobes: Fix a bug which can modify kernel code permanently
    x86/kprobes: Fix instruction recovery on optimized path
    perf: Add callback to flush branch_stack on context switch
    perf: Disable PERF_SAMPLE_BRANCH_* when not supported
    perf/x86: Add LBR software filter support for Intel CPUs
    ...

    Linus Torvalds
     

14 Mar, 2012

4 commits

  • On ancient systems I get this build failure:

    util/../../../arch/x86/include/asm/unistd.h:67:29: error: asm/unistd_64.h: No such file or directory
    In file included from util/cache.h:7,
    from builtin-test.c:8:
    util/../perf.h: In function ‘sys_perf_event_open’:In file included from util/../perf.h:16
    perf.h:170: error: ‘__NR_perf_event_open’ undeclared (first use in this function)

    The reason is that this old system does not have the split
    unistd.h headers yet, from which to pick up the syscall
    definitions.

    Add the syscall numbers to the already existing i386 and x86_64
    blocks in perf.h, and also provide empty include file stubs.

    With this patch perf builds and works fine on 5-year-old
    user-space as well.

    Cc: Peter Zijlstra
    Cc: Arnaldo Carvalho de Melo
    Link: http://lkml.kernel.org/n/tip-jctwg64le1w47tuaoeyftsg9@git.kernel.org
    Signed-off-by: Ingo Molnar
    Signed-off-by: Arnaldo Carvalho de Melo

    Ingo Molnar
     
  • Several places were expecting that the value returned was the number of
    characters printed, not what would be printed if there was space.

    Fix it by using the scnprintf and vscnprintf variants we inherited from
    the kernel sources.

    Some corner cases where the number of printed characters was not
    accounted for were fixed too.

    Reported-by: Anton Blanchard
    Cc: Anton Blanchard
    Cc: Eric B Munson
    Cc: David Ahern
    Cc: Frederic Weisbecker
    Cc: Mike Galbraith
    Cc: Paul Mackerras
    Cc: Peter Zijlstra
    Cc: Stephane Eranian
    Cc: Yanmin Zhang
    Cc: stable@kernel.org
    Link: http://lkml.kernel.org/n/tip-kwxo2eh29cxmd8ilixi2005x@git.kernel.org
    Signed-off-by: Arnaldo Carvalho de Melo

    Arnaldo Carvalho de Melo
     
  • I have a workload where perf top scribbles over the stack and we SEGV.
    What makes it interesting is that an snprintf is causing this.

    The workload is a c++ gem that has method names over 3000 characters
    long, but snprintf is designed to avoid overrunning buffers. So what
    went wrong?

    The problem is we assume snprintf returns the number of characters
    written:

    ret += repsep_snprintf(bf + ret, size - ret, "[%c] ", self->level);
    ...
    ret += repsep_snprintf(bf + ret, size - ret, "%s", self->ms.sym->name);

    Unfortunately this is not how snprintf works. snprintf returns the
    number of characters that would have been written if there was enough
    space. In the above case, if the first snprintf returns a value larger
    than size, we pass a negative size into the second snprintf and happily
    scribble over the stack. If you have 3000 character c++ methods that's a
    lot of stack to trample.

    This patch fixes repsep_snprintf by clamping the value at size - 1 which
    is the maximum snprintf can write before adding the NUL terminator.

    I get the sinking feeling that there are a lot of other uses of snprintf
    that have this same bug, we should audit them all.

    Cc: David Ahern
    Cc: Eric B Munson
    Cc: Frederic Weisbecker
    Cc: Ingo Molnar
    Cc: Paul Mackerras
    Cc: Peter Zijlstra
    Cc: Yanmin Zhang
    Cc: stable@kernel.org
    Link: http://lkml.kernel.org/r/20120307114249.44275ca3@kryten
    Signed-off-by: Anton Blanchard
    Signed-off-by: Arnaldo Carvalho de Melo

    Anton Blanchard
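
The return-value pitfall described above can be sketched in plain C. `clamped_snprintf` and `format_entry` are hypothetical names for this illustration (the actual fix lives in repsep_snprintf); the sketch assumes only standard `vsnprintf` behaviour, which reports the length that would have been written rather than the length actually stored.

```c
#include <assert.h>
#include <stdarg.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical clamped wrapper: returns the number of characters actually
 * stored in buf (excluding the NUL terminator), never more than size - 1. */
static int clamped_snprintf(char *buf, size_t size, const char *fmt, ...)
{
	va_list ap;
	int n;

	if (size == 0)
		return 0;

	va_start(ap, fmt);
	n = vsnprintf(buf, size, fmt, ap);
	va_end(ap);

	if (n < 0)
		return 0;		/* formatting error: count nothing */
	if ((size_t)n >= size)
		return (int)(size - 1);	/* output was truncated */
	return n;
}

/* Appends a level marker and a symbol name; returns the length used. */
static size_t format_entry(char *bf, size_t size, char level, const char *name)
{
	size_t ret = 0;

	ret += clamped_snprintf(bf + ret, size - ret, "[%c] ", level);
	ret += clamped_snprintf(bf + ret, size - ret, "%s", name);
	return ret;
}
```

Because each call adds at most the remaining space minus one, `size - ret` can never wrap around, however long the symbol name is.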
     
  • This patch fixes a buffer overrun bug in
    tracepoint_id_to_path(). The bug manifested itself as a memory
    error reported by perf record. I ran into it with perf sched:

    $ perf sched rec noploop 2
    noploop for 2 seconds
    [ perf record: Woken up 14 times to write data ]
    [ perf record: Captured and wrote 42.701 MB perf.data (~1865622 samples) ]
    Fatal: No memory to alloc tracepoints list

    It turned out that tracepoint_id_to_path() was reading the
    tracepoint id using read() but the buffer was not large enough
    to include the \n terminator for id with 4 digits or more.

    The patch fixes the problem by extending the buffer to a more
    reasonable size covering all possible id lengths, including the \n
    terminator. Note that atoll() stops at the first non-digit
    character, thus it is not necessary to clear the buffer between
    each read.

    Signed-off-by: Stephane Eranian
    Acked-by: Arnaldo Carvalho de Melo
    Acked-by: Peter Zijlstra
    Cc: fweisbec@gmail.com
    Cc: dsahern@gmail.com
    Link: http://lkml.kernel.org/r/20120313155102.GA6465@quad
    Signed-off-by: Ingo Molnar

    Stephane Eranian
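
A minimal sketch of the buffer sizing discussed above, assuming the id is stored as decimal digits followed by a newline. `read_id` is a hypothetical helper, not the code in tracepoint_id_to_path(); it also NUL-terminates the buffer defensively, which the patch notes is not strictly required because atoll() stops at the first non-digit.

```c
#include <assert.h>
#include <stdlib.h>
#include <unistd.h>

/* Reads a decimal id followed by '\n' from fd; returns -1 if read() fails
 * or returns no data. */
static long long read_id(int fd)
{
	/* Room for the digits of a 64-bit value, the trailing '\n' and a NUL. */
	char buf[24];
	ssize_t n = read(fd, buf, sizeof(buf) - 1);

	if (n <= 0)
		return -1;
	buf[n] = '\0';
	return atoll(buf);	/* parsing stops at the '\n' */
}
```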
     

13 Mar, 2012

3 commits

  • Merge reason: The 'perf record -b' hardware branch sampling feature is ready for upstream.

    Signed-off-by: Ingo Molnar

    Ingo Molnar
     
  • This patch fixes perf report to not go back two levels when
    pressing the 'q' key while annotating in branch view mode.

    When pressing 'q' in annotate mode and if the branch source
    and target belong to different functions, perf now brings
    up the annotation popup menu again to offer the option to
    annotate the other branch source or target.

    As part of the code restructuring in perf_evsel__hists_browse()
    we also fix a memory leak on options[] in case of error.

    Signed-off-by: Stephane Eranian
    Cc: peterz@infradead.org
    Cc: acme@redhat.com
    Cc: asharma@fb.com
    Cc: ravitillo@lbl.gov
    Cc: vweaver1@eecs.utk.edu
    Cc: khandual@linux.vnet.ibm.com
    Cc: dsahern@gmail.com
    Link: http://lkml.kernel.org/r/1331565210-10865-3-git-send-email-eranian@google.com
    Signed-off-by: Ingo Molnar

    Stephane Eranian
     
  • This patch removes the duplicated annotate selection when
    browsing in branch view mode. If the sym and dso of the branch
    source and target are the same, then only one annotate choice is
    proposed.

    Signed-off-by: Stephane Eranian
    Cc: peterz@infradead.org
    Cc: acme@redhat.com
    Cc: asharma@fb.com
    Cc: ravitillo@lbl.gov
    Cc: vweaver1@eecs.utk.edu
    Cc: khandual@linux.vnet.ibm.com
    Cc: dsahern@gmail.com
    Link: http://lkml.kernel.org/r/1331565210-10865-2-git-send-email-eranian@google.com
    Signed-off-by: Ingo Molnar

    Stephane Eranian
     

09 Mar, 2012

10 commits

  • This patch updates perf report to support TUI mode
    when the perf.data file contains samples with branch
    stacks.

    For each row in the report, it is possible to annotate
    either the source or target of each branch.

    Signed-off-by: Stephane Eranian
    Cc: peterz@infradead.org
    Cc: acme@redhat.com
    Cc: asharma@fb.com
    Cc: ravitillo@lbl.gov
    Cc: vweaver1@eecs.utk.edu
    Cc: khandual@linux.vnet.ibm.com
    Cc: dsahern@gmail.com
    Link: http://lkml.kernel.org/r/1331246868-19905-5-git-send-email-eranian@google.com
    Signed-off-by: Ingo Molnar

    Stephane Eranian
     
  • This patch enhances perf report to auto-detect when the
    perf.data file contains samples with branch stacks. That way it
    is not necessary to use the -b option.

    To force branch view mode to off, simply use --no-branch-stack.

    Signed-off-by: Stephane Eranian
    Cc: peterz@infradead.org
    Cc: acme@redhat.com
    Cc: asharma@fb.com
    Cc: ravitillo@lbl.gov
    Cc: vweaver1@eecs.utk.edu
    Cc: khandual@linux.vnet.ibm.com
    Cc: dsahern@gmail.com
    Link: http://lkml.kernel.org/r/1331246868-19905-4-git-send-email-eranian@google.com
    Signed-off-by: Ingo Molnar

    Stephane Eranian
     
  • This patch adds a new feature bit, namely,
    HEADER_BRANCH_STACK. When present, it indicates
    that sample records may contain branch stack.

    This could be useful to a viewer to switch to
    branch mode without having to parse all the
    samples or without a specific cmdline option.

    This will be used in a subsequent patch to
    enhance perf report with branch stacks.

    Signed-off-by: Stephane Eranian
    Cc: peterz@infradead.org
    Cc: acme@redhat.com
    Cc: asharma@fb.com
    Cc: ravitillo@lbl.gov
    Cc: vweaver1@eecs.utk.edu
    Cc: khandual@linux.vnet.ibm.com
    Cc: dsahern@gmail.com
    Link: http://lkml.kernel.org/r/1331246868-19905-3-git-send-email-eranian@google.com
    Signed-off-by: Ingo Molnar

    Stephane Eranian
     
  • This patch changes the logic of the -b, --branch-stack options
    of perf record.

    Based on users' request, the patch provides a default filter
    mode with the -b (or --branch-any) option. With the option,
    any type of taken branch is sampled.

    With -j (or --branch-filter), the user can specify any
    valid combination of branch types and privilege levels
    if supported by the underlying hardware.

    The -b (--branch-any) option is a shortcut for: --branch-filter any.

    $ perf record -b foo

    or:

    $ perf record --branch-filter any foo

    For more specific filtering:

    $ perf record --branch-filter ind_call,u foo

    Signed-off-by: Stephane Eranian
    Cc: peterz@infradead.org
    Cc: acme@redhat.com
    Cc: asharma@fb.com
    Cc: ravitillo@lbl.gov
    Cc: vweaver1@eecs.utk.edu
    Cc: khandual@linux.vnet.ibm.com
    Cc: dsahern@gmail.com
    Link: http://lkml.kernel.org/r/1331246868-19905-2-git-send-email-eranian@google.com
    Signed-off-by: Ingo Molnar

    Stephane Eranian
     
  • This patch provides a way to handle legacy perf.data
    files. Legacy files are those using the older PERFFILE
    signature.

    For those, it is still necessary to detect endianness but
    without comparing their header->attr_size with the
    tool's own version as it may be different. Instead, we use
    a reference table for all known sizes from the legacy era.

    We try all the combinations for sizes and endianness. If we find
    a match, we proceed, otherwise we return: "incompatible file
    format".

    This is also done for the pipe-mode file format.

    Signed-off-by: Stephane Eranian
    Cc: peterz@infradead.org
    Cc: acme@redhat.com
    Cc: robert.richter@amd.com
    Cc: ming.m.lin@intel.com
    Cc: andi@firstfloor.org
    Cc: asharma@fb.com
    Cc: ravitillo@lbl.gov
    Cc: vweaver1@eecs.utk.edu
    Cc: khandual@linux.vnet.ibm.com
    Cc: dsahern@gmail.com
    Link: http://lkml.kernel.org/r/1328826068-11713-19-git-send-email-eranian@google.com
    Signed-off-by: Ingo Molnar

    Stephane Eranian
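
The "try all combinations of sizes and endianness" step can be sketched as follows. This is an illustration under assumptions: `ref_sizes`, `bswap64_sketch` and `detect_endian_sketch` are hypothetical names, and the two sizes in the table are illustrative placeholders for the reference table of known legacy sizes.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Illustrative reference sizes; the real table lists every known legacy size. */
static const uint64_t ref_sizes[] = { 64, 72 };

/* Portable 64-bit byte swap. */
static uint64_t bswap64_sketch(uint64_t v)
{
	uint64_t r = 0;
	int i;

	for (i = 0; i < 8; i++) {
		r = (r << 8) | (v & 0xff);
		v >>= 8;
	}
	return r;
}

/*
 * Returns 0 if attr_size matches a known size in native byte order,
 * 1 if it matches only after byte-swapping (the file needs swapping),
 * and -1 if nothing matches ("incompatible file format").
 */
static int detect_endian_sketch(uint64_t attr_size)
{
	size_t i;

	for (i = 0; i < sizeof(ref_sizes) / sizeof(ref_sizes[0]); i++) {
		if (attr_size == ref_sizes[i])
			return 0;
		if (attr_size == bswap64_sketch(ref_sizes[i]))
			return 1;
	}
	return -1;
}
```

Matching against a fixed table rather than the tool's own struct size is what lets a newer tool accept files written when the struct was smaller.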
     
  • This patch cleans up local variable types for msz and ret.
    They need to be size_t and ssize_t respectively.

    It also fixes a bug whereby perf would not read attr struct
    with a different size than what it knows about.

    Signed-off-by: Stephane Eranian
    Cc: peterz@infradead.org
    Cc: acme@redhat.com
    Cc: robert.richter@amd.com
    Cc: ming.m.lin@intel.com
    Cc: andi@firstfloor.org
    Cc: asharma@fb.com
    Cc: ravitillo@lbl.gov
    Cc: vweaver1@eecs.utk.edu
    Cc: khandual@linux.vnet.ibm.com
    Cc: dsahern@gmail.com
    Link: http://lkml.kernel.org/r/1328826068-11713-18-git-send-email-eranian@google.com
    Signed-off-by: Ingo Molnar

    Stephane Eranian
     
  • This patch allows perf to process perf.data files generated
    using an ABI that has a different perf_event_attr struct size,
    i.e., a different ABI version.

    The perf_event_attr can be extended, yet perf needs to cope with
    older perf.data files. Similarly, perf must be able to cope with
    a perf.data file which is using a newer version of the ABI than
    what it knows about.

    This patch adds read_attr(), a routine that reads a
    perf_event_attr struct from a file incrementally based on its
    advertised size. If the on-file struct is smaller than what perf
    knows, then the extra fields are zeroed. If the on-file struct
    is bigger, then perf only uses what it knows about, the rest is
    skipped.

    Signed-off-by: Stephane Eranian
    Cc: peterz@infradead.org
    Cc: acme@redhat.com
    Cc: robert.richter@amd.com
    Cc: ming.m.lin@intel.com
    Cc: andi@firstfloor.org
    Cc: asharma@fb.com
    Cc: ravitillo@lbl.gov
    Cc: vweaver1@eecs.utk.edu
    Cc: khandual@linux.vnet.ibm.com
    Cc: dsahern@gmail.com
    Link: http://lkml.kernel.org/r/1328826068-11713-17-git-send-email-eranian@google.com
    Signed-off-by: Ingo Molnar

    Stephane Eranian
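
The size-driven read described above can be sketched like this. `struct attr_sketch` is a hypothetical, much smaller stand-in for perf_event_attr, and `read_attr_sketch` works on a memory buffer rather than a file; only the zero-fill and skip logic is meant to carry over.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical attribute struct standing in for perf_event_attr. */
struct attr_sketch {
	uint32_t type;
	uint32_t size;		/* size advertised by the writer */
	uint64_t config;
	uint64_t extra;		/* field added in a later ABI revision */
};

/*
 * Copies an on-file attr of 'file_size' bytes from 'src' into 'out'.
 * Returns the number of trailing bytes the caller should skip, i.e. the
 * part of a newer, larger struct that this reader does not know about.
 */
static size_t read_attr_sketch(const void *src, size_t file_size,
			       struct attr_sketch *out)
{
	size_t known = sizeof(*out);
	size_t n = file_size < known ? file_size : known;

	memset(out, 0, known);	/* fields missing from an older file stay zero */
	memcpy(out, src, n);	/* take only what both sides understand */
	return file_size - n;
}
```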
     
  • This patch adds support for taken branch sampling, i.e., the
    PERF_SAMPLE_BRANCH_STACK feature, to perf report. In other
    words, it displays histograms based on taken branches rather
    than executed instruction addresses.

    The new option is called -b and it takes no argument. To
    generate meaningful output, the perf.data must have been
    obtained using perf record -b xxx ... where xxx is a branch
    filter option.

    The output shows symbols, modules, sorted by 'who branches
    where' the most often. The percentages reported in the first
    column refer to the total number of branches captured and
    not the usual number of samples.

    Here is a quick example.
    Here branchy is a simple test program which looks as follows:

    void f2(void)
    {}
    void f3(void)
    {}
    void f1(unsigned long n)
    {
            if (n & 1UL)
                    f2();
            else
                    f3();
    }
    int main(void)
    {
            unsigned long i;

            for (i=0; i < N; i++)
                    f1(i);
            return 0;
    }

    Here is the output captured on Nehalem, if we are
    only interested in user level function calls.

    $ perf record -b any_call,u -e cycles:u branchy

    $ perf report -b --sort=symbol
    52.34% [.] main [.] f1
    24.04% [.] f1 [.] f3
    23.60% [.] f1 [.] f2
    0.01% [k] _IO_new_file_xsputn [k] _IO_file_overflow
    0.01% [k] _IO_vfprintf_internal [k] _IO_new_file_xsputn
    0.01% [k] _IO_vfprintf_internal [k] strchrnul
    0.01% [k] __printf [k] _IO_vfprintf_internal
    0.01% [k] main [k] __printf

    About half (52%) of the call branches captured are from main()
    -> f1(). The second half (24%+23%) is split in two roughly equal
    shares between f1() -> f2() and f1() -> f3(). The output is as
    expected given the code.

    It should be noted, that using -b in perf record does not
    eliminate information in the perf.data file. Consequently, a
    typical profile can also be obtained by perf report by simply
    not using its -b option.

    It is possible to sort on branch related columns:

    - dso_from, symbol_from
    - dso_to, symbol_to
    - mispredict

    Signed-off-by: Roberto Agostino Vitillo
    Signed-off-by: Stephane Eranian
    Cc: peterz@infradead.org
    Cc: acme@redhat.com
    Cc: robert.richter@amd.com
    Cc: ming.m.lin@intel.com
    Cc: andi@firstfloor.org
    Cc: asharma@fb.com
    Cc: vweaver1@eecs.utk.edu
    Cc: khandual@linux.vnet.ibm.com
    Cc: dsahern@gmail.com
    Link: http://lkml.kernel.org/r/1328826068-11713-14-git-send-email-eranian@google.com
    Signed-off-by: Ingo Molnar

    Roberto Agostino Vitillo
     
  • This patch adds a new option to enable taken branch stack
    sampling, i.e., leverage the PERF_SAMPLE_BRANCH_STACK feature
    of perf_events.

    There is a new option to activate this mode: -b.
    It is possible to pass a set of filters to select the type of
    branches to sample.

    The following filters are available:

    - any : any type of branches
    - any_call : any function call or system call
    - any_ret : any function return or system call return
    - any_ind : any indirect branch
    - u: only when the branch target is at the user level
    - k: only when the branch target is in the kernel
    - hv: only when the branch target is in the hypervisor

    Filters can be combined by passing a comma separated list
    to the option:

    $ perf record -b any_call,u -e cycles:u branchy

    Signed-off-by: Roberto Agostino Vitillo
    Signed-off-by: Stephane Eranian
    Cc: peterz@infradead.org
    Cc: acme@redhat.com
    Cc: robert.richter@amd.com
    Cc: ming.m.lin@intel.com
    Cc: andi@firstfloor.org
    Cc: asharma@fb.com
    Cc: vweaver1@eecs.utk.edu
    Cc: khandual@linux.vnet.ibm.com
    Cc: dsahern@gmail.com
    Link: http://lkml.kernel.org/r/1328826068-11713-13-git-send-email-eranian@google.com
    Signed-off-by: Ingo Molnar

    Roberto Agostino Vitillo
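
Turning a comma-separated filter list into a bitmask could look roughly like the sketch below. The filter names come from the list above; the `BR_*` bit values and the function `parse_branch_filters_sketch` are hypothetical and are not the kernel's PERF_SAMPLE_BRANCH_* constants or perf's actual parser.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical bit values for this sketch only. */
enum {
	BR_USER     = 1 << 0,
	BR_KERNEL   = 1 << 1,
	BR_HV       = 1 << 2,
	BR_ANY      = 1 << 3,
	BR_ANY_CALL = 1 << 4,
	BR_ANY_RET  = 1 << 5,
	BR_ANY_IND  = 1 << 6,
};

static const struct {
	const char *name;
	int bit;
} branch_filters[] = {
	{ "any", BR_ANY }, { "any_call", BR_ANY_CALL },
	{ "any_ret", BR_ANY_RET }, { "any_ind", BR_ANY_IND },
	{ "u", BR_USER }, { "k", BR_KERNEL }, { "hv", BR_HV },
};

/* Returns the OR of all named filters, or -1 if a name is unknown. */
static int parse_branch_filters_sketch(const char *list)
{
	const size_t n = sizeof(branch_filters) / sizeof(branch_filters[0]);
	int mask = 0;

	while (*list) {
		size_t len = strcspn(list, ",");	/* length of this token */
		size_t i;

		for (i = 0; i < n; i++)
			if (strlen(branch_filters[i].name) == len &&
			    strncmp(branch_filters[i].name, list, len) == 0)
				break;
		if (i == n)
			return -1;
		mask |= branch_filters[i].bit;
		list += len;
		if (*list == ',')
			list++;
	}
	return mask;
}
```

For example, the list "any_call,u" from the command above yields the call bit combined with the user-level bit.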
     
  • This patch adds:

    - ability to parse samples with PERF_SAMPLE_BRANCH_STACK
    - sort on branches (dso_from, symbol_from, dso_to, symbol_to, mispredict)
    - build histograms on branches

    Signed-off-by: Roberto Agostino Vitillo
    Signed-off-by: Stephane Eranian
    Cc: peterz@infradead.org
    Cc: acme@redhat.com
    Cc: robert.richter@amd.com
    Cc: ming.m.lin@intel.com
    Cc: andi@firstfloor.org
    Cc: asharma@fb.com
    Cc: vweaver1@eecs.utk.edu
    Cc: khandual@linux.vnet.ibm.com
    Cc: dsahern@gmail.com
    Link: http://lkml.kernel.org/r/1328826068-11713-12-git-send-email-eranian@google.com
    Signed-off-by: Ingo Molnar

    Roberto Agostino Vitillo
     

05 Mar, 2012

8 commits

  • If perf.data couldn't find the vmlinux image for the given build-id,
    it would print an error message. However, the message lacked a newline
    at the end, so the output looked like below:

    $ perf annotate --stdio
    No vmlinux file with build id 63b554b2e90f14a4bced200008865e757d3e8b36
    was found in the path.

    Please use:

    perf buildid-cache -av vmlinux

    or:

    --vmlinux vmlinux Percent | Source code & Disassembly of a.out
    ------------------------------------------------
    :
    :
    :
    : Disassembly of section .text:
    :
    : 00000000004004f4 :
    0.00 : 4004f4: push %rbp
    0.00 : 4004f5: mov %rsp,%rbp
    0.00 : 4004f8: movl $0x0,-0x4(%rbp)
    0.00 : 4004ff: jmp 400517
    14.70 : 400501: mov 0x200b28(%rip),%rax # 601030
    0.02 : 400508: add $0x1,%rax
    0.01 : 40050c: mov %rax,0x200b1d(%rip) # 601030
    0.01 : 400513: addl $0x1,-0x4(%rbp)
    13.92 : 400517: cmpl $0x98967f,-0x4(%rbp)
    71.33 : 40051e: jle 400501
    0.00 : 400520: leaveq
    0.00 : 400521: retq

    Fix it by adding a newline at the end of the message. It doesn't affect
    the tui output AFAICS. New output will look like this:

    ...
    or:

    --vmlinux vmlinux
    Percent | Source code & Disassembly of a.out
    ------------------------------------------------
    :
    :
    :
    : Disassembly of section .text:
    ...

    Cc: Ingo Molnar
    Cc: Namhyung Kim
    Cc: Paul Mackerras
    Cc: Peter Zijlstra
    Link: http://lkml.kernel.org/r/1329986784-4916-6-git-send-email-namhyung.kim@lge.com
    Signed-off-by: Namhyung Kim
    Signed-off-by: Arnaldo Carvalho de Melo

    Namhyung Kim
     
  • Separate multiple binding using /, capitalize descriptions, add missing
    key binding.

    Cc: Ingo Molnar
    Cc: Namhyung Kim
    Cc: Paul Mackerras
    Cc: Peter Zijlstra
    Link: http://lkml.kernel.org/r/1329986784-4916-5-git-send-email-namhyung.kim@lge.com
    Signed-off-by: Namhyung Kim
    Signed-off-by: Arnaldo Carvalho de Melo

    Namhyung Kim
     
  • In TUI annotation, the title was set to the name of the target symbol
    when the user selected the target. However, it was not restored after
    returning to the original symbol from the target. Fix it.

    Cc: Ingo Molnar
    Cc: Namhyung Kim
    Cc: Paul Mackerras
    Cc: Peter Zijlstra
    Link: http://lkml.kernel.org/r/1329986784-4916-4-git-send-email-namhyung.kim@lge.com
    Signed-off-by: Namhyung Kim
    Signed-off-by: Arnaldo Carvalho de Melo

    Namhyung Kim
     
  • Accepting only upper case characters is inconvenient since it requires
    the SHIFT key too. Why not change it to accept a simple key stroke?

    Cc: Ingo Molnar
    Cc: Namhyung Kim
    Cc: Paul Mackerras
    Cc: Peter Zijlstra
    Link: http://lkml.kernel.org/r/1329986784-4916-3-git-send-email-namhyung.kim@lge.com
    Signed-off-by: Namhyung Kim
    Signed-off-by: Arnaldo Carvalho de Melo

    Namhyung Kim
     
  • Print unselected asm code lines as blue. This is what we do now for
    --stdio.

    Cc: Ingo Molnar
    Cc: Namhyung Kim
    Cc: Paul Mackerras
    Cc: Peter Zijlstra
    Link: http://lkml.kernel.org/r/1329986784-4916-2-git-send-email-namhyung.kim@lge.com
    Signed-off-by: Namhyung Kim
    Signed-off-by: Arnaldo Carvalho de Melo

    Namhyung Kim
     
  • Some variable arguments can be specified on make invocation, but
    some of them are missing descriptions, so users cannot easily learn
    about them. Fix it.

    Cc: Ingo Molnar
    Cc: Paul Mackerras
    Cc: Peter Zijlstra
    Link: http://lkml.kernel.org/r/1329980894-4289-1-git-send-email-namhyung.kim@lge.com
    Signed-off-by: Namhyung Kim
    Signed-off-by: Arnaldo Carvalho de Melo

    Namhyung Kim
     
  • If perf_evsel__open() failed, the errno was set and returned properly.

    However, since perf_evlist__open() called close() on the fds of every
    evsel x cpu x thread after the failure, errno was overwritten by
    other code (EBADF). So the caller of the function ended up seeing a
    different error message and getting confused.

    Fix it by restoring the original return value. Because one of the callers
    of the function is in the python extension, and it uses the system errno
    internally, it'd be better to restore the original value rather than
    use the return value of the function directly, IMHO (i.e. I'm not a
    python expert :)

    Cc: Ingo Molnar
    Cc: Paul Mackerras
    Cc: Peter Zijlstra
    Link: http://lkml.kernel.org/r/1329966816-23175-1-git-send-email-namhyung.kim@lge.com
    Signed-off-by: Namhyung Kim
    Signed-off-by: Arnaldo Carvalho de Melo

    Namhyung Kim
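
The save-and-restore pattern described above can be sketched in a few lines. `open_with_cleanup_sketch` is a hypothetical function, and the `close(-1)` call merely stands in for cleanup code that clobbers errno with EBADF; it is not the perf_evlist__open() code.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/* Tries to open 'path'; on failure runs cleanup that may clobber errno. */
static int open_with_cleanup_sketch(const char *path)
{
	int fd = open(path, O_RDONLY);

	if (fd < 0) {
		int saved_errno = errno;	/* remember the real cause */

		close(-1);			/* cleanup on a bad fd sets EBADF */
		errno = saved_errno;		/* restore it for errno readers */
		return -saved_errno;
	}
	return fd;
}
```

Restoring errno as well as returning the negative code keeps both kinds of caller happy: those that check the return value and those that read errno directly.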
     
  • Conflicts:
    tools/perf/builtin-record.c
    tools/perf/builtin-top.c
    tools/perf/perf.h
    tools/perf/util/top.h

    Merge reason: resolve these cherry-picking conflicts.

    Signed-off-by: Ingo Molnar

    Ingo Molnar
     

03 Mar, 2012

3 commits

  • Just fall back to resetting those fields, if set, warning the user that
    that feature is not available.

    If guest samples appear they will just be discarded because no struct
    machine will be found and thus the event will be accounted as not
    handled and dropped, see 0c09571.

    Reported-by: Namhyung Kim
    Tested-by: Joerg Roedel
    Cc: David Ahern
    Cc: Frederic Weisbecker
    Cc: Joerg Roedel
    Cc: Mike Galbraith
    Cc: Paul Mackerras
    Cc: Peter Zijlstra
    Cc: Stephane Eranian
    Link: http://lkml.kernel.org/n/tip-vuwxig36mzprl5n7nzvnxxsh@git.kernel.org
    Signed-off-by: Arnaldo Carvalho de Melo

    Arnaldo Carvalho de Melo
     
  • Setting perf_guest to true by default makes no sense because the perf
    subcommands cannot set up guest symbol information and thus cannot
    process any guest samples. The only exception is perf-kvm, which changes
    the perf_guest value on its own. So change the default for perf_guest
    back to false.

    Cc: David Ahern
    Cc: Ingo Molnar
    Cc: Jason Wang
    Cc: Paul Mackerras
    Cc: Peter Zijlstra
    Link: http://lkml.kernel.org/r/1328893505-4115-3-git-send-email-joerg.roedel@amd.com
    Signed-off-by: Joerg Roedel
    Signed-off-by: Arnaldo Carvalho de Melo

    Joerg Roedel
     
  • A recent refactoring of perf-record introduced the following:

    perf record -a -B
    Couldn't generating buildids. Use --no-buildid to profile anyway.
    sleep: Terminated

    I believe the triple negative was meant to be only a double negative.
    :-) While I'm there, fixed the grammar on the error message.

    Cc: Frederic Weisbecker
    Cc: Ingo Molnar
    Cc: Paul Mackerras
    Cc: Peter Zijlstra
    Cc: Robert Richter
    Cc: Thomas Gleixner
    Link: http://lkml.kernel.org/r/1328567272-13190-1-git-send-email-dsahern@gmail.com
    Signed-off-by: David Ahern
    Signed-off-by: Arnaldo Carvalho de Melo

    David Ahern
     

01 Mar, 2012

4 commits

  • The 'perf probe' command allows a kprobe to be inserted at any offset
    from a function start, which results in adding kprobes to unintended
    locations. (Example: perf probe do_fork+10000 is allowed even though
    the size of do_fork is ~904.)

    My previous patch https://lkml.org/lkml/2012/2/24/42 addressed the case
    where DWARF info was available for the kernel. This patch fixes the
    case where perf probe is used on a kernel without debuginfo available.

    Acked-by: Masami Hiramatsu
    Cc: Ananth N Mavinakayanahalli
    Cc: Jason Baron
    Cc: Masami Hiramatsu
    Cc: Srikar Dronamraju
    Cc: Steven Rostedt
    Cc: Andrew Morton
    Link: http://lkml.kernel.org/r/4F4C544D.1010909@linux.vnet.ibm.com
    Signed-off-by: Prashanth Nageshappa
    Signed-off-by: Arnaldo Carvalho de Melo

    Prashanth Nageshappa
     
  • If threads in a multi-threaded process have names shorter than the main
    thread, the comm for the named threads is not properly terminated.

    E.g., for the process 'namedthreads' where each thread is named noploop%d
    where %d is the thread number:

    Before:
    perf script -f comm,tid,ip,sym,dso
    noploop:4ads 21616 400a49 noploop (/tmp/namedthreads)
    The 'ads' in the thread comm bleeds over from the process name.

    After:
    perf script -f comm,tid,ip,sym,dso
    noploop:4 21616 400a49 noploop (/tmp/namedthreads)

    Cc: Frederic Weisbecker
    Cc: Ingo Molnar
    Cc: Paul Mackerras
    Cc: Peter Zijlstra
    Cc: Thomas Gleixner
    Link: http://lkml.kernel.org/r/1330111898-68071-1-git-send-email-dsahern@gmail.com
    Signed-off-by: David Ahern
    Signed-off-by: Arnaldo Carvalho de Melo

    David Ahern
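
The "bleed over" shown in the Before output is what one would expect if a shorter name is copied over a longer one without its terminator. The sketch below reproduces that effect under this assumption; `set_comm_unterminated` and `set_comm_terminated` are hypothetical helpers, not the perf code that was fixed.

```c
#include <assert.h>
#include <string.h>

#define COMM_LEN_SKETCH 16

/* Buggy variant: copies the characters of 'name' but not its terminator. */
static void set_comm_unterminated(char comm[COMM_LEN_SKETCH], const char *name)
{
	size_t len = strlen(name);

	if (len > COMM_LEN_SKETCH - 1)
		len = COMM_LEN_SKETCH - 1;
	memcpy(comm, name, len);
}

/* Fixed variant: always terminates the copied name. */
static void set_comm_terminated(char comm[COMM_LEN_SKETCH], const char *name)
{
	size_t len = strlen(name);

	if (len > COMM_LEN_SKETCH - 1)
		len = COMM_LEN_SKETCH - 1;
	memcpy(comm, name, len);
	comm[len] = '\0';
}
```

Starting from a buffer holding "namedthreads", the buggy copy of "noploop:4" leaves the trailing "ads" in place, matching the output shown above.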
     
  • The perf probe command allows a kprobe to be inserted at any offset from
    a function start, which results in adding kprobes to unintended locations.

    Example: perf probe do_fork+10000 is allowed even though the size of
    do_fork is ~904.

    This patch will ensure probe addition fails when the offset specified is
    greater than the size of the function.

    Acked-by: Masami Hiramatsu
    Cc: Ananth N Mavinakayanahalli
    Cc: Srikar Dronamraju
    Cc: Steven Rostedt
    Cc: Andrew Morton
    Cc: Jason Baron
    Cc: Masami Hiramatsu
    Link: http://lkml.kernel.org/r/4F473F33.4060409@linux.vnet.ibm.com
    Signed-off-by: Prashanth Nageshappa
    Signed-off-by: Arnaldo Carvalho de Melo

    Prashanth Nageshappa
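
The check described above amounts to comparing the requested offset with the function size. A minimal sketch, assuming the size is already known from symbol or debug information; `check_probe_offset_sketch` is a hypothetical name and -EINVAL is an assumed error code.

```c
#include <assert.h>
#include <errno.h>

/* Returns 0 if the offset is acceptable, -EINVAL if it lies beyond the
 * function, following the "greater than size" rule stated above. */
static int check_probe_offset_sketch(unsigned long func_size,
				     unsigned long offset)
{
	if (offset > func_size)
		return -EINVAL;
	return 0;
}
```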
     
  • On old kernels that don't support sample_id_all feature,
    perf_evlist__id2evsel() returns NULL for non-sampling events.

    This breaks perf top when multiple events are given on the command line.
    Fix it by using the first evsel in the evlist. This will also prevent the
    same (potential) problem in such a new tool/old kernel combo.

    Suggested-by: Arnaldo Carvalho de Melo
    Cc: Ingo Molnar
    Cc: Paul Mackerras
    Cc: Peter Zijlstra
    Link: http://lkml.kernel.org/r/1329702447-25045-1-git-send-email-namhyung.kim@lge.com
    Signed-off-by: Namhyung Kim
    Signed-off-by: Arnaldo Carvalho de Melo

    Namhyung Kim
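A rough sketch of the fallback described in the commit above, using simplified array-based types rather than the real perf_evlist and perf_evsel structures. All names ending in _sketch are hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

struct evsel_sketch {
	uint64_t id;
	const char *name;
};

struct evlist_sketch {
	struct evsel_sketch *entries;
	size_t nr_entries;
	bool sample_id_all; /* does the kernel attach an id to every event? */
};

/* Look up the event selector for an id. When the kernel lacks
 * sample_id_all, non-sampling events carry no usable id, so fall back
 * to the first entry instead of returning NULL. */
static struct evsel_sketch *id2evsel_sketch(struct evlist_sketch *evlist,
					    uint64_t id)
{
	for (size_t i = 0; i < evlist->nr_entries; i++)
		if (evlist->entries[i].id == id)
			return &evlist->entries[i];

	if (!evlist->sample_id_all && evlist->nr_entries > 0)
		return &evlist->entries[0];

	return NULL;
}
```

The fallback is deliberately limited to the case where sample_id_all is unavailable, so an unknown id on a newer kernel still returns NULL.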
     

22 Feb, 2012

1 commit

  • The following commit:
    b52956c perf tools: Allow multiple threads or processes in record, stat, top

    introduced a bug in the thread_map code which caused perf record -a to
    not set up system-wide monitoring properly.

    $ taskset -c 1 noploop 1000 &
    $ perf record -a -C 1 sleep 10
    $ perf report -D | tail -20
    cycles stats:
    TOTAL events: 4413
    MMAP events: 4025
    COMM events: 340
    SAMPLE events: 48

    Here I was expecting about 10,000 samples and not 48.

    In system-wide mode, the PID passed to perf_event_open() must be -1, but
    it was 0. That caused the kernel to set up a per-process event on PID:0.
    Consequently, the number of samples captured does not correspond to the
    requested measurement.

    The following one-liner fixes the problem for me with or without -C.

    I would also suggest changing the malloc() to something that matches
    the struct definition. thread_map->map[] is declared as int map[] and
    not pid_t map[]. If map[] can only contain pids, then change the struct
    definition.

    Acked-by: David Ahern
    Cc: David Ahern
    Cc: Eric Dumazet
    Cc: Ingo Molnar
    Cc: Peter Zijlstra
    Link: http://lkml.kernel.org/r/20120221145424.GA6757@quad
    Signed-off-by: Stephane Eranian
    Signed-off-by: Arnaldo Carvalho de Melo

    Stephane Eranian
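The pid distinction described in the commit above can be illustrated with a small sketch around the perf_event_open(2) system call. The two helper names are hypothetical and the code is not taken from perf; the meaning of the pid argument (-1 for any process on the given CPU, 0 for the calling process) follows the system call's documented interface.

```c
#define _GNU_SOURCE
#include <linux/perf_event.h>
#include <string.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <unistd.h>

/* Pick the pid argument: -1 selects every process on the chosen CPU
 * (system-wide), while 0 would mean "the calling process only". */
static pid_t perf_target_pid(int system_wide, pid_t pid)
{
	return system_wide ? -1 : pid;
}

/* Open a CPU-cycles counter. In system-wide mode cpu must be >= 0.
 * Returns a file descriptor, or -1 with errno set. */
static int open_cycles_counter(int system_wide, pid_t pid, int cpu)
{
	struct perf_event_attr attr;

	memset(&attr, 0, sizeof(attr));
	attr.size = sizeof(attr);
	attr.type = PERF_TYPE_HARDWARE;
	attr.config = PERF_COUNT_HW_CPU_CYCLES;
	attr.disabled = 1;

	return (int)syscall(__NR_perf_event_open, &attr,
			    perf_target_pid(system_wide, pid), cpu, -1, 0UL);
}
```

Calling open_cycles_counter(1, 0, 1) would request a system-wide counter on CPU 1; whether it succeeds depends on the privileges and perf_event_paranoid setting of the machine.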
     

18 Feb, 2012

2 commits

  • tools/perf/util/probe-event.c included 'string.h' twice; remove the
    duplicate.

    Acked-by: Masami Hiramatsu
    Cc: Danny Kukawka
    Cc: Ingo Molnar
    Cc: Jovi Zhang
    Cc: Masami Hiramatsu
    Cc: Paul Mackerras
    Cc: Peter Zijlstra
    Cc: Steven Rostedt
    Link: http://lkml.kernel.org/r/1329400459-31570-1-git-send-email-danny.kukawka@bisect.de
    Signed-off-by: Danny Kukawka
    Signed-off-by: Arnaldo Carvalho de Melo

    Danny Kukawka
     
  • The __print_symbolic() function takes a sequence of key-value pairs for
    pretty-printing a constant. The new kvm:kvm_exit print fmt uses the
    expression:

    __print_symbolic(..., { 0x040 + 1, "DB excp" }, ...)

    Currently only atoms are supported and this print fmt fails to parse.
    This patch adds support for expressions instead of just atoms so that
    0x040 + 1 is parsed successfully. Also add arg_num_eval() support for
    the '+' operator.

    Acked-by: Steven Rostedt
    Cc: Ingo Molnar
    Cc: Steven Rostedt
    Link: http://lkml.kernel.org/r/1315148939-14313-1-git-send-email-stefanha@linux.vnet.ibm.com
    Signed-off-by: Stefan Hajnoczi
    Signed-off-by: Arnaldo Carvalho de Melo

    Stefan Hajnoczi
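To make the atom-versus-expression distinction in the commit above concrete, here is a minimal evaluator for sums of integer literals such as 0x040 + 1. It is only a sketch: the function name is hypothetical, it handles nothing but '+', and it is unrelated to the actual arg_num_eval() implementation.

```c
#include <ctype.h>
#include <stdlib.h>

/* Evaluate "NUM ( '+' NUM )*", where NUM is a decimal, octal or hex
 * literal. Returns 0 and stores the sum on success, -1 on a parse error. */
static int eval_sum_sketch(const char *s, unsigned long long *out)
{
	unsigned long long total = 0;
	char *end;

	for (;;) {
		while (isspace((unsigned char)*s))
			s++;
		if (!isdigit((unsigned char)*s))
			return -1;
		total += strtoull(s, &end, 0); /* base 0 accepts 0x.. and 0.. */
		s = end;
		while (isspace((unsigned char)*s))
			s++;
		if (*s == '\0')
			break;
		if (*s != '+')
			return -1;
		s++;
	}

	*out = total;
	return 0;
}
```

For the input "0x040 + 1" the sketch stores 65 (0x41); a parser that accepts only a single atom would stop at the '+'.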
     

15 Feb, 2012

2 commits

  • Instead of requiring that users of perf_record_opts set
    .sample_id_all_avail to true, just invert the logic, using
    .sample_id_all_missing, that doesn't need to be explicitly initialized
    since gcc will zero members omitted in a struct initialization.

    Just like the newly introduced .exclude_{guest,host} feature test.

    Cc: David Ahern
    Cc: Frederic Weisbecker
    Cc: Mike Galbraith
    Cc: Paul Mackerras
    Cc: Peter Zijlstra
    Cc: Stephane Eranian
    Link: http://lkml.kernel.org/n/tip-ab772uzk78cwybihf0vt7kxw@git.kernel.org
    Signed-off-by: Arnaldo Carvalho de Melo

    Arnaldo Carvalho de Melo
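The zero-initialization that the commit above relies on is standard C behavior for any initializer list, not something specific to gcc. A small sketch, with a hypothetical struct that only borrows the field name from the commit message:

```c
#include <stdbool.h>

/* Hypothetical options struct; only sample_id_all_missing is named in
 * the commit message, the other members are placeholders. */
struct record_opts_sketch {
	int freq;
	bool no_delay;
	bool sample_id_all_missing;
};

/* Only .freq is initialized explicitly. Every member left out of the
 * initializer list is set to zero, so sample_id_all_missing is false. */
static struct record_opts_sketch default_opts_sketch(void)
{
	struct record_opts_sketch opts = { .freq = 1000 };

	return opts;
}
```

This is why a "missing" flag is the more convenient polarity: the default of false is the common case and needs no explicit initialization by callers.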
     
  • Just fall back to resetting those fields, if set, warning the user that
    that feature is not available.

    If guest samples appear they will just be discarded because no struct
    machine will be found and thus the event will be accounted as not
    handled and dropped, see 0c09571.

    Reported-by: Namhyung Kim
    Tested-by: Joerg Roedel
    Cc: David Ahern
    Cc: Frederic Weisbecker
    Cc: Joerg Roedel
    Cc: Mike Galbraith
    Cc: Paul Mackerras
    Cc: Peter Zijlstra
    Cc: Stephane Eranian
    Link: http://lkml.kernel.org/n/tip-vuwxig36mzprl5n7nzvnxxsh@git.kernel.org
    Signed-off-by: Arnaldo Carvalho de Melo

    Arnaldo Carvalho de Melo
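The fallback described in the commit above can be sketched as a retry around the open call. Everything here is an assumption made for illustration: the function names, the stand-in for an old kernel, and the premise that the unsupported bits surface as EINVAL. Only the exclude_guest and exclude_host fields of struct perf_event_attr are real.

```c
#include <errno.h>
#include <linux/perf_event.h>
#include <stdio.h>

typedef int (*open_fn)(struct perf_event_attr *attr);

/* Stand-in for an old kernel: rejects attrs that set the guest/host
 * exclusion bits, otherwise returns a pretend file descriptor. */
static int old_kernel_open_sketch(struct perf_event_attr *attr)
{
	if (attr->exclude_guest || attr->exclude_host) {
		errno = EINVAL;
		return -1;
	}
	return 3;
}

/* Try to open the event; if that fails and the exclusion bits were set,
 * warn, clear them, and retry once. */
static int open_with_guest_fallback(struct perf_event_attr *attr,
				    open_fn do_open)
{
	int fd = do_open(attr);

	if (fd < 0 && errno == EINVAL &&
	    (attr->exclude_guest || attr->exclude_host)) {
		fprintf(stderr,
			"warning: exclude_guest/exclude_host not available, ignoring\n");
		attr->exclude_guest = 0;
		attr->exclude_host = 0;
		fd = do_open(attr);
	}
	return fd;
}
```

Passing the opener as a function pointer keeps the retry logic separate from the system call, which makes the sketch easy to exercise without a real old kernel.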
     

14 Feb, 2012

1 commit

  • The perf_event_attr size needs to be initialized in all cases because it
    captures the ABI version.

    This patch moves the initialization of the field from the
    perf_event_open() syscall stub to its proper location in
    event_attr_init().

    Cc: Ingo Molnar
    Cc: Peter Zijlstra
    Link: http://lkml.kernel.org/r/20120209151238.GA10272@quad
    Signed-off-by: Stephane Eranian
    Signed-off-by: Arnaldo Carvalho de Melo

    Stephane Eranian
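A minimal sketch of the initialization described in the commit above. The function name is hypothetical and the real event_attr_init() may set other fields as well; the point shown is only that the size field is filled in wherever the attr is initialized.

```c
#include <linux/perf_event.h>

/* Record the size of the struct this program was compiled against.
 * The kernel uses it to tell which revision of the ABI user space
 * expects, so it should be set for every attr, not only on one path. */
static void event_attr_init_sketch(struct perf_event_attr *attr)
{
	attr->size = sizeof(*attr);
}
```

Doing this in the initializer rather than in the syscall wrapper means code that inspects or copies the attr before opening the event already sees a valid size.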