06 Apr, 2019

7 commits

  • [ Upstream commit cc437642255224e4140fed1f3e3156fc8ad91903 ]

    In Python3, the result of PyModule_Create (called from
    scripts/python/Perf-Trace-Util/Context.c) is not automatically added to
    sys.modules. See: https://bugs.python.org/issue4592
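
    The behaviour described above has a rough pure-Python analogue, sketched
    here as an illustration only: creating a module object does not make it
    importable until it is registered in sys.modules. The module name
    "demo_trace_context" and the helper try_import() are hypothetical; the
    actual fix works at the C API level, not like this.

```python
import importlib
import sys
import types

MODULE_NAME = "demo_trace_context"  # hypothetical name, not perf's real module


def try_import(name):
    """Return True if importing `name` succeeds, False on ModuleNotFoundError."""
    try:
        importlib.import_module(name)
        return True
    except ModuleNotFoundError:
        return False


# Creating a module object does not register it anywhere.
module = types.ModuleType(MODULE_NAME)
created_but_unregistered = try_import(MODULE_NAME)

# Once it is placed in sys.modules, the import resolves.
sys.modules[MODULE_NAME] = module
registered = try_import(MODULE_NAME)

print(created_but_unregistered, registered)
```

    The first import attempt fails and the second succeeds, which mirrors the
    ModuleNotFoundError in the traceback below.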

    Below is the observed behavior without the fix:

    # ldd /usr/bin/perf | grep -i python
    libpython3.6m.so.1.0 => /usr/lib64/libpython3.6m.so.1.0 (0x00007f8e1dfb2000)

    # perf record /bin/false
    [ perf record: Woken up 1 times to write data ]
    [ perf record: Captured and wrote 0.015 MB perf.data (17 samples) ]

    # perf script -g python | cat
    generated Python script: perf-script.py

    # perf script -s ./perf-script.py
    Traceback (most recent call last):
    File "./perf-script.py", line 18, in <module>
    from perf_trace_context import *
    ModuleNotFoundError: No module named 'perf_trace_context'
    Error running python script ./perf-script.py
    #

    Committer notes:

    To build with python3 use:

    $ make -C tools/perf PYTHON=python3

    Use a non-const variable to pass the 'name' arg to
    PyImport_AppendInittab(), as python2.6 has that as 'char *', which ends
    up throwing this in some environments:

    CC /tmp/build/perf/util/parse-branch-options.o
    util/scripting-engines/trace-event-python.c: In function 'python_start_script':
    util/scripting-engines/trace-event-python.c:1520:2: error: passing argument 1 of 'PyImport_AppendInittab' discards 'const' qualifier from pointer target type [-Werror]
    PyImport_AppendInittab("perf_trace_context", initfunc);
    ^
    In file included from /usr/include/python2.6/Python.h:130:0,
    from util/scripting-engines/trace-event-python.c:22:
    /usr/include/python2.6/import.h:54:17: note: expected 'char *' but argument is of type 'const char *'
    PyAPI_FUNC(int) PyImport_AppendInittab(char *name, void (*initfunc)(void));
    ^
    cc1: all warnings being treated as errors

    Signed-off-by: Tony Jones
    Acked-by: Jiri Olsa
    Tested-by: Arnaldo Carvalho de Melo
    Cc: Jaroslav Škarvada
    Cc: Jonathan Corbet
    Cc: Ravi Bangoria
    Cc: Seeteena Thoufeek
    Fixes: 66dfdff03d19 ("perf tools: Add Python 3 support")
    Link: http://lkml.kernel.org/r/20190124005229.16146-2-tonyj@suse.de
    Signed-off-by: Arnaldo Carvalho de Melo
    Signed-off-by: Sasha Levin

    Tony Jones
     
  • [ Upstream commit 72e0b15cb24a497d7d0d4707cf51ff40c185ae8c ]

    With Python3, PyUnicode_FromStringAndSize is unsafe to call on attr and will
    return NULL. Use _PyBytes_FromStringAndSize (as with raw_buf).
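
    A minimal Python-level sketch of why a text conversion is the wrong tool
    for a binary structure: arbitrary bytes are often not valid UTF-8, while a
    bytes object preserves them unchanged. The sample buffer raw_attr and the
    helper to_text_or_none() are made up for illustration and do not reflect
    the real attr layout.

```python
def to_text_or_none(raw: bytes):
    """Mimic a strict text conversion: None when the bytes are not valid UTF-8."""
    try:
        return raw.decode("utf-8")
    except UnicodeDecodeError:
        return None


# Hypothetical binary blob standing in for a packed attribute structure.
raw_attr = bytes([0x01, 0x00, 0x00, 0x00, 0xFF, 0xFE, 0x80, 0x00])

as_text = to_text_or_none(raw_attr)  # fails: 0xFF is never valid in UTF-8
as_bytes = bytes(raw_attr)           # always succeeds, content preserved

print(as_text, len(as_bytes))
```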

    Below is the observed behavior without the fix. Note it is first necessary
    to apply the prior fix (Add trace_context extension module to sys.modules):

    # ldd /usr/bin/perf | grep -i python
    libpython3.6m.so.1.0 => /usr/lib64/libpython3.6m.so.1.0 (0x00007f8e1dfb2000)

    # perf record -e raw_syscalls:sys_enter /bin/false
    [ perf record: Woken up 1 times to write data ]
    [ perf record: Captured and wrote 0.018 MB perf.data (21 samples) ]

    # perf script -g python | cat
    generated Python script: perf-script.py

    # perf script -s ./perf-script.py
    in trace_begin
    Segmentation fault (core dumped)

    Signed-off-by: Tony Jones
    Acked-by: Jiri Olsa
    Tested-by: Arnaldo Carvalho de Melo
    Cc: Jaroslav Škarvada
    Cc: Jonathan Corbet
    Cc: Ravi Bangoria
    Cc: Seeteena Thoufeek
    Fixes: 66dfdff03d19 ("perf tools: Add Python 3 support")
    Link: http://lkml.kernel.org/r/20190124005229.16146-3-tonyj@suse.de
    Signed-off-by: Arnaldo Carvalho de Melo
    Signed-off-by: Sasha Levin

    Tony Jones
     
  • [ Upstream commit 2187d87eacd46f6214ce3dc9cfd7a558375a4153 ]

    On IBM z13 machine types 2964 and 2965 the descriptor
    sizes for sampling and diagnostic sampling entries
    might be missing in the trailer entry and are set to zero.

    This leads to a perf report failure when processing diagnostic
    sampling entries.

    This patch adds missing descriptor sizes when the trailer entry
    contains zero for these fields.

    Output before:
    [root@s38lp82 perf]# ./perf report --stdio | fgrep Samples
    0xabbf0 [0x8]: failed to process type: 68
    Error:
    failed to process sample
    [root@s38lp82 perf]#

    Output after:
    [root@s38lp82 perf]# ./perf report --stdio | fgrep Samples
    # Total Lost Samples: 0
    # Samples: 3K of event 'SF_CYCLES_BASIC_DIAG'
    # Samples: 162 of event 'CF_DIAG'
    [root@s38lp82 perf]#

    Fixes: 2b1444f2e28b ("perf report: Add raw report support for s390 auxiliary trace")

    Signed-off-by: Thomas Richter
    Reviewed-by: Hendrik Brueckner
    Cc: Heiko Carstens
    Cc: Martin Schwidefsky
    Link: http://lkml.kernel.org/r/20190211100627.85714-1-tmricht@linux.ibm.com
    Signed-off-by: Arnaldo Carvalho de Melo
    Signed-off-by: Sasha Levin

    Thomas Richter
     
  • [ Upstream commit 7346195e8643482968f547483e0d823ec1982fab ]

    We can't assume inlined symbols with the same name are equal, because
    their address ranges may differ. Treating them as equal causes symbols
    with different addresses to be shadowed when they are added to the hist
    entry, and leads to an ERANGE error when the symbol address is checked
    during sample parsing: the addr should be within the range
    [sym.start, sym.end].

    The error message is like: "0x36aea60 [0x8]: failed to process type: 68".

    The second parameter of symbol__new() is the length of the fake symbol
    for the inline frame, which is the end address of base_sym minus its
    start address.
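
    The shadowing problem can be sketched in a few lines. This is an
    illustration under assumptions, not the patch itself: the Sym type, the
    two comparison helpers and the addresses are hypothetical.

```python
from typing import NamedTuple


class Sym(NamedTuple):
    name: str
    start: int
    end: int


def same_by_name(a: Sym, b: Sym) -> bool:
    """Name-only comparison: distinct inlined copies collapse into one."""
    return a.name == b.name


def same_by_name_and_range(a: Sym, b: Sym) -> bool:
    """Also require matching address ranges before treating symbols as equal."""
    return a.name == b.name and a.start == b.start and a.end == b.end


def contains(sym: Sym, addr: int) -> bool:
    """The range check from the text: addr must lie in [sym.start, sym.end]."""
    return sym.start <= addr <= sym.end


# Two inlined copies of the same function at different addresses.
first = Sym("foo", 0x100, 0x120)
second = Sym("foo", 0x400, 0x420)
sample_addr = 0x410

# With name-only equality, `second` is shadowed by `first`, and the sample
# address then falls outside the range of the symbol that was kept.
shadowed = same_by_name(first, second)
range_check_on_kept = contains(first, sample_addr)

print(shadowed, range_check_on_kept, same_by_name_and_range(first, second))
```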

    Signed-off-by: He Kuang
    Acked-by: Jiri Olsa
    Cc: Alexander Shishkin
    Cc: Milian Wolff
    Cc: Namhyung Kim
    Cc: Peter Zijlstra
    Fixes: aa441895f7b4 ("perf report: Compare symbol name for inlined frames when sorting")
    Link: http://lkml.kernel.org/r/20190219130531.15692-1-hekuang@huawei.com
    Signed-off-by: Arnaldo Carvalho de Melo
    Signed-off-by: Sasha Levin

    He Kuang
     
  • [ Upstream commit 03d309711d687460d1345de8a0363f45b1c8cd11 ]

    Commit 489338a717a0 ("perf tests evsel-tp-sched: Fix bitwise operator")
    causes test case 14 "Parse sched tracepoints fields" to fail on s390.

    This test succeeds on x86.

    In fact this test now fails on all architectures with type char treated
    as type unsigned char.

    The root cause is the signed-ness of character arrays in the tracepoints
    sched_switch for structure members prev_comm and next_comm.

    On s390 the output of:

    [root@m35lp76 perf]# cat /sys/kernel/debug/tracing/events/sched/sched_switch/format
    name: sched_switch
    ID: 287
    format:
    field:unsigned short common_type; offset:0; size:2; signed:0;
    ...
    field:char prev_comm[16]; offset:8; size:16; signed:0;
    ...
    field:char next_comm[16]; offset:40; size:16; signed:0;

    reveals that the character arrays prev_comm and next_comm are by
    default unsigned char and have values in the range of 0..255.

    On x86 both fields are signed as this output shows:
    [root@f29]# cat /sys/kernel/debug/tracing/events/sched/sched_switch/format
    name: sched_switch
    ID: 287
    format:
    field:unsigned short common_type; offset:0; size:2; signed:0;
    ...
    field:char prev_comm[16]; offset:8; size:16; signed:1;
    ...
    field:char next_comm[16]; offset:40; size:16; signed:1;

    and the character arrays prev_comm and next_comm are by default signed
    char and have values in the range of -128..127. The implementation of
    type char is architecture specific.

    Since the character arrays in both tracepoints sched_switch and
    sched_wakeup should contain ascii characters, simply omit the check for
    signedness in the test case.
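
    A small sketch of why signedness does not matter for ASCII content: every
    byte below 0x80 has the same value whether it is read as signed or
    unsigned, and only bytes from 0x80 upwards differ. The sample command
    name is made up for illustration.

```python
import struct


def as_signed(byte: int) -> int:
    """Interpret one byte as a signed char."""
    return struct.unpack("b", bytes([byte]))[0]


def as_unsigned(byte: int) -> int:
    """Interpret one byte as an unsigned char."""
    return struct.unpack("B", bytes([byte]))[0]


comm = b"swapper/0"  # hypothetical comm value, plain ASCII

# ASCII bytes read the same either way.
ascii_agrees = all(as_signed(c) == as_unsigned(c) for c in comm)

# A byte with the top bit set is where the two interpretations diverge.
high_signed = as_signed(0xFF)
high_unsigned = as_unsigned(0xFF)

print(ascii_agrees, high_signed, high_unsigned)
```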

    Output before:

    [root@m35lp76 perf]# ./perf test -F 14
    14: Parse sched tracepoints fields :
    --- start ---
    sched:sched_switch: "prev_comm" signedness(0) is wrong, should be 1
    sched:sched_switch: "next_comm" signedness(0) is wrong, should be 1
    sched:sched_wakeup: "comm" signedness(0) is wrong, should be 1
    ---- end ----
    14: Parse sched tracepoints fields : FAILED!
    [root@m35lp76 perf]#

    Output after:

    [root@m35lp76 perf]# ./perf test -Fv 14
    14: Parse sched tracepoints fields :
    --- start ---
    ---- end ----
    Parse sched tracepoints fields: Ok
    [root@m35lp76 perf]#

    Fixes: 489338a717a0 ("perf tests evsel-tp-sched: Fix bitwise operator")

    Signed-off-by: Thomas Richter
    Cc: Heiko Carstens
    Cc: Hendrik Brueckner
    Cc: Martin Schwidefsky
    Link: http://lkml.kernel.org/r/20190219153639.31267-1-tmricht@linux.ibm.com
    Signed-off-by: Arnaldo Carvalho de Melo
    Signed-off-by: Sasha Levin

    Thomas Richter
     
  • [ Upstream commit 11db1ad4513d6205d2519e1a30ff4cef746e3243 ]

    The output of "perf annotate -l --stdio xxx" changed since commit 425859ff0de33
    ("perf annotate: No need to calculate notes->start twice") removed notes->start
    assignment in symbol__calc_lines(). The lookup then fails in
    find_address_in_section(), called from symbol__tty_annotate(), because
    a2l->addr is wrong. So the annotate summary doesn't report the line number of
    source code correctly.

    Before fix:

    liwei@euler:~/main_code/hulk_work/hulk/tools/perf$ cat common_while_1.c
    void hotspot_1(void)
    {
    volatile int i;

    for (i = 0; i < 0x10000000; i++);
    for (i = 0; i < 0x10000000; i++);
    for (i = 0; i < 0x10000000; i++);
    }

    int main(void)
    {
    hotspot_1();

    return 0;
    }
    liwei@euler:~/main_code/hulk_work/hulk/tools/perf$ gcc common_while_1.c -g -o common_while_1

    liwei@euler:~/main_code/hulk_work/hulk/tools/perf$ sudo ./perf record ./common_while_1
    [ perf record: Woken up 2 times to write data ]
    [ perf record: Captured and wrote 0.488 MB perf.data (12498 samples) ]
    liwei@euler:~/main_code/hulk_work/hulk/tools/perf$ sudo ./perf annotate -l -s hotspot_1 --stdio

    Sorted summary for file /home/liwei/main_code/hulk_work/hulk/tools/perf/common_while_1
    ----------------------------------------------

    19.30 common_while_1[32]
    19.03 common_while_1[4e]
    19.01 common_while_1[16]
    5.04 common_while_1[13]
    4.99 common_while_1[4b]
    4.78 common_while_1[2c]
    4.77 common_while_1[10]
    4.66 common_while_1[2f]
    4.59 common_while_1[51]
    4.59 common_while_1[35]
    4.52 common_while_1[19]
    4.20 common_while_1[56]
    0.51 common_while_1[48]
    Percent | Source code & Disassembly of common_while_1 for cycles:ppp (12480 samples, percent: local period)
    -----------------------------------------------------------------------------------------------------------------
    :
    :
    :
    : Disassembly of section .text:
    :
    : 00000000000005fa :
    : hotspot_1():
    : void hotspot_1(void)
    : {
    0.00 : 5fa: push %rbp
    0.00 : 5fb: mov %rsp,%rbp
    : volatile int i;
    :
    : for (i = 0; i < 0x10000000; i++);
    0.00 : 5fe: movl $0x0,-0x4(%rbp)
    0.00 : 605: jmp 610
    0.00 : 607: mov -0x4(%rbp),%eax
    common_while_1[10] 4.77 : 60a: add $0x1,%eax
    common_while_1[13] 5.04 : 60d: mov %eax,-0x4(%rbp)
    common_while_1[16] 19.01 : 610: mov -0x4(%rbp),%eax
    common_while_1[19] 4.52 : 613: cmp $0xfffffff,%eax
    0.00 : 618: jle 607
    : for (i = 0; i < 0x10000000; i++);
    ...

    After fix:

    liwei@euler:~/main_code/hulk_work/hulk/tools/perf$ sudo ./perf record ./common_while_1
    [ perf record: Woken up 2 times to write data ]
    [ perf record: Captured and wrote 0.488 MB perf.data (12500 samples) ]
    liwei@euler:~/main_code/hulk_work/hulk/tools/perf$ sudo ./perf annotate -l -s hotspot_1 --stdio

    Sorted summary for file /home/liwei/main_code/hulk_work/hulk/tools/perf/common_while_1
    ----------------------------------------------

    33.34 common_while_1.c:5
    33.34 common_while_1.c:6
    33.32 common_while_1.c:7
    Percent | Source code & Disassembly of common_while_1 for cycles:ppp (12482 samples, percent: local period)
    -----------------------------------------------------------------------------------------------------------------
    :
    :
    :
    : Disassembly of section .text:
    :
    : 00000000000005fa :
    : hotspot_1():
    : void hotspot_1(void)
    : {
    0.00 : 5fa: push %rbp
    0.00 : 5fb: mov %rsp,%rbp
    : volatile int i;
    :
    : for (i = 0; i < 0x10000000; i++);
    0.00 : 5fe: movl $0x0,-0x4(%rbp)
    0.00 : 605: jmp 610
    0.00 : 607: mov -0x4(%rbp),%eax
    common_while_1.c:5 4.70 : 60a: add $0x1,%eax
    4.89 : 60d: mov %eax,-0x4(%rbp)
    common_while_1.c:5 19.03 : 610: mov -0x4(%rbp),%eax
    common_while_1.c:5 4.72 : 613: cmp $0xfffffff,%eax
    0.00 : 618: jle 607
    : for (i = 0; i < 0x10000000; i++);
    0.00 : 61a: movl $0x0,-0x4(%rbp)
    0.00 : 621: jmp 62c
    0.00 : 623: mov -0x4(%rbp),%eax
    common_while_1.c:6 4.54 : 626: add $0x1,%eax
    4.73 : 629: mov %eax,-0x4(%rbp)
    common_while_1.c:6 19.54 : 62c: mov -0x4(%rbp),%eax
    common_while_1.c:6 4.54 : 62f: cmp $0xfffffff,%eax
    ...

    Signed-off-by: Wei Li
    Acked-by: Jiri Olsa
    Tested-by: Arnaldo Carvalho de Melo
    Cc: Alexander Shishkin
    Cc: Jin Yao
    Cc: Namhyung Kim
    Cc: Peter Zijlstra
    Fixes: 425859ff0de33 ("perf annotate: No need to calculate notes->start twice")
    Link: http://lkml.kernel.org/r/20190221095716.39529-1-liwei391@huawei.com
    Signed-off-by: Arnaldo Carvalho de Melo
    Signed-off-by: Sasha Levin

    Wei Li
     
  • [ Upstream commit e34c940245437f36d2c492edd1f8237eff391064 ]

    Ravi Bangoria reported that we fail with an empty NUMA node with the
    following message:

    $ lscpu
    NUMA node0 CPU(s):
    NUMA node1 CPU(s): 0-4

    $ sudo ./perf c2c report
    node/cpu topology bugFailed setup nodes

    Fix this by detecting the empty node and keeping its CPU set empty.
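
    The idea of keeping an empty node's CPU set empty can be sketched as a
    tolerant parser for cpulist strings such as the ones lscpu prints above.
    The function parse_cpu_list() is a hypothetical helper written for this
    illustration, not perf's own parser.

```python
def parse_cpu_list(text: str) -> set:
    """Parse a cpulist such as "0-4" or "0,2-3"; an empty string yields an empty set."""
    cpus = set()
    text = text.strip()
    if not text:
        # Empty NUMA node: a valid, empty CPU set rather than an error.
        return cpus
    for part in text.split(","):
        if "-" in part:
            low, high = part.split("-")
            cpus.update(range(int(low), int(high) + 1))
        else:
            cpus.add(int(part))
    return cpus


node0 = parse_cpu_list("")     # NUMA node0 CPU(s):
node1 = parse_cpu_list("0-4")  # NUMA node1 CPU(s): 0-4

print(sorted(node0), sorted(node1))
```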

    Reported-by: Nageswara R Sastry
    Signed-off-by: Jiri Olsa
    Tested-by: Ravi Bangoria
    Cc: Alexander Shishkin
    Cc: Andi Kleen
    Cc: Jonas Rabenstein
    Cc: Namhyung Kim
    Cc: Peter Zijlstra
    Link: http://lkml.kernel.org/r/20190305152536.21035-2-jolsa@kernel.org
    Signed-off-by: Arnaldo Carvalho de Melo
    Signed-off-by: Sasha Levin

    Jiri Olsa
     

03 Apr, 2019

2 commits

  • commit f3b4e06b3bda759afd042d3d5fa86bea8f1fe278 upstream.

    A TSC packet can slip past MTC packets so that the timestamp appears to
    go backwards. One estimate is that it can be up to about 40 CPU cycles,
    which is certainly less than 0x1000 TSC ticks, but accept slippage an
    order of magnitude more to be on the safe side.
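
    A minimal sketch of the tolerance described above, assuming a threshold
    of 0x10000 ticks (one hex order of magnitude above 0x1000). The constant
    TSC_SLIP and the function next_timestamp() are hypothetical names, and
    the handling of large backwards steps is deliberately left out.

```python
TSC_SLIP = 0x10000  # assumed tolerance, chosen to be well above 0x1000 ticks


def next_timestamp(current: int, candidate: int, slip: int = TSC_SLIP) -> int:
    """Choose the timestamp to use after a TSC packet arrives."""
    if candidate >= current:
        return candidate
    if current - candidate < slip:
        # Small backwards step: treat it as slippage and do not go backwards.
        return current
    # Large backwards step: not slippage; other handling is out of scope here.
    return candidate


print(hex(next_timestamp(0x200000, 0x200000 - 40)))
```

    With these assumptions, a TSC value that is 40 ticks behind the current
    time is ignored, while forward progress is accepted unchanged.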

    Signed-off-by: Adrian Hunter
    Cc: Jiri Olsa
    Cc: stable@vger.kernel.org
    Fixes: 79b58424b821c ("perf tools: Add Intel PT support for decoding MTC packets")
    Link: http://lkml.kernel.org/r/20190325135135.18348-1-adrian.hunter@intel.com
    Signed-off-by: Arnaldo Carvalho de Melo
    Signed-off-by: Greg Kroah-Hartman

    Adrian Hunter
     
  • commit e94d6b7f615e6dfbaf9fba7db6011db561461d0c upstream.

    Perf fails to parse uncore event alias, for example:

    # perf stat -e unc_m_clockticks -a --no-merge sleep 1
    event syntax error: 'unc_m_clockticks'
    \___ parser error

    Current code assumes that the event alias is from one specific PMU.

    To find the PMU, perf strcmps the PMU name of event alias with the real
    PMU name on the system.

    However, the uncore event alias may be from multiple PMUs with common
    prefix. The PMU name of uncore event alias is the common prefix.

    For example, UNC_M_CLOCKTICKS is the clock event for iMC, which includes 6
    PMUs with the same prefix "uncore_imc" on a skylake server.

    The real PMU names on the system for iMC are uncore_imc_0 ...
    uncore_imc_5.

    Use strncmp to check only the common prefix for an uncore event
    alias.

    With the patch:

    # perf stat -e unc_m_clockticks -a --no-merge sleep 1
    Performance counter stats for 'system wide':

    723,594,722 unc_m_clockticks [uncore_imc_5]
    724,001,954 unc_m_clockticks [uncore_imc_3]
    724,042,655 unc_m_clockticks [uncore_imc_1]
    724,161,001 unc_m_clockticks [uncore_imc_4]
    724,293,713 unc_m_clockticks [uncore_imc_2]
    724,340,901 unc_m_clockticks [uncore_imc_0]

    1.002090060 seconds time elapsed
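
    The difference between an exact comparison and a prefix comparison can be
    sketched as follows. str.startswith() plays the role that strncmp() with
    the alias length plays in C; the helper names and the PMU list are
    assumptions made for this illustration.

```python
def pmu_matches_exact(alias_pmu: str, real_pmu: str) -> bool:
    """strcmp-style match: the names must be identical."""
    return alias_pmu == real_pmu


def pmu_matches_prefix(alias_pmu: str, real_pmu: str) -> bool:
    """strncmp-style match: only the alias's length is compared."""
    return real_pmu.startswith(alias_pmu)


# Hypothetical PMU names on a system with six iMC units.
system_pmus = ["cpu"] + [f"uncore_imc_{i}" for i in range(6)]

exact = [p for p in system_pmus if pmu_matches_exact("uncore_imc", p)]
by_prefix = [p for p in system_pmus if pmu_matches_prefix("uncore_imc", p)]

print(len(exact), len(by_prefix))
```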

    Signed-off-by: Kan Liang
    Acked-by: Jiri Olsa
    Cc: Andi Kleen
    Cc: Thomas Richter
    Cc: stable@vger.kernel.org
    Fixes: ea1fa48c055f ("perf stat: Handle different PMU names with common prefix")
    Link: http://lkml.kernel.org/r/1552672814-156173-1-git-send-email-kan.liang@linux.intel.com
    Signed-off-by: Arnaldo Carvalho de Melo
    Signed-off-by: Greg Kroah-Hartman

    Kan Liang
     

27 Mar, 2019

1 commit

  • commit eaeffeb9838a7c0dec981d258666bfcc0fa6a947 upstream.

    Since commit 4d99e4136580 ("perf machine: Workaround missing maps for
    x86 PTI entry trampolines"), perf tools has been creating more than one
    kernel map, however 'perf probe' assumed there could be only one.

    Fix by using machine__kernel_map() to get the main kernel map.

    Signed-off-by: Adrian Hunter
    Tested-by: Joseph Qi
    Acked-by: Masami Hiramatsu
    Cc: Alexander Shishkin
    Cc: Andy Lutomirski
    Cc: Greg Kroah-Hartman
    Cc: Jiufei Xue
    Cc: Peter Zijlstra
    Cc: stable@vger.kernel.org
    Cc: Xu Yu
    Fixes: 4d99e4136580 ("perf machine: Workaround missing maps for x86 PTI entry trampolines")
    Fixes: d83212d5dd67 ("kallsyms, x86: Export addresses of PTI entry trampolines")
    Link: http://lkml.kernel.org/r/2ed432de-e904-85d2-5c36-5897ddc5b23b@intel.com
    Signed-off-by: Arnaldo Carvalho de Melo
    Signed-off-by: Greg Kroah-Hartman

    Adrian Hunter
     

24 Mar, 2019

5 commits

  • commit 076333870c2f5bdd9b6d31e7ca1909cf0c84cbfa upstream.

    When TSC is not available, "timeless" decoding is used but a divide by
    zero occurs if perf_time_to_tsc() is called.

    Ensure the divisor is not zero.
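
    A sketch of guarding a time-to-TSC conversion against a zero multiplier.
    The formula and the parameter names (time_zero, time_shift, time_mult)
    are assumptions about the general shape of such a conversion, and
    substituting 1 for a zero divisor is just one possible guard.

```python
def time_to_tsc(ns: int, time_zero: int, time_shift: int, time_mult: int) -> int:
    """Convert a perf time in ns to TSC ticks, tolerating time_mult == 0."""
    if time_mult == 0:
        # No usable conversion data (e.g. "timeless" decoding): avoid the
        # divide by zero by falling back to a harmless divisor.
        time_mult = 1
    t = ns - time_zero
    quot, rem = divmod(t, time_mult)
    return (quot << time_shift) + ((rem << time_shift) // time_mult)


print(time_to_tsc(1000, 0, 0, 0))
```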

    Signed-off-by: Adrian Hunter
    Cc: Jiri Olsa
    Cc: stable@vger.kernel.org # v4.9+
    Link: https://lkml.kernel.org/n/tip-1i4j0wqoc8vlbkcizqqxpsf4@git.kernel.org
    Signed-off-by: Arnaldo Carvalho de Melo
    Signed-off-by: Greg Kroah-Hartman

    Adrian Hunter
     
  • commit 5a99d99e3310a565b0cf63f785b347be9ee0da45 upstream.

    Auxtrace records might have up to 7 bytes of padding appended. Adjust
    the overlap accordingly.
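
    The "up to 7 bytes" figure follows from rounding a record length up to an
    8-byte boundary, as this sketch shows. The constant RECORD_ALIGNMENT and
    both helpers are hypothetical names used only for illustration.

```python
RECORD_ALIGNMENT = 8  # assumed alignment implied by "up to 7 bytes of padding"


def padding_for(length: int, alignment: int = RECORD_ALIGNMENT) -> int:
    """Bytes of padding needed to round `length` up to the alignment."""
    return (-length) % alignment


def padded_length(length: int, alignment: int = RECORD_ALIGNMENT) -> int:
    """Length of the record once padding has been appended."""
    return length + padding_for(length, alignment)


worst_case = max(padding_for(n) for n in range(1, 65))

print(padding_for(9), padded_length(9), worst_case)
```

    Any comparison of the tail of one buffer with the head of the next
    therefore has to allow for between 0 and 7 trailing pad bytes.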

    Signed-off-by: Adrian Hunter
    Cc: Jiri Olsa
    Cc: stable@vger.kernel.org
    Link: http://lkml.kernel.org/r/20190206103947.15750-3-adrian.hunter@intel.com
    Signed-off-by: Arnaldo Carvalho de Melo
    Signed-off-by: Greg Kroah-Hartman

    Adrian Hunter
     
  • commit c3fcadf0bb765faf45d6d562246e1d08885466df upstream.

    Define auxtrace record alignment so that it can be referenced elsewhere.

    Note this is preparation for patch "perf intel-pt: Fix overlap calculation
    for padding"

    Signed-off-by: Adrian Hunter
    Cc: Jiri Olsa
    Cc: stable@vger.kernel.org
    Link: http://lkml.kernel.org/r/20190206103947.15750-2-adrian.hunter@intel.com
    Signed-off-by: Arnaldo Carvalho de Melo
    Signed-off-by: Greg Kroah-Hartman

    Adrian Hunter
     
  • commit d6d457451eb94fa747dc202765592eb8885a7352 upstream.

    Kallsyms symbols do not have a size, so the size becomes the distance to
    the next symbol.

    Consequently the recently added trampoline symbols end up with large
    sizes because the trampolines are some distance from one another and the
    main kernel map.

    However, symbols that end outside their map can disrupt the symbol tree
    because, after mapping, it can appear incorrectly that they overlap
    other symbols.

    Add logic to truncate symbol size to the end of the corresponding map.
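
    The truncation can be sketched as a simple clamp. The function name and
    the addresses are hypothetical; the point is only that a symbol whose
    size was derived from the distance to the next symbol should not extend
    past the end of its map.

```python
def truncate_to_map(sym_start: int, sym_end: int, map_end: int) -> int:
    """Return the symbol's end address, clamped to the end of its map."""
    if sym_start < map_end < sym_end:
        return map_end
    return sym_end


# Hypothetical trampoline symbol: its kallsyms-derived end is the start of
# the next symbol, far beyond the map that contains it.
map_end = 0x2000
sym_start, sym_end = 0x1F00, 0x9000

print(hex(truncate_to_map(sym_start, sym_end, map_end)))
```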

    Signed-off-by: Adrian Hunter
    Acked-by: Jiri Olsa
    Cc: stable@vger.kernel.org
    Fixes: d83212d5dd67 ("kallsyms, x86: Export addresses of PTI entry trampolines")
    Link: http://lkml.kernel.org/r/20190109091835.5570-2-adrian.hunter@intel.com
    Signed-off-by: Arnaldo Carvalho de Melo
    Signed-off-by: Greg Kroah-Hartman

    Adrian Hunter
     
  • commit 03997612904866abe7cdcc992784ef65cb3a4b81 upstream.

    CYC packet timestamp calculation depends upon CBR which was being
    cleared upon overflow (OVF). That can cause errors due to failing to
    synchronize with sideband events. Even if a CBR change has been lost,
    the old CBR is still a better estimate than zero. So remove the clearing
    of CBR.

    Signed-off-by: Adrian Hunter
    Cc: Jiri Olsa
    Cc: stable@vger.kernel.org
    Link: http://lkml.kernel.org/r/20190206103947.15750-4-adrian.hunter@intel.com
    Signed-off-by: Arnaldo Carvalho de Melo
    Signed-off-by: Greg Kroah-Hartman

    Adrian Hunter
     

14 Mar, 2019

5 commits

  • [ Upstream commit 6ab3bc240ade47a0f52bc16d97edd9accbe0024e ]

    With a suitably defined "probe:vfs_getname" probe, 'perf trace' can
    "beautify" its output, so syscalls like open() or openat() can print the
    "filename" argument instead of just its hex address, like:

    $ perf trace -e open -- touch /dev/null
    [...]
    0.590 ( 0.014 ms): touch/18063 open(filename: /dev/null, flags: CREAT|NOCTTY|NONBLOCK|WRONLY, mode: IRUGO|IWUGO) = 3
    [...]

    The output without such beautifier looks like:

    0.529 ( 0.011 ms): touch/18075 open(filename: 0xc78cf288, flags: CREAT|NOCTTY|NONBLOCK|WRONLY, mode: IRUGO|IWUGO) = 3

    However, when the vfs_getname probe expands to multiple probes and it is
    not the first one that is hit, the beautifier fails, as follows:

    0.326 ( 0.010 ms): touch/18072 open(filename: , flags: CREAT|NOCTTY|NONBLOCK|WRONLY, mode: IRUGO|IWUGO) = 3

    Fix it by hooking into all the expanded probes (inlines), now, for instance:

    [root@quaco ~]# perf probe -l
    probe:vfs_getname (on getname_flags:73@fs/namei.c with pathname)
    probe:vfs_getname_1 (on getname_flags:73@fs/namei.c with pathname)
    [root@quaco ~]# perf trace -e open* sleep 1
    0.010 ( 0.005 ms): sleep/5588 openat(dfd: CWD, filename: /etc/ld.so.cache, flags: RDONLY|CLOEXEC) = 3
    0.029 ( 0.006 ms): sleep/5588 openat(dfd: CWD, filename: /lib64/libc.so.6, flags: RDONLY|CLOEXEC) = 3
    0.194 ( 0.008 ms): sleep/5588 openat(dfd: CWD, filename: /usr/lib/locale/locale-archive, flags: RDONLY|CLOEXEC) = 3
    [root@quaco ~]#

    Works, further verified with:

    [root@quaco ~]# perf test vfs
    65: Use vfs_getname probe to get syscall args filenames : Ok
    66: Add vfs_getname probe to get syscall args filenames : Ok
    67: Check open filename arg using perf trace + vfs_getname: Ok
    [root@quaco ~]#

    Reported-by: Michael Petlan
    Tested-by: Michael Petlan
    Cc: Adrian Hunter
    Cc: Jiri Olsa
    Cc: Namhyung Kim
    Link: https://lkml.kernel.org/n/tip-mv8kolk17xla1smvmp3qabv1@git.kernel.org
    Signed-off-by: Arnaldo Carvalho de Melo
    Signed-off-by: Sasha Levin

    Arnaldo Carvalho de Melo
     
  • [ Upstream commit 59a17706915fe5ea6f711e1f92d4fb706bce07fe ]

    When perf is built with the annobin plugin (RHEL8 build) extra symbols
    are added to its binary:

    # nm perf | grep annobin | head -10
    0000000000241100 t .annobin_annotate.c
    0000000000326490 t .annobin_annotate.c
    0000000000249255 t .annobin_annotate.c_end
    00000000003283a8 t .annobin_annotate.c_end
    00000000001bce18 t .annobin_annotate.c_end.hot
    00000000001bce18 t .annobin_annotate.c_end.hot
    00000000001bc3e2 t .annobin_annotate.c_end.unlikely
    00000000001bc400 t .annobin_annotate.c_end.unlikely
    00000000001bce18 t .annobin_annotate.c.hot
    00000000001bce18 t .annobin_annotate.c.hot
    ...

    Those symbols have no use for report or annotation and should be
    skipped. Moreover they interfere with the DWARF unwind test on the PPC
    arch, where they are mixed with checked symbols and then the test fails:

    # perf test dwarf -v
    59: Test dwarf unwind :
    --- start ---
    test child forked, pid 8515
    unwind: .annobin_dwarf_unwind.c:ip = 0x10dba40dc (0x2740dc)
    ...
    got: .annobin_dwarf_unwind.c 0x10dba40dc, expecting test__arch_unwind_sample
    unwind: failed with 'no error'

    The annobin symbols are defined as NOTYPE/LOCAL/HIDDEN:

    # readelf -s ./perf | grep annobin | head -1
    40: 00000000001bce4f 0 NOTYPE LOCAL HIDDEN 13 .annobin_init.c

    They can still pass the check for the label symbol. Add a check for
    HIDDEN and INTERNAL (as suggested by Nick below) visibility and filter
    out such symbols.

    > Just to be awkward, if you are going to ignore STV_HIDDEN
    > symbols then you should probably also ignore STV_INTERNAL ones
    > as well... Annobin does not generate them, but you never know,
    > one day some other tool might create some.
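
    A sketch of the visibility filter. In ELF, a symbol's visibility is held
    in the low two bits of st_other, with STV_DEFAULT = 0, STV_INTERNAL = 1,
    STV_HIDDEN = 2 and STV_PROTECTED = 3. The helper keep_symbol() is a
    hypothetical name and not the function the patch touches.

```python
STV_DEFAULT, STV_INTERNAL, STV_HIDDEN, STV_PROTECTED = 0, 1, 2, 3


def st_visibility(st_other: int) -> int:
    """Extract the visibility from an ELF symbol's st_other field."""
    return st_other & 0x3


def keep_symbol(st_other: int) -> bool:
    """Skip symbols with HIDDEN or INTERNAL visibility, keep the rest."""
    return st_visibility(st_other) not in (STV_HIDDEN, STV_INTERNAL)


print(keep_symbol(STV_HIDDEN), keep_symbol(STV_DEFAULT))
```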

    Signed-off-by: Jiri Olsa
    Cc: Alexander Shishkin
    Cc: Masami Hiramatsu
    Cc: Michael Petlan
    Cc: Namhyung Kim
    Cc: Nick Clifton
    Cc: Peter Zijlstra
    Link: http://lkml.kernel.org/r/20190128133526.GD15461@krava
    Signed-off-by: Arnaldo Carvalho de Melo
    Signed-off-by: Sasha Levin

    Jiri Olsa
     
  • [ Upstream commit 8bf8c6da53c2265aea365a1de6038f118f522113 ]

    While updating perf to work with Python3 and Python2 I noticed that the
    stat-cpi script was dumping core.

    $ perf stat -e cycles,instructions record -o /tmp/perf.data /bin/false

    Performance counter stats for '/bin/false':

    802,148 cycles

    604,622 instructions 802,148 cycles
    604,622 instructions

    0.001445842 seconds time elapsed

    $ perf script -i /tmp/perf.data -s scripts/python/stat-cpi.py
    Segmentation fault (core dumped)
    ...
    ...
    rblist=rblist@entry=0xb2a200 ,
    new_entry=new_entry@entry=0x7ffcb755c310) at util/rblist.c:33
    ctx=, type=, create=,
    cpu=, evsel=) at util/stat-shadow.c:118
    ctx=, type=, st=)
    at util/stat-shadow.c:196
    count=count@entry=727442, cpu=cpu@entry=0, st=0xb2a200 )
    at util/stat-shadow.c:239
    config=config@entry=0xafeb40 ,
    counter=counter@entry=0x133c6e0) at util/stat.c:372
    ...
    ...

    The issue is that since 1fcd03946b52 perf_stat__update_shadow_stats now calls
    update_runtime_stat passing rt_stat rather than calling update_stats but
    perf_stat__init_shadow_stats has never been called to initialize rt_stat in
    the script path processing recorded stat data.

    Since I can't see any reason why perf_stat__init_shadow_stats() is presently
    called the way it is in builtin-script.c::perf_sample__fprint_metric()
    [4bd1bef8bba2f], I'm proposing it instead be called once in __cmd_script.

    Committer testing:

    After applying the patch:

    # perf script -i /tmp/perf.data -s tools/perf/scripts/python/stat-cpi.py
    0.001970: cpu -1, thread -1 -> cpi 1.709079 (1075684/629394)
    #

    No segfault.

    Signed-off-by: Tony Jones
    Reviewed-by: Jiri Olsa
    Tested-by: Arnaldo Carvalho de Melo
    Tested-by: Ravi Bangoria
    Cc: Andi Kleen
    Cc: Jin Yao
    Fixes: 1fcd03946b52 ("perf stat: Update per-thread shadow stats")
    Link: http://lkml.kernel.org/r/20190120191414.12925-1-tonyj@suse.de
    Signed-off-by: Arnaldo Carvalho de Melo
    Signed-off-by: Sasha Levin

    Tony Jones
     
  • [ Upstream commit 1497e804d1a6e2bd9107ddf64b0310449f4673eb ]

    This patch fixes an issue in cpumap.c when used with the TOPOLOGY
    header. In some configurations, some NUMA nodes may have no CPU (empty
    cpulist). Yet a cpumap map must be created, otherwise perf aborts with an
    error. This patch handles this case by creating a dummy map.

    Before:

    $ perf record -o - -e cycles noploop 2 | perf script -i -
    0x6e8 [0x6c]: failed to process type: 80

    After:

    $ perf record -o - -e cycles noploop 2 | perf script -i -
    noploop for 2 seconds

    Signed-off-by: Stephane Eranian
    Acked-by: Jiri Olsa
    Cc: Andi Kleen
    Cc: Kan Liang
    Cc: Peter Zijlstra
    Link: http://lkml.kernel.org/r/1547885559-1657-1-git-send-email-eranian@google.com
    Signed-off-by: Arnaldo Carvalho de Melo
    Signed-off-by: Sasha Levin

    Stephane Eranian
     
  • [ Upstream commit 96167167b6e17b25c0e05ecc31119b73baeab094 ]

    'perf script' crashes currently when printing mixed trace points and
    other events because the trace format does not handle events without
    trace meta data. Add a simple check to avoid that.

    % cat > test.c
    main()
    {
    printf("Hello world\n");
    }
    ^D
    % gcc -g -o test test.c
    % sudo perf probe -x test 'test.c:3'
    % perf record -e '{cpu/cpu-cycles,period=10000/,probe_test:main}:S' ./test
    % perf script

    Committer testing:

    Before:

    # perf probe -x /lib64/libc-2.28.so malloc
    Added new event:
    probe_libc:malloc (on malloc in /usr/lib64/libc-2.28.so)

    You can now use it in all perf tools, such as:

    perf record -e probe_libc:malloc -aR sleep 1

    # perf probe -l
    probe_libc:malloc (on __libc_malloc@malloc/malloc.c in /usr/lib64/libc-2.28.so)
    # perf record -e '{cpu/cpu-cycles,period=10000/,probe_libc:*}:S' sleep 1
    [ perf record: Woken up 1 times to write data ]
    [ perf record: Captured and wrote 0.023 MB perf.data (40 samples) ]
    # perf script
    Segmentation fault (core dumped)
    ^C
    #

    After:

    # perf script | head -6
    sleep 2888 94796.944981: 16198 cpu/cpu-cycles,period=10000/: ffffffff925dc04f get_random_u32+0x1f (/lib/modules/5.0.0-rc2+/build/vmlinux)
    sleep 2888 [-01] 94796.944981: probe_libc:malloc:
    sleep 2888 94796.944983: 4713 cpu/cpu-cycles,period=10000/: ffffffff922763af change_protection+0xcf (/lib/modules/5.0.0-rc2+/build/vmlinux)
    sleep 2888 [-01] 94796.944983: probe_libc:malloc:
    sleep 2888 94796.944986: 9934 cpu/cpu-cycles,period=10000/: ffffffff922777e0 move_page_tables+0x0 (/lib/modules/5.0.0-rc2+/build/vmlinux)
    sleep 2888 [-01] 94796.944986: probe_libc:malloc:
    #

    Signed-off-by: Andi Kleen
    Tested-by: Arnaldo Carvalho de Melo
    Acked-by: Jiri Olsa
    Link: http://lkml.kernel.org/r/20190117194834.21940-1-andi@firstfloor.org
    Signed-off-by: Arnaldo Carvalho de Melo
    Signed-off-by: Sasha Levin

    Andi Kleen
     

20 Feb, 2019

2 commits

  • [ Upstream commit 03fa483821c0b4db7c2b1453d3332f397d82313f ]

    Some kernels, like 4.19.13-300.fc29.x86_64 in fedora 29, fail with the
    existing probe definition asking for the contents of result->name,
    working when we ask for the 'filename' variable instead, so add a
    fallback to that.

    Now those tests are back working on fedora 29 systems with that kernel:

    # perf test vfs_getname
    65: Use vfs_getname probe to get syscall args filenames : Ok
    66: Add vfs_getname probe to get syscall args filenames : Ok
    67: Check open filename arg using perf trace + vfs_getname: Ok
    #

    Cc: Adrian Hunter
    Cc: Jiri Olsa
    Cc: Namhyung Kim
    Link: https://lkml.kernel.org/n/tip-klt3n0i58dfqttveti09q3fi@git.kernel.org
    Signed-off-by: Arnaldo Carvalho de Melo
    Signed-off-by: Sasha Levin

    Arnaldo Carvalho de Melo
     
  • [ Upstream commit a3366db06bb656cef2e03f30f780d93059bcc594 ]

    By calculating the removed loops, we can get the iteration count.

    But the iteration count could be reported incorrectly, reporting
    impossibly high counts.

    That's because previous code uses the number of removed LBR entries for
    the iteration count. That's not good. Fix this by increasing the
    iteration count when a loop is detected.

    When chains are matched, their iteration counts are added up, so the
    average value has to be computed when printing.

    For example,

    $ perf report --branch-history --stdio --no-children

    Before:

    ---f2 +0
    |
    |--33.62%--f1 +9 (cycles:1)
    | f1 +0
    | main +22 (cycles:1)
    | main +17
    | main +38 (cycles:1)
    | main +27
    | f1 +26 (cycles:1)
    | f1 +24
    | f2 +27 (cycles:7)
    | f2 +0
    | f1 +19 (cycles:1)
    | f1 +14
    | f2 +27 (cycles:11)
    | f2 +0
    | f1 +9 (cycles:1 iter:2968 avg_cycles:3)
    | f1 +0
    | main +22 (cycles:1 iter:2968 avg_cycles:3)
    | main +17
    | main +38 (cycles:1 iter:2968 avg_cycles:3)

    2968 is an impossibly high iteration count and avg_cycles is too small.

    After:

    ---f2 +0
    |
    |--33.62%--f1 +9 (cycles:1)
    | f1 +0
    | main +22 (cycles:1)
    | main +17
    | main +38 (cycles:1)
    | main +27
    | f1 +26 (cycles:1)
    | f1 +24
    | f2 +27 (cycles:7)
    | f2 +0
    | f1 +19 (cycles:1)
    | f1 +14
    | f2 +27 (cycles:11)
    | f2 +0
    | f1 +9 (cycles:1 iter:1 avg_cycles:23)
    | f1 +0
    | main +22 (cycles:1 iter:1 avg_cycles:23)
    | main +17
    | main +38 (cycles:1 iter:1 avg_cycles:23)

    avg_cycles:23 is the average cycles of this iteration.
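
    As a rough cross-check of the "After" listing, the cycles values on the
    entries above the first iter: line happen to add up to 23, which matches
    avg_cycles:23 when the loop is counted once. The Python sketch below only
    illustrates that arithmetic; the list literal and the helper name
    avg_cycles are assumptions made here, not code from perf.

```python
# Cycle counts copied from the "After" listing, one per entry that
# carries a cycles value before the first "iter:" annotation.
loop_cycles = [1, 1, 1, 1, 7, 1, 11]

def avg_cycles(total_cycles, iterations):
    """Average cycles per loop iteration (hypothetical helper)."""
    if iterations == 0:
        return 0
    return total_cycles // iterations

total = sum(loop_cycles)
print(total, avg_cycles(total, 1))
```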

    Fixes: c4ee06251d42 ("perf report: Calculate the average cycles of iterations")

    Signed-off-by: Jin Yao
    Cc: Alexander Shishkin
    Cc: Andi Kleen
    Cc: Jiri Olsa
    Cc: Kan Liang
    Cc: Peter Zijlstra
    Link: http://lkml.kernel.org/r/1546582230-17507-1-git-send-email-yao.jin@linux.intel.com
    Signed-off-by: Arnaldo Carvalho de Melo
    Signed-off-by: Sasha Levin

    Jin Yao
     

13 Feb, 2019

10 commits

  • commit 489338a717a0dfbbd5a3fabccf172b78f0ac9015 upstream.

    Notice that the use of the bitwise OR operator '|' always leads to true
    in this particular case, which seems a bit suspicious due to the context
    in which this expression is being used.

    Fix this by using bitwise AND operator '&' instead.
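
    The difference is easy to see in a small sketch. This is Python rather
    than the C of the actual test, and FLAG_SIGNED and both helper names are
    made up for illustration: OR-ing with a non-zero constant can never
    yield zero, while AND isolates the bit being tested.

```python
# Hypothetical flag bit, chosen only for this illustration.
FLAG_SIGNED = 0x4

def is_signed_buggy(flags):
    # flags | FLAG_SIGNED always has the FLAG_SIGNED bit set,
    # so the result is true for every input.
    return bool(flags | FLAG_SIGNED)

def is_signed_fixed(flags):
    # flags & FLAG_SIGNED is non-zero only when the bit is set.
    return bool(flags & FLAG_SIGNED)

for flags in (0x0, 0x1, 0x4, 0x5):
    print(hex(flags), is_signed_buggy(flags), is_signed_fixed(flags))
```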

    This bug was detected with the help of Coccinelle.

    Signed-off-by: Gustavo A. R. Silva
    Acked-by: Jiri Olsa
    Cc: Alexander Shishkin
    Cc: Namhyung Kim
    Cc: Peter Zijlstra
    Cc: stable@vger.kernel.org
    Fixes: 6a6cd11d4e57 ("perf test: Add test for the sched tracepoint format fields")
    Link: http://lkml.kernel.org/r/20190122233439.GA5868@embeddedor
    Signed-off-by: Arnaldo Carvalho de Melo
    Signed-off-by: Greg Kroah-Hartman

    Gustavo A. R. Silva
     
  • [ Upstream commit a389aece97938966616ce0336466b98b0351ef10 ]

    Ondřej reported that when compiled with python3, the python extension
    regresses in evlist.get_pollfd function behaviour.

    The evlist.get_pollfd function creates file objects from evlist's fds
    and returns them in a list. The python3 version also sets them to 'close
    the original descriptor' when the object dies (is closed), by passing
    True via the 'closefd' arg in the PyFile_FromFd call.

    The python's closefd doc says:

    If closefd is False, the underlying file descriptor will be kept open
    when the file is closed.

    That's why the following line in python3 closes all evlist fds:

    evlist.get_pollfd()

    the returned list is immediately destroyed and that takes down the
    original events fds.

    Pass closefd as False to PyFile_FromFd to fix this.
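
    The same closefd semantics can be observed from pure Python with the
    built-in open(). In this minimal sketch a pipe stands in for the evlist
    descriptors (an assumption made for the example), and fd_is_open is a
    helper invented here.

```python
import os

def fd_is_open(fd):
    """Return True if fd still refers to an open descriptor."""
    try:
        os.fstat(fd)
        return True
    except OSError:
        return False

# closefd=True: closing the file object closes the descriptor too.
r1, w1 = os.pipe()
f = open(r1, "rb", closefd=True)
f.close()
closed_with_true = not fd_is_open(r1)

# closefd=False: the descriptor survives the file object.
r2, w2 = os.pipe()
g = open(r2, "rb", closefd=False)
g.close()
kept_with_false = fd_is_open(r2)

print(closed_with_true, kept_with_false)

# Tidy up the descriptors that are still open.
for fd in (w1, r2, w2):
    os.close(fd)
```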

    Reported-by: Ondřej Lysoněk
    Signed-off-by: Jiri Olsa
    Cc: Alexander Shishkin
    Cc: Jaroslav Škarvada
    Cc: Namhyung Kim
    Cc: Peter Zijlstra
    Fixes: 66dfdff03d19 ("perf tools: Add Python 3 support")
    Link: http://lkml.kernel.org/r/20181226112121.5285-1-jolsa@kernel.org
    Signed-off-by: Arnaldo Carvalho de Melo
    Signed-off-by: Sasha Levin

    Jiri Olsa
     
  • [ Upstream commit 14541b1e7e723859ff2c75c6fc10cdbbec6b8c34 ]

    Current libbfd feature test unconditionally links against -liberty and -lz.
    While it's required on some systems (e.g. opensuse), it's completely
    unnecessary on the others, where only -lbfd is sufficient (debian).
    This patch streamlines (and renames) the following feature checks:

    feature-libbfd - only link against -lbfd (debian),
    see commit 2cf9040714f3 ("perf tools: Fix bfd
    dependency libraries detection")
    feature-libbfd-liberty - link against -lbfd and -liberty
    feature-libbfd-liberty-z - link against -lbfd, -liberty and -lz (opensuse),
    see commit 280e7c48c3b8 ("perf tools: fix BFD
    detection on opensuse")

    (feature-liberty{,-z} were renamed to feature-libbfd-liberty{,-z}
    for clarity)
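
    One way to read the three checks is as a fallback chain that starts with
    the smallest set of libraries. The Python sketch below models only that
    idea; it is not the Makefile logic, and links_ok is a hypothetical
    callable standing in for "try to link a test program with these flags".

```python
# Link-flag sets named in the commit message, smallest first.
CANDIDATES = [
    ("feature-libbfd", ["-lbfd"]),
    ("feature-libbfd-liberty", ["-lbfd", "-liberty"]),
    ("feature-libbfd-liberty-z", ["-lbfd", "-liberty", "-lz"]),
]

def pick_libbfd_flags(links_ok):
    """Return (name, flags) of the first candidate accepted by links_ok."""
    for name, flags in CANDIDATES:
        if links_ok(flags):
            return name, flags
    return None, []

# Pretend system where -lbfd alone is enough.
print(pick_libbfd_flags(lambda flags: True))
# Pretend system where libbfd also needs -liberty and -lz.
print(pick_libbfd_flags(lambda flags: "-lz" in flags))
```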

    The main motivation is to fix this feature test for bpftool which is
    currently broken on debian (libbfd feature shows OFF, but we still
    unconditionally link against -lbfd and it works).

    Tested on debian with only -lbfd installed (without -liberty); I'd
    appreciate it if somebody on the other systems could test this new
    detection method.

    Signed-off-by: Stanislav Fomichev
    Acked-by: Jiri Olsa
    Cc: Alexander Shishkin
    Cc: Mathieu Poirier
    Cc: Namhyung Kim
    Cc: Peter Zijlstra
    Link: http://lkml.kernel.org/r/4dfc634cfcfb236883971b5107cf3c28ec8a31be.1542328222.git.sdf@google.com
    Signed-off-by: Arnaldo Carvalho de Melo
    Signed-off-by: Sasha Levin

    Stanislav Fomichev
     
  • [ Upstream commit 866053bb644f754d1a93aaa9db9998fecf7a8978 ]

    To avoid this warning:

    CC /tmp/build/perf/util/s390-cpumsf.o
    util/s390-cpumsf.c: In function 's390_cpumsf_samples':
    util/s390-cpumsf.c:508:3: warning: format '%llx' expects argument of type 'long long unsigned int', but argument 4 has type 'off_t' [-Wformat=]
    pr_err("[%#08" PRIx64 "] Invalid AUX trailer entry TOD clock base\n",
    ^

    Now the various Android cross toolchains used in the perf tools
    container test builds are all clean and we can remove this:

    export EXTRA_MAKE_ARGS="WERROR=0"

    Cc: Adrian Hunter
    Cc: Jiri Olsa
    Cc: Namhyung Kim
    Cc: Thomas Richter
    Link: https://lkml.kernel.org/n/tip-5rav4ccyb0sjciysz2i4p3sx@git.kernel.org
    Signed-off-by: Arnaldo Carvalho de Melo
    Signed-off-by: Sasha Levin

    Arnaldo Carvalho de Melo
     
  • [ Upstream commit 0afcf29bab35d3785204cd9bd51693b231ad7181 ]

    Reducing this noise when cross building to the Android NDK:

    util/header.c: In function 'perf_header__fprintf_info':
    util/header.c:2710:45: warning: pointer targets in passing argument 1 of 'ctime' differ in signedness [-Wpointer-sign]
    fprintf(fp, "# captured on : %s", ctime(&st.st_ctime));
    ^
    In file included from util/../perf.h:5:0,
    from util/evlist.h:11,
    from util/header.c:22:
    /opt/android-ndk-r15c/platforms/android-26/arch-arm/usr/include/time.h:81:14: note: expected 'const time_t *' but argument is of type 'long unsigned int *'
    extern char* ctime(const time_t*) __LIBC_ABI_PUBLIC__;
    ^

    Cc: Adrian Hunter
    Cc: Jiri Olsa
    Cc: Namhyung Kim
    Link: https://lkml.kernel.org/n/tip-6bz74zp080yhmtiwb36enso9@git.kernel.org
    Signed-off-by: Arnaldo Carvalho de Melo
    Signed-off-by: Sasha Levin

    Arnaldo Carvalho de Melo
     
  • [ Upstream commit bef0b8970f27da5ca223e522a174d03e2587761d ]

    The strncpy() function may leave the destination string buffer
    unterminated, so it is better to use strlcpy(), for which we have a
    __weak fallback implementation on systems without it.

    In this case the 'target' buffer is coming from a list of build-ids that
    are expected to have a len of at most (SBUILD_ID_SIZE - 1) chars, so
    probably we're safe, but since we're using strncpy() here, use strlcpy()
    instead to provide the intended safety checking without using the
    problematic strncpy() function.
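
    The termination difference between the two functions can be modelled in
    a few lines. The sketch below is a Python model of the C semantics on a
    bytearray, not the perf code; both function names are invented here, and
    the size 41 is taken from the gcc warning quoted in this commit.

```python
def strncpy_model(dst, src, n):
    """Model of C strncpy: copy at most n bytes of src, pad with NULs
    if src is shorter, and add no terminator otherwise."""
    chunk = src[:n]
    dst[:n] = chunk + b"\0" * (n - len(chunk))

def strlcpy_model(dst, src, size):
    """Model of strlcpy: copy at most size - 1 bytes and always
    terminate when size > 0. Returns len(src), like the C function."""
    if size > 0:
        chunk = src[:size - 1]
        dst[:len(chunk) + 1] = chunk + b"\0"
    return len(src)

SIZE = 41                      # bound from the quoted gcc warning
too_long = b"a" * SIZE         # source that fills the whole buffer

buf_n = bytearray(SIZE)
strncpy_model(buf_n, too_long, SIZE)
print(0 in buf_n)              # any NUL terminator left in the buffer?

buf_l = bytearray(SIZE)
needed = strlcpy_model(buf_l, too_long, SIZE)
print(buf_l[-1] == 0, needed)  # terminated, plus the length it wanted
```

    A return value of at least the buffer size tells the caller that the
    copy was truncated.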

    This fixes this warning on an Alpine Linux Edge system with gcc 8.2:

    util/probe-file.c: In function 'probe_cache__open.isra.5':
    util/probe-file.c:427:3: error: 'strncpy' specified bound 41 equals destination size [-Werror=stringop-truncation]
    strncpy(sbuildid, target, SBUILD_ID_SIZE);
    ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
    cc1: all warnings being treated as errors

    Cc: Adrian Hunter
    Cc: Jiri Olsa
    Cc: Masami Hiramatsu
    Cc: Namhyung Kim
    Fixes: 1f3736c9c833 ("perf probe: Show all cached probes")
    Link: https://lkml.kernel.org/n/tip-l7n8ggc9kl38qtdlouke5yp5@git.kernel.org
    Signed-off-by: Arnaldo Carvalho de Melo
    Signed-off-by: Sasha Levin

    Arnaldo Carvalho de Melo
     
  • [ Upstream commit 7572588085a13d5db02bf159542189f52fdb507e ]

    The strncpy() function may leave the destination string buffer
    unterminated, so it is better to use strlcpy(), for which we have a
    __weak fallback implementation on systems without it.

    This fixes this warning on an Alpine Linux Edge system with gcc 8.2:

    util/header.c: In function 'perf_event__synthesize_event_update_unit':
    util/header.c:3586:2: error: 'strncpy' output truncated before terminating nul copying as many bytes from a string as its length [-Werror=stringop-truncation]
    strncpy(ev->data, evsel->unit, size);
    ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
    util/header.c:3579:16: note: length computed here
    size_t size = strlen(evsel->unit);
    ^~~~~~~~~~~~~~~~~~~

    Cc: Adrian Hunter
    Cc: Jiri Olsa
    Cc: Namhyung Kim
    Fixes: a6e5281780d1 ("perf tools: Add event_update event unit type")
    Link: https://lkml.kernel.org/n/tip-fiikh5nay70bv4zskw2aa858@git.kernel.org
    Signed-off-by: Arnaldo Carvalho de Melo
    Signed-off-by: Sasha Levin

    Arnaldo Carvalho de Melo
     
  • [ Upstream commit fca5085c15255bbde203b7322c15f07ebb12f63e ]

    The strncpy() function may leave the destination string buffer
    unterminated, so it is better to use strlcpy(), for which we have a
    __weak fallback implementation on systems without it.

    This fixes this warning on an Alpine Linux Edge system with gcc 8.2:

    In function 'decompress_kmodule',
    inlined from 'dso__decompress_kmodule_fd' at util/dso.c:305:9:
    util/dso.c:298:3: error: 'strncpy' destination unchanged after copying no bytes [-Werror=stringop-truncation]
    strncpy(pathname, tmpbuf, len);
    ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
    CC /tmp/build/perf/util/values.o
    CC /tmp/build/perf/util/debug.o
    cc1: all warnings being treated as errors

    Cc: Adrian Hunter
    Cc: Jiri Olsa
    Cc: Namhyung Kim
    Fixes: c9a8a6131fb6 ("perf tools: Move the temp file processing into decompress_kmodule")
    Link: https://lkml.kernel.org/n/tip-tl2hdxj64tt4k8btbi6a0ugw@git.kernel.org
    Signed-off-by: Arnaldo Carvalho de Melo
    Signed-off-by: Sasha Levin

    Arnaldo Carvalho de Melo
     
  • [ Upstream commit 741dad88dde296999da30332157ca47f0543747d ]

    Fix inconsistent use of tabs and spaces error:

    # perf test 16 -v
    16: Setup struct perf_event_attr :
    --- start ---
    test child forked, pid 20224
    File "/usr/libexec/perf-core/tests/attr.py", line 119
    log.warning("expected %s=%s, got %s" % (t, self[t], other[t]))
    ^
    TabError: inconsistent use of tabs and spaces in indentation
    test child finished with -1
    ---- end ----
    Setup struct perf_event_attr: FAILED!
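
    The error class itself is easy to reproduce under Python 3. In this
    small sketch the source string is invented for the example and is not
    taken from attr.py: one line of a block is indented with a tab and the
    next with eight spaces.

```python
# A tab-indented line followed by a space-indented line in one block.
source = "if True:\n\tx = 1\n        y = 2\n"

try:
    compile(source, "<example>", "exec")
    result = "compiled"
except TabError as err:
    result = "TabError: " + err.msg

print(result)
```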

    Signed-off-by: Adrian Hunter
    Cc: Jiri Olsa
    Link: http://lkml.kernel.org/r/20181122140456.16817-1-adrian.hunter@intel.com
    Signed-off-by: Arnaldo Carvalho de Melo
    Signed-off-by: Sasha Levin

    Adrian Hunter
     
  • [ Upstream commit 4787eff3fa88f62fede6ed7afa06477ae6bf984d ]

    The tool perf is useful for the performance analysis on the Hygon Dhyana
    platform. But right now there is no Hygon support for it to analyze the
    KVM guest os data. So add Hygon Dhyana support to it by checking vendor
    string to share the code path of AMD.
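
    The vendor-string check can be sketched as follows. This is a Python
    illustration of the idea only; the function name and the sample cpuid
    strings are assumptions made here, while "AuthenticAMD" and
    "HygonGenuine" are the vendor strings the two CPU families report.

```python
# Vendors that should take the AMD code path.
AMD_LIKE_VENDORS = ("AuthenticAMD", "HygonGenuine")

def uses_amd_code_path(cpuid):
    """True if the cpuid string names a vendor handled like AMD."""
    return any(vendor in cpuid for vendor in AMD_LIKE_VENDORS)

# Made-up sample strings in a vendor-family-model style.
for sample in ("AuthenticAMD-23-1", "HygonGenuine-24-0", "GenuineIntel-6-85"):
    print(sample, uses_amd_code_path(sample))
```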

    Signed-off-by: Pu Wen
    Acked-by: Borislav Petkov
    Cc: Alexander Shishkin
    Cc: Jiri Olsa
    Cc: Namhyung Kim
    Cc: Peter Zijlstra
    Cc: Thomas Gleixner
    Link: http://lkml.kernel.org/r/1542008451-31735-1-git-send-email-puwen@hygon.cn
    Signed-off-by: Arnaldo Carvalho de Melo
    Signed-off-by: Sasha Levin

    Pu Wen
     

26 Jan, 2019

8 commits

  • [ Upstream commit d7a8c4a6a055097a67ccfa3ca7c9ff1b64603a70 ]

    Some systems, such as the Android NDK API level 24, have the
    open_memstream() function but don't provide a prototype, adding noise
    to the build:

    builtin-timechart.c: In function 'cat_backtrace':
    builtin-timechart.c:486:2: warning: implicit declaration of function 'open_memstream' [-Wimplicit-function-declaration]
    FILE *f = open_memstream(&p, &p_len);
    ^
    builtin-timechart.c:486:2: warning: nested extern declaration of 'open_memstream' [-Wnested-externs]
    builtin-timechart.c:486:12: warning: initialization makes pointer from integer without a cast
    FILE *f = open_memstream(&p, &p_len);
    ^

    Define a LACKS_OPEN_MEMSTREAM_PROTOTYPE define so that code needing that
    can get a prototype.

    Checked in the bionic git repo to be available since level 23:

    https://android.googlesource.com/platform/bionic/+/master/libc/include/stdio.h#241

    FILE* open_memstream(char** __ptr, size_t* __size_ptr) __INTRODUCED_IN(23);

    Cc: Adrian Hunter
    Cc: Jiri Olsa
    Cc: Namhyung Kim
    Link: https://lkml.kernel.org/n/tip-343ashae97e5bq6vizusyfno@git.kernel.org
    Signed-off-by: Arnaldo Carvalho de Melo
    Signed-off-by: Sasha Levin

    Arnaldo Carvalho de Melo
     
  • [ Upstream commit 748fe0889c1ff12d378946bd5326e8ee8eacf5cf ]

    Some systems, such as the Android NDK API level 24, have the
    sigqueue() function but don't provide a prototype, adding noise to the
    build:

    util/evlist.c: In function 'perf_evlist__prepare_workload':
    util/evlist.c:1494:4: warning: implicit declaration of function 'sigqueue' [-Wimplicit-function-declaration]
    if (sigqueue(getppid(), SIGUSR1, val))
    ^
    util/evlist.c:1494:4: warning: nested extern declaration of 'sigqueue' [-Wnested-externs]

    Define a LACKS_SIGQUEUE_PROTOTYPE define so that code needing that can
    get a prototype.

    Checked in the bionic git repo to be available since level 23:

    https://android.googlesource.com/platform/bionic/+/master/libc/include/signal.h#123

    int sigqueue(pid_t __pid, int __signal, const union sigval __value) __INTRODUCED_IN(23);

    Cc: Adrian Hunter
    Cc: Jiri Olsa
    Cc: Namhyung Kim
    Link: https://lkml.kernel.org/n/tip-lmhpev1uni9kdrv7j29glyov@git.kernel.org
    Signed-off-by: Arnaldo Carvalho de Melo
    Signed-off-by: Sasha Levin

    Arnaldo Carvalho de Melo
     
  • [ Upstream commit 43fd56669c28cd354e9228bdb58e4bca1c1a8b66 ]

    The structure cs_etm_queue uses 'prev_packet' to point to the previous
    packet, which can be combined with a newly arriving packet to generate
    samples.

    In function cs_etm__flush() packets are swapped only when the flag
    'etm->synth_opts.last_branch' is true, which means packets are not
    swapped without the option '--itrace=il' to generate last branch
    entries; in that case 'prev_packet' doesn't point to the correct
    previous packet and the stale packet is still used to generate
    sequential samples. So when dumping the trace with the 'perf script'
    command we can see an incorrect flow with the stale packet's address
    info.

    This patch corrects packet swapping in cs_etm__flush(); besides the
    flag 'etm->synth_opts.last_branch' it also checks another flag,
    'etm->sample_branches', and if either flag is true it swaps packets so
    that the correct content is saved to 'prev_packet'. This fixes the
    wrong program flow dumping issue.

    The patch has a minor refactoring to use 'etm->synth_opts.last_branch'
    instead of 'etmq->etm->synth_opts.last_branch' for condition checking,
    this is consistent with what is done in cs_etm__sample().

    Signed-off-by: Leo Yan
    Reviewed-by: Mathieu Poirier
    Cc: Alexander Shishkin
    Cc: Jiri Olsa
    Cc: Mike Leach
    Cc: Namhyung Kim
    Cc: Robert Walker
    Cc: coresight@lists.linaro.org
    Cc: linux-arm-kernel@lists.infradead.org
    Link: http://lkml.kernel.org/r/1544513908-16805-2-git-send-email-leo.yan@linaro.org
    Signed-off-by: Arnaldo Carvalho de Melo
    Signed-off-by: Sasha Levin

    Leo Yan
     
  • [ Upstream commit 51433ead1460fb3f46e1c34f68bb22fd2dd0f5d0 ]

    Some 'perf stat' options do not make sense to be negated (event,
    cgroup), some do not have a negated path implemented (metrics). Due to
    that, it is better to disable the "no-" prefix for them, since
    otherwise the later opt-parsing segfaults.

    Before:

    $ perf stat --no-metrics -- ls
    Segmentation fault (core dumped)

    After:

    $ perf stat --no-metrics -- ls
    Error: option `no-metrics' isn't available
    Usage: perf stat [<options>] [<command>]
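
    The behaviour change can be pictured with a toy option table in which
    each option records whether its "no-" form is allowed. The sketch is
    Python, not perf's C option parser, and every name in it is
    hypothetical.

```python
# Hypothetical option table: name -> whether "--no-<name>" is allowed.
OPTIONS = {
    "verbose": True,    # can be negated
    "metrics": False,   # negated path not implemented
    "event": False,     # negation makes no sense
}

def parse_long_option(arg):
    """Return (name, negated) or raise ValueError for a rejected option."""
    name = arg[2:] if arg.startswith("--") else arg
    if name in OPTIONS:
        return name, False
    if name.startswith("no-") and name[3:] in OPTIONS:
        if not OPTIONS[name[3:]]:
            # Reject up front instead of reaching an unimplemented path.
            raise ValueError("option `%s' isn't available" % name)
        return name[3:], True
    raise ValueError("unknown option `%s'" % name)

print(parse_long_option("--no-verbose"))
try:
    parse_long_option("--no-metrics")
except ValueError as err:
    print("Error:", err)
```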

    Signed-off-by: Michael Petlan
    Tested-by: Arnaldo Carvalho de Melo
    LPU-Reference: 1485912065.62416880.1544457604340.JavaMail.zimbra@redhat.com
    Signed-off-by: Arnaldo Carvalho de Melo
    Signed-off-by: Sasha Levin

    Michael Petlan
     
  • [ Upstream commit 91b2b97025097ce7ca7536bc87eba2bf14760fb4 ]

    Fix incorrect event names for the Load_Miss_Real_Latency metric for
    Skylake and Skylake Server.

    Fixes https://github.com/andikleen/pmu-tools/issues/158

    Before:

    % perf stat -M Load_Miss_Real_Latency true
    event syntax error: '..ss.pending,mem_load_retired.l1_miss_ps,mem_load_retired.fb_hit_ps}:W'
    \___ parser error

    Usage: perf stat [<options>] [<command>]

    -M, --metrics
    monitor specified metrics or metric groups (separated by ,)

    After:

    % perf stat -M Load_Miss_Real_Latency true

    Performance counter stats for 'true':

    279,204 l1d_pend_miss.pending # 14.0 Load_Miss_Real_Latency
    4,784 mem_load_uops_retired.l1_miss
    15,188 mem_load_uops_retired.hit_lfb

    0.000899640 seconds time elapsed

    Signed-off-by: Andi Kleen
    Acked-by: Jiri Olsa
    Tested-by: Arnaldo Carvalho de Melo
    Link: http://lkml.kernel.org/r/20181120050635.4215-1-andi@firstfloor.org
    Signed-off-by: Arnaldo Carvalho de Melo
    Signed-off-by: Sasha Levin

    Andi Kleen
     
  • [ Upstream commit bd8d57fb7e25e9fcf67a9eef5fa13aabe2016e07 ]

    The strncpy() function may leave the destination string buffer
    unterminated, so it is better to use strlcpy(), for which we have a
    __weak fallback implementation on systems without it.

    This fixes this warning on an Alpine Linux Edge system with gcc 8.2:

    util/parse-events.c: In function 'print_symbol_events':
    util/parse-events.c:2465:4: error: 'strncpy' specified bound 100 equals destination size [-Werror=stringop-truncation]
    strncpy(name, syms->symbol, MAX_NAME_LEN);
    ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
    In function 'print_symbol_events.constprop',
    inlined from 'print_events' at util/parse-events.c:2508:2:
    util/parse-events.c:2465:4: error: 'strncpy' specified bound 100 equals destination size [-Werror=stringop-truncation]
    strncpy(name, syms->symbol, MAX_NAME_LEN);
    ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
    In function 'print_symbol_events.constprop',
    inlined from 'print_events' at util/parse-events.c:2511:2:
    util/parse-events.c:2465:4: error: 'strncpy' specified bound 100 equals destination size [-Werror=stringop-truncation]
    strncpy(name, syms->symbol, MAX_NAME_LEN);
    ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
    cc1: all warnings being treated as errors

    Cc: Adrian Hunter
    Cc: Jiri Olsa
    Cc: Namhyung Kim
    Fixes: 947b4ad1d198 ("perf list: Fix max event string size")
    Link: https://lkml.kernel.org/n/tip-b663e33bm6x8hrkie4uxh7u2@git.kernel.org
    Signed-off-by: Arnaldo Carvalho de Melo
    Signed-off-by: Sasha Levin

    Arnaldo Carvalho de Melo
     
  • [ Upstream commit 2f5302533f306d5ee87bd375aef9ca35b91762cb ]

    The strncpy() function may leave the destination string buffer
    unterminated, so it is better to use strlcpy(), for which we have a
    __weak fallback implementation on systems without it.

    In this specific case this would only happen if fgets() was buggy, as
    its man page states that it should read one less byte than the size of
    the destination buffer, so that it can put the nul byte at the end of
    it, so it would never copy 255 non-nul chars, as fgets reads into the
    orig buffer at most 254 non-nul chars and terminates it. But let's just
    switch to strlcpy to keep the original intent and silence the gcc 8.2
    warning.

    This fixes this warning on an Alpine Linux Edge system with gcc 8.2:

    In function 'cpu_model',
    inlined from 'svg_cpu_box' at util/svghelper.c:378:2:
    util/svghelper.c:337:5: error: 'strncpy' output may be truncated copying 255 bytes from a string of length 255 [-Werror=stringop-truncation]
    strncpy(cpu_m, &buf[13], 255);
    ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~

    Cc: Adrian Hunter
    Cc: Jiri Olsa
    Cc: Namhyung Kim
    Cc: Arjan van de Ven
    Fixes: f48d55ce7871 ("perf: Add a SVG helper library file")
    Link: https://lkml.kernel.org/n/tip-xzkoo0gyr56gej39ltivuh9g@git.kernel.org
    Signed-off-by: Arnaldo Carvalho de Melo
    Signed-off-by: Sasha Levin

    Arnaldo Carvalho de Melo
     
  • [ Upstream commit 24f967337f6d6bce931425769c0f5ff5cf2d212e ]

    The breakpoint tests on the ARM 32-bit kernel are broken in several
    ways.

    The breakpoint length requested does not necessarily match whether the
    function address has the Thumb bit (bit 0) set or not, and this does
    matter to the ARM kernel hw_breakpoint infrastructure. See [1] for
    background.

    [1]: https://lkml.org/lkml/2018/11/15/205

    As Will indicated, the overflow handling would require single-stepping
    which is not supported at the moment. Just disable those tests for the
    ARM 32-bit platforms and update the comment above to explain these
    limitations.

    Co-developed-by: Will Deacon
    Signed-off-by: Florian Fainelli
    Signed-off-by: Will Deacon
    Acked-by: Jiri Olsa
    Cc: Alexander Shishkin
    Cc: Namhyung Kim
    Cc: Peter Zijlstra
    Link: http://lkml.kernel.org/r/20181203191138.2419-1-f.fainelli@gmail.com
    Signed-off-by: Arnaldo Carvalho de Melo
    Signed-off-by: Sasha Levin

    Florian Fainelli