16 Dec, 2017

1 commit

  • Pull x86 fixes from Ingo Molnar:
    "Misc fixes:

    - fix the s2ram regression related to confusion around segment
    register restoration, plus related cleanups that make the code more
    robust

    - a guess-unwinder Kconfig dependency fix

    - an isoimage build target fix for certain tool chain combinations

    - instruction decoder opcode map fixes+updates, and the syncing of
    the kernel decoder headers to the objtool headers

    - a kmmio tracing fix

    - two 5-level paging related fixes

    - a topology enumeration fix on certain SMP systems"

    * 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
    objtool: Resync objtool's instruction decoder source code copy with the kernel's latest version
    x86/decoder: Fix and update the opcodes map
    x86/power: Make restore_processor_context() sane
    x86/power/32: Move SYSENTER MSR restoration to fix_processor_context()
    x86/power/64: Use struct desc_ptr for the IDT in struct saved_context
    x86/unwinder/guess: Prevent using CONFIG_UNWINDER_GUESS=y with CONFIG_STACKDEPOT=y
    x86/build: Don't verify mtools configuration file for isoimage
    x86/mm/kmmio: Fix mmiotrace for page unaligned addresses
    x86/boot/compressed/64: Print error if 5-level paging is not supported
    x86/boot/compressed/64: Detect and handle 5-level paging at boot-time
    x86/smpboot: Do not use smp_num_siblings in __max_logical_packages calculation

    Linus Torvalds
     

15 Dec, 2017

1 commit

  • Update x86-opcode-map.txt based on the October 2017 Intel SDM publication.
    Fix INVPID to INVVPID.
    Add UD0 and UD1 instruction opcodes.

    Also sync the objtool and perf tooling copies of this file.

    Signed-off-by: Randy Dunlap
    Acked-by: Masami Hiramatsu
    Cc: Josh Poimboeuf
    Cc: Linus Torvalds
    Cc: Masami Hiramatsu
    Cc: Peter Zijlstra
    Cc: Thomas Gleixner
    Link: http://lkml.kernel.org/r/aac062d7-c0f6-96e3-5c92-ed299e2bd3da@infradead.org
    Signed-off-by: Ingo Molnar

    Randy Dunlap
     

12 Dec, 2017

1 commit

  • Recently there was a treewide conversion of ACCESS_ONCE() to
    {READ,WRITE}_ONCE(), but a new use was introduced concurrently by
    commit:

    1695849735752d2a ("perf mmap: Move perf_mmap and methods to separate mmap.[ch] files")

    Let's convert this over to READ_ONCE() so that we can remove the
    ACCESS_ONCE() definitions in subsequent patches.

    Tested-by: Paul E. McKenney
    Signed-off-by: Mark Rutland
    Reviewed-by: Paul E. McKenney
    Cc: Arnaldo Carvalho de Melo
    Cc: Joe Perches
    Cc: Linus Torvalds
    Cc: Peter Zijlstra
    Cc: Thomas Gleixner
    Cc: apw@canonical.com
    Link: http://lkml.kernel.org/r/20171127103824.36526-2-mark.rutland@arm.com
    Signed-off-by: Ingo Molnar

    Mark Rutland
     

09 Dec, 2017

1 commit

  • Pull networking fixes from David Miller:

    1) CAN fixes from Martin Kelly (cancel URBs properly in all the CAN usb
    drivers).

    2) Revert returning -EEXIST from __dev_alloc_name() as this propagates
    to userspace and broke some apps. From Johannes Berg.

    3) Fix conn memory leaks and crashes in TIPC, from Jon Maloy and Cong
    Wang.

    4) Gianfar MAC can't do EEE so don't advertise it by default, from
    Claudiu Manoil.

    5) Relax strict netlink attribute validation, but emit a warning. From
    David Ahern.

    6) Fix regression in checksum offload of thunderx driver, from Florian
    Westphal.

    7) Fix UAPI bpf issues on s390, from Hendrik Brueckner.

    8) New card support in iwlwifi, from Ihab Zhaika.

    9) BBR congestion control bug fixes from Neal Cardwell.

    10) Fix port stats in nfp driver, from Pieter Jansen van Vuuren.

    11) Fix leaks in qualcomm rmnet, from Subash Abhinov Kasiviswanathan.

    12) Fix DMA API handling in sh_eth driver, from Thomas Petazzoni.

    13) Fix spurious netpoll warnings in bnxt_en, from Calvin Owens.

    * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (67 commits)
    net: mvpp2: fix the RSS table entry offset
    tcp: evaluate packet losses upon RTT change
    tcp: fix off-by-one bug in RACK
    tcp: always evaluate losses in RACK upon undo
    tcp: correctly test congestion state in RACK
    bnxt_en: Fix sources of spurious netpoll warnings
    tcp_bbr: reset long-term bandwidth sampling on loss recovery undo
    tcp_bbr: reset full pipe detection on loss recovery undo
    tcp_bbr: record "full bw reached" decision in new full_bw_reached bit
    sfc: pass valid pointers from efx_enqueue_unwind
    gianfar: Disable EEE autoneg by default
    tcp: invalidate rate samples during SACK reneging
    can: peak/pcie_fd: fix potential bug in restarting tx queue
    can: usb_8dev: cancel urb on -EPIPE and -EPROTO
    can: kvaser_usb: cancel urb on -EPIPE and -EPROTO
    can: esd_usb2: cancel urb on -EPIPE and -EPROTO
    can: ems_usb: cancel urb on -EPIPE and -EPROTO
    can: mcba_usb: cancel urb on -EPROTO
    usbnet: fix alignment for frames with no ethernet header
    tcp: use current time in tcp_rcv_space_adjust()
    ...

    Linus Torvalds
     

07 Dec, 2017

2 commits


05 Dec, 2017

1 commit

  • The regs_query_register_offset() helper function converts a
    register name like "%r0" to the offset of that register in user_pt_regs.
    It is required by the BPF prologue generator.

    The user_pt_regs structure was recently added to "asm/ptrace.h".
    Hence, update tools/perf/check-headers.sh to keep the header file
    in sync with kernel changes.

    Suggested-by: Thomas Richter
    Signed-off-by: Hendrik Brueckner
    Reviewed-and-tested-by: Thomas Richter
    Acked-by: Alexei Starovoitov
    Cc: Arnaldo Carvalho de Melo
    Cc: Jiri Olsa
    Cc: Heiko Carstens
    Signed-off-by: Daniel Borkmann

    Hendrik Brueckner
     

29 Nov, 2017

21 commits

  • To add support for the MAP_SYNC flag introduced in:

    b6fb293f2497 ("mm: Define MAP_SYNC and VM_SYNC flags")

    Update tools/perf/trace/beauty/mmap.c to support that flag.

    This silences this perf build warning:

    Warning: Kernel ABI header at 'tools/include/uapi/asm-generic/mman.h' differs from latest version at 'include/uapi/asm-generic/mman.h'

    Cc: Adrian Hunter
    Cc: Dan Williams
    Cc: David Ahern
    Cc: Jan Kara
    Cc: Jiri Olsa
    Cc: Namhyung Kim
    Cc: Wang Nan
    Link: https://lkml.kernel.org/n/tip-14zyk3iywrj37c7g1eagmzbo@git.kernel.org
    Signed-off-by: Arnaldo Carvalho de Melo

    Arnaldo Carvalho de Melo
     
  • There are just a few new defines which do not affect perf tools.

    Signed-off-by: Adrian Hunter
    Link: http://lkml.kernel.org/r/1511253326-22308-3-git-send-email-adrian.hunter@intel.com
    Signed-off-by: Arnaldo Carvalho de Melo

    Adrian Hunter
     
  • Test case 21 (Number of exit events of a simple workload) fails on
    s390x. The reason is the invalid sample frequency supplied for this
    test. On s390x the minimum sample frequency is much higher (see output
    of /proc/service_levels).

    Supply a safe sample frequency value for s390x to fix this. The value
    will be adjusted by the s390x CPUMF frequency conversion function to a
    value well below the sysctl kernel.perf_event_max_sample_rate value.

    Signed-off-by: Thomas Richter
    Reviewed-by: Hendrik Brueckner
    Cc: Martin Schwidefsky
    LPU-Reference: 20171123114611.93397-1-tmricht@linux.vnet.ibm.com
    Link: https://lkml.kernel.org/n/tip-1ynblyhi1n81idpido59nt1y@git.kernel.org
    Signed-off-by: Arnaldo Carvalho de Melo

    Thomas Richter
     
  • Certain systems are designed to have sparse/discontiguous nodes. On
    such systems, 'perf bench numa' hangs, shows wrong number of nodes and
    shows values for non-existent nodes. Handle this by only taking nodes
    that are exposed by kernel to userspace.

    Signed-off-by: Satheesh Rajendran
    Reviewed-by: Srikar Dronamraju
    Acked-by: Naveen N. Rao
    Link: http://lkml.kernel.org/r/1edbcd353c009e109e93d78f2f46381930c340fe.1511368645.git.sathnaga@linux.vnet.ibm.com
    Signed-off-by: Balamuruhan S
    Signed-off-by: Arnaldo Carvalho de Melo

    Satheesh Rajendran
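
As a rough illustration of the "only take nodes that are exposed to userspace" idea in the 'perf bench numa' fix above, the sketch below expands a node list in the kernel's range-list format (for example the contents of /sys/devices/system/node/online) instead of assuming that nodes 0..N-1 all exist. The function name and the sample string are assumptions made for illustration; the actual fix is written in C and may obtain the node set differently.

```python
def parse_node_list(text):
    """Expand a kernel-style list such as "0-1,16-17" into sorted node ids."""
    nodes = set()
    for chunk in text.strip().split(","):
        if not chunk:
            continue  # tolerate empty input or stray commas
        if "-" in chunk:
            first, last = chunk.split("-", 1)
            nodes.update(range(int(first), int(last) + 1))
        else:
            nodes.add(int(chunk))
    return sorted(nodes)


# Hypothetical sparse layout: only nodes 0 and 8 exist on this machine.
online = parse_node_list("0,8\n")
print(len(online), online)
```

With a sparse layout like this, iterating over the returned ids avoids touching the non-existent nodes 1 through 7, and the node count comes from the list length rather than from the highest node id.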
     
  • There's no need for SA_SIGINFO data in the SIGWINCH handler, so switch
    to registering the handler via the signal interface, as we do for the
    rest of the signals in perf top.

    Signed-off-by: Jiri Olsa
    Tested-by: Ravi Bangoria
    Tested-by: Arnaldo Carvalho de Melo
    Cc: Adrian Hunter
    Cc: Andi Kleen
    Cc: David Ahern
    Cc: Namhyung Kim
    Cc: Wang Nan
    Link: http://lkml.kernel.org/n/tip-elxp1vdnaog1scaj13cx7cu0@git.kernel.org
    Signed-off-by: Arnaldo Carvalho de Melo

    Jiri Olsa
     
  • The stdio perf top crashes when we change the terminal
    window size. The reason is that we assumed we get the
    perf_top pointer as a signal handler argument which is
    not the case.

    Change the SIGWINCH handler logic to set a global resize variable,
    which is checked in the main thread loop.

    Signed-off-by: Jiri Olsa
    Tested-by: Arnaldo Carvalho de Melo
    Tested-by: Ravi Bangoria
    Cc: Adrian Hunter
    Cc: Andi Kleen
    Cc: David Ahern
    Cc: Namhyung Kim
    Cc: Wang Nan
    Link: http://lkml.kernel.org/n/tip-ysuzwz77oev1ftgvdscn9bpu@git.kernel.org
    Signed-off-by: Arnaldo Carvalho de Melo

    Jiri Olsa
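
The pattern described in the two SIGWINCH commits above (a handler that only sets a global flag, with the real work done in the main loop) can be sketched as follows. This is a Python analogue, not the perf top code; the names resize_pending, on_sigwinch, consume_resize and install_handler are hypothetical.

```python
import signal

resize_pending = False  # global flag: set by the handler, consumed by the loop


def on_sigwinch(signum, frame):
    """Handler does no real work: it only records that a resize happened."""
    global resize_pending
    resize_pending = True


def consume_resize():
    """Called from the main loop; returns True once per pending resize."""
    global resize_pending
    if resize_pending:
        resize_pending = False
        return True
    return False


def install_handler():
    """Register via the plain signal interface; no siginfo-style data needed."""
    signal.signal(signal.SIGWINCH, on_sigwinch)
```

A main loop would call install_handler() once at startup and then check consume_resize() on each iteration, redrawing only when it returns True. Because the handler never dereferences any argument, it cannot crash on a missing context pointer.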
     
  • If all events have attr.exclude_kernel set, no need to look at
    kptr_restrict.

    Cc: Adrian Hunter
    Cc: David Ahern
    Cc: Jiri Olsa
    Cc: Linus Torvalds
    Cc: Namhyung Kim
    Cc: Wang Nan
    Link: http://lkml.kernel.org/n/tip-yegpzg5bf2im69g0tfizqaqz@git.kernel.org
    Signed-off-by: Arnaldo Carvalho de Melo

    Arnaldo Carvalho de Melo
     
  • If we're not sampling the kernel, we shouldn't care about kptr_restrict,
    nor synthesize anything for assisting in resolving kernel samples,
    like the reference relocation symbol or kernel modules information.

    Before:

    $ cat /proc/sys/kernel/kptr_restrict /proc/sys/kernel/perf_event_paranoid
    2
    2
    $ perf record sleep 1
    WARNING: Kernel address maps (/proc/{kallsyms,modules}) are restricted,
    check /proc/sys/kernel/kptr_restrict.

    Samples in kernel functions may not be resolved if a suitable vmlinux
    file is not found in the buildid cache or in the vmlinux path.

    Samples in kernel modules won't be resolved at all.

    If some relocation was applied (e.g. kexec) symbols may be misresolved
    even with a suitable vmlinux or kallsyms file.

    Couldn't record kernel reference relocation symbol
    Symbol resolution may be skewed if relocation was used (e.g. kexec).
    Check /proc/kallsyms permission or run as root.
    [ perf record: Woken up 1 times to write data ]
    [ perf record: Captured and wrote 0.001 MB perf.data (8 samples) ]
    $ perf evlist -v
    cycles:uppp: size: 112, { sample_period, sample_freq }: 4000, sample_type: IP|TID|TIME|PERIOD, disabled: 1, inherit: 1, exclude_kernel: 1, mmap: 1, comm: 1, freq: 1, enable_on_exec: 1, task: 1, precise_ip: 3, sample_id_all: 1, exclude_guest: 1, mmap2: 1, comm_exec: 1
    $

    After:

    $ perf record sleep 1
    [ perf record: Woken up 1 times to write data ]
    [ perf record: Captured and wrote 0.001 MB perf.data (10 samples) ]
    $

    Cc: Adrian Hunter
    Cc: David Ahern
    Cc: Jiri Olsa
    Cc: Linus Torvalds
    Cc: Namhyung Kim
    Cc: Wang Nan
    Link: http://lkml.kernel.org/n/tip-t025e9zftbx2b8cq2w01g5e5@git.kernel.org
    Signed-off-by: Arnaldo Carvalho de Melo

    Arnaldo Carvalho de Melo
     
  • If none of the evsels has attr.exclude_kernel set to zero, there will be
    no kernel samples, so there is no point in warning the user about
    problems in processing kernel samples.

    Cc: Adrian Hunter
    Cc: David Ahern
    Cc: Jiri Olsa
    Cc: Linus Torvalds
    Cc: Namhyung Kim
    Cc: Wang Nan
    Link: http://lkml.kernel.org/n/tip-7dn926v3at8txxkky92aesz2@git.kernel.org
    Signed-off-by: Arnaldo Carvalho de Melo

    Arnaldo Carvalho de Melo
     
  • The warning about kptr_restrict needs to be emitted only when it is set
    and we ask for kernel space samples, so add a helper to help with that.

    Cc: Adrian Hunter
    Cc: David Ahern
    Cc: Jiri Olsa
    Cc: Linus Torvalds
    Cc: Namhyung Kim
    Cc: Wang Nan
    Link: http://lkml.kernel.org/n/tip-fh7drty6yljei9gxxzer6eup@git.kernel.org
    Signed-off-by: Arnaldo Carvalho de Melo

    Arnaldo Carvalho de Melo
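
The decision described in the kptr_restrict commits above can be summarized in a few lines. The sketch below is an assumption-laden Python model, not the perf implementation: Event, samples_kernel and should_warn_kptr_restrict are hypothetical names, and exclude_kernel simply mirrors the attr.exclude_kernel bit.

```python
from dataclasses import dataclass


@dataclass
class Event:
    name: str
    exclude_kernel: bool  # mirrors attr.exclude_kernel


def samples_kernel(events):
    """True if at least one event may produce kernel samples."""
    return any(not ev.exclude_kernel for ev in events)


def should_warn_kptr_restrict(kptr_restrict, events):
    """Warn only when addresses are restricted AND kernel samples are wanted."""
    return bool(kptr_restrict) and samples_kernel(events)
```

For a user-only session such as the 'cycles:uppp' example shown earlier, every event has exclude_kernel set, so the helper returns False and no warning is printed.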
     
  • The 'perf test' case "probe libc's inet_pton & backtrace it with ping"
    fails on s390x. The reason is the 'realpath /lib64/ld*.so.* | uniq' line
    which returns 2 libraries:

    [root@s35lp76 shell]# realpath /lib64/ld*.so.* | uniq
    /usr/lib64/ld-2.26.so
    /usr/lib64/ld_pre_smc.so.1.0.1
    [root@s35lp76 shell]#

    This output makes the "perf probe" command lines invalid.

    Use ldd tool to find out the libraries required by "bash" and check if
    symbol "inet_pton" is part of the "libc" library. Some distros do not
    have a /lib64 directory.

    I have also added a check for the existence of an IPv6 network interface
    before it is being used.

    Committer changes:

    We can't really use ldd for libc, as in some systems, such as x86_64, it
    has hardlinks and then ldd sees one and the kernel the other, so grep
    for libc in /proc/self/maps to get the one we'll receive from
    PERF_RECORD_MMAP.

    Thomas checked this change and acked it.

    Signed-off-by: Thomas-Mich Richter
    Tested-by: Arnaldo Carvalho de Melo
    Suggested-by: Hendrik Brückner
    Reviewed-by: Hendrik Brückner
    Link: http://lkml.kernel.org/r/20171114133409.GN8836@kernel.org
    Signed-off-by: Arnaldo Carvalho de Melo

    Thomas Richter
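
The committer note above says the test greps for libc in /proc/self/maps so that it sees the same path the kernel reports in PERF_RECORD_MMAP. A minimal sketch of that lookup follows; the real test is a shell script, and both the function name find_libc_path and the SAMPLE text are made up for illustration.

```python
import os


def find_libc_path(maps_text):
    """Return the first mapped path whose file name looks like libc, else None."""
    for line in maps_text.splitlines():
        fields = line.split(None, 5)  # address perms offset dev inode [path]
        if len(fields) < 6:
            continue  # anonymous mapping, no backing path
        path = fields[5].strip()
        name = os.path.basename(path)
        if name.startswith("libc-") or name.startswith("libc.so"):
            return path
    return None


# Illustrative maps content only; addresses, devices and inodes are invented.
SAMPLE = """\
55d0a0000000-55d0a0001000 r-xp 00000000 fd:01 131 /usr/bin/bash
7f43a1da3000-7f43a1f50000 r-xp 00000000 fd:01 262 /usr/lib64/libc-2.26.so
7ffc12300000-7ffc12321000 rw-p 00000000 00:00 0 [stack]
"""

print(find_libc_path(SAMPLE))
```

Reading the path from the process's own mappings sidesteps the hardlink mismatch mentioned above, because it is the kernel's view of the file rather than the dynamic linker's.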
     
  • This 'perf test' case fails on s390x. The 'touch' command on s390x uses
    the 'openat' system call to open the file named on the command line:

    [root@s35lp76 perf]# perf probe -l
    probe:vfs_getname (on getname_flags:72@fs/namei.c with pathname)
    [root@s35lp76 perf]# perf trace -e open touch /tmp/abc
    0.400 ( 0.015 ms): touch/27542 open(filename:
    /usr/lib/locale/locale-archive, flags: CLOEXEC) = 3
    [root@s35lp76 perf]#

    There is no 'open' system call for file '/tmp/abc'. Instead the 'openat'
    system call is used:

    [root@s35lp76 perf]# strace touch /tmp/abc
    execve("/usr/bin/touch", ["touch", "/tmp/abc"], 0x3ffd547ec98
    /* 30 vars */) = 0
    [...]
    openat(AT_FDCWD, "/tmp/abc", O_WRONLY|O_CREAT|O_NOCTTY|O_NONBLOCK, 0666) = 3
    [...]

    On s390x the 'egrep' command does not find a matching pattern and
    returns an error.

    To fix this for s390x, create a platform-dependent command line that
    enables the 'perf probe' call to listen to the 'openat' system call and
    get the expected output.

    Signed-off-by: Thomas-Mich Richter
    Tested-by: Arnaldo Carvalho de Melo
    Cc: Hendrik Brueckner
    Cc: Thomas-Mich Richter
    LPU-Reference: 20171114071847.2381-1-tmricht@linux.vnet.ibm.com
    Link: http://lkml.kernel.org/n/tip-3qf38jk0prz54rhmhyu871my@git.kernel.org
    Signed-off-by: Arnaldo Carvalho de Melo

    Thomas Richter
     
  • There are many instructions, esp on PowerPC, whose mnemonics are longer
    than 6 characters. Using precision limit causes truncation of such
    mnemonics.

    Fix this by removing the precision limit. Note that 'width' is still 6, so
    alignment is not affected for mnemonics of length 6 or less.

    Signed-off-by: Ravi Bangoria
    Cc: Alexander Shishkin
    Cc: Jiri Olsa
    Cc: Namhyung Kim
    Cc: Peter Zijlstra
    Cc: Taeung Song
    Link: http://lkml.kernel.org/r/20171114032540.4564-1-ravi.bangoria@linux.vnet.ibm.com
    Signed-off-by: Arnaldo Carvalho de Melo

    Ravi Bangoria
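
The difference between a width and a precision in a printf-style string conversion is easy to see in isolation. The sketch below uses Python's % operator, which follows the same rules for %s; the exact format strings used in perf are an assumption here, and the long mnemonic is just an example of a name over 6 characters.

```python
def fmt_with_precision(mnemonic):
    """Width 6 AND precision 6: pads short names, truncates long ones."""
    return "%-6.6s" % mnemonic


def fmt_width_only(mnemonic):
    """Width 6 only: pads short names, leaves long ones intact."""
    return "%-6s" % mnemonic


for name in ("mov", "vpclmulqdq"):
    print("[%s] [%s]" % (fmt_with_precision(name), fmt_width_only(name)))
```

Short names are padded identically by both forms, so column alignment is unchanged for them; only names longer than the width differ, and there the width-only form keeps the full text.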
     
  • Commit 8e99b6d4533c changed prefixcmp() to strstarts() but failed to
    adjust the return value check in some places. This makes perf help print
    annoying output even for sane config items, like below:

    $ perf help
    '.root': unsupported man viewer sub key.
    ...

    Reported-by: Arnaldo Carvalho de Melo
    Signed-off-by: Namhyung Kim
    Tested-by: Arnaldo Carvalho de Melo
    Tested-by: Taeung Song
    Cc: Jiri Olsa
    Cc: Sihyeon Jang
    Cc: kernel-team@lge.com
    Link: http://lkml.kernel.org/r/20171114001542.GA16464@sejong
    Fixes: 8e99b6d4533c ("tools include: Adopt strstarts() from the kernel")
    Signed-off-by: Arnaldo Carvalho de Melo

    Namhyung Kim
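
The class of bug described above comes from the two helpers having opposite return conventions: a strcmp-style function returns 0 on a match, while a boolean-style one returns true. The Python stand-ins below only model those conventions; the "man." prefix and the is_man_key_* names are hypothetical and not taken from the perf source.

```python
def prefixcmp(s, prefix):
    """strcmp-style stand-in: returns 0 when s starts with prefix."""
    return 0 if s.startswith(prefix) else 1


def strstarts(s, prefix):
    """Boolean-style stand-in: returns True when s starts with prefix."""
    return s.startswith(prefix)


def is_man_key_old(var):
    return not prefixcmp(var, "man.")   # correct with the strcmp convention


def is_man_key_broken(var):
    return not strstarts(var, "man.")   # stale negation: test is now inverted


def is_man_key_fixed(var):
    return strstarts(var, "man.")       # negation dropped
```

With the stale negation, every key that does not start with the prefix is treated as if it did, which is how unrelated config items can end up being reported as unsupported sub keys.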
     
  • A recent fix for 'perf trace' introduced a bug where
    machine__exit(trace->host) could be called while trace->host was still
    NULL, so make this more robust by guarding against NULL, just like
    free() does.

    The problem happens, for instance, when !root users try to run 'perf
    trace':

    [acme@jouet linux]$ trace
    Error: No permissions to read /sys/kernel/debug/tracing/events/raw_syscalls/sys_(enter|exit)
    Hint: Try 'sudo mount -o remount,mode=755 /sys/kernel/debug/tracing'

    perf: Segmentation fault
    Obtained 7 stack frames.
    [0x4f1b2e]
    /lib64/libc.so.6(+0x3671f) [0x7f43a1dd971f]
    [0x4f3fec]
    [0x47468b]
    [0x42a2db]
    /lib64/libc.so.6(__libc_start_main+0xe9) [0x7f43a1dc3509]
    [0x42a6c9]
    Segmentation fault (core dumped)
    [acme@jouet linux]$

    Cc: Adrian Hunter
    Cc: Alexander Shishkin
    Cc: Andrei Vagin
    Cc: David Ahern
    Cc: Jiri Olsa
    Cc: Namhyung Kim
    Cc: Peter Zijlstra
    Cc: Vasily Averin
    Cc: Wang Nan
    Fixes: 33974a414ce2 ("perf trace: Call machine__exit() at exit")
    Signed-off-by: Arnaldo Carvalho de Melo

    Arnaldo Carvalho de Melo
     
  • When processing PERF_RECORD_AUXTRACE_INFO several perf_evsel entries
    will be synthesized and inserted into session->evlist, eventually ending
    in perf_script.tool.sample(), which ends up calling builtin-script.c's
    process_event(), that expects evsel->priv to be a perf_evsel_script
    object with a valid FILE pointer in fp.

    So we need to intercept the processing of PERF_RECORD_AUXTRACE_INFO and
    then set up evsel->priv for these newly created perf_evsel instances.
    Do that to fix the segfault in process_event(), which was trying to use
    a NULL for that FILE pointer.

    Reported-by: Alexander Shishkin
    Cc: Adrian Hunter
    Cc: David Ahern
    Cc: Jiri Olsa
    Cc: Namhyung Kim
    Cc: Peter Zijlstra
    Cc: Ravi Bangoria
    Cc: Wang Nan
    Cc: yuzhoujian
    Fixes: a14390fde64e ("perf script: Allow creating per-event dump files")
    Link: http://lkml.kernel.org/n/tip-bthnur8r8de01gxvn2qayx6e@git.kernel.org
    [ Merge fix by Ravi Bangoria before pushing upstream to preserve bisectability ]
    Signed-off-by: Arnaldo Carvalho de Melo

    Arnaldo Carvalho de Melo
     
  • I forgot one conversion, which got noticed by Thomas when running:

    $ perf stat -e '{cpu-clock,instructions}' kill
    kill: not enough arguments
    Segmentation fault (core dumped)
    $

    Fix it: those stats are now in evsel->stats, no longer in evsel->priv.

    Reported-by: Thomas-Mich Richter
    Tested-by: Thomas-Mich Richter
    Cc: Adrian Hunter
    Cc: Andi Kleen
    Cc: David Ahern
    Cc: Hendrik Brueckner
    Cc: Jiri Olsa
    Cc: Namhyung Kim
    Cc: Wang Nan
    Fixes: e669e833da8d ("perf evsel: Restore evsel->priv as a tool private area")
    Link: http://lkml.kernel.org/r/20171109150046.GN4333@kernel.org
    Signed-off-by: Arnaldo Carvalho de Melo

    Arnaldo Carvalho de Melo
     
  • Currently if trace_event__register_resolver() fails, we return -errno,
    but we can't be sure that errno isn't zero in this case.

    Signed-off-by: Andrei Vagin
    Reviewed-by: Jiri Olsa
    Cc: Alexander Shishkin
    Cc: Namhyung Kim
    Cc: Peter Zijlstra
    Cc: Vasily Averin
    Link: http://lkml.kernel.org/r/20171108002246.8924-2-avagin@openvz.org
    Signed-off-by: Arnaldo Carvalho de Melo

    Andrei Vagin
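
The hazard named above is that negating an errno value of zero yields zero, which callers read as success. The sketch below shows the problem and one possible guard; the fallback to EINVAL is purely an assumption for illustration and is not claimed to be what the commit does.

```python
import errno


def exit_code_from_errno(saved_errno):
    """Naive pattern: negate errno; yields 0 ("success") if errno was never set."""
    return -saved_errno


def exit_code_with_fallback(saved_errno, fallback=errno.EINVAL):
    """Hypothetical guard: never report 0 on a failure path."""
    return -(saved_errno or fallback)
```

The general point is that a failure path should derive its return value from the failing call itself, or from a known non-zero code, rather than from a global that may not have been set.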
     
  • The Intel PMU event aliases have an implicit period= specifier to set the
    default period.

    Unfortunately this breaks overriding these periods with -c or -F,
    because the alias terms look like they are user specified to the
    internal parser, and user specified event qualifiers override the
    command line options.

    Track that they are coming from aliases by adding a "weak" state to the
    term. Any weak terms don't override command line options.

    I only did it for -c/-F for now, I think that's the only case that's
    broken currently.

    Before:

    $ perf record -c 1000 -vv -e uops_issued.any
    ...
    { sample_period, sample_freq } 2000003

    After:

    $ perf record -c 1000 -vv -e uops_issued.any
    ...
    { sample_period, sample_freq } 1000

    Signed-off-by: Andi Kleen
    Acked-by: Jiri Olsa
    Link: http://lkml.kernel.org/r/20171020202755.21410-2-andi@firstfloor.org
    Signed-off-by: Arnaldo Carvalho de Melo

    Andi Kleen
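
The precedence rule introduced by the "weak" term state can be modelled as a small function. This is a sketch under stated assumptions: PeriodTerm and effective_period are hypothetical names, and the real parser handles more than the sample period.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class PeriodTerm:
    value: int
    weak: bool = False  # True when the term came from an event alias


def effective_period(term: Optional[PeriodTerm], cmdline: Optional[int]) -> Optional[int]:
    """Pick the sample period: strong term > command line > weak term."""
    if term is not None and not term.weak:
        return term.value        # user wrote period=... explicitly
    if cmdline is not None:
        return cmdline           # -c given; a weak alias default must not win
    return term.value if term is not None else None


# Mirrors the Before/After above: alias default 2000003, user passes -c 1000.
print(effective_period(PeriodTerm(2000003, weak=True), 1000))
```

An explicitly written period= term still beats the command line, which preserves the existing behaviour for user-specified qualifiers.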
     
  • When we use an initial delay, e.g.: 'perf record --delay 1000', we do not
    enable the events until that delay has passed after we started the workload,
    including the tracking event, i.e. the one for which we have attr.mmap, etc,
    enabled to ask the kernel to generate the PERF_RECORD_{MMAP,COMM,EXEC} metadata
    events that will then allow us to resolve addresses in samples to the map, dso
    and symbol. There will be a shadow that even synthesizing samples won't cover,
    i.e. the workload that we start and other processes forking while we
    wait for the initial delay to expire.

    So use a dummy event to be the tracking one and make it be enabled on exec.

    Before:

    # perf record --delay 1000 stress --cpu 1 --timeout 5
    stress: info: [9029] dispatching hogs: 1 cpu, 0 io, 0 vm, 0 hdd
    stress: info: [9029] successful run completed in 5s
    [ perf record: Woken up 3 times to write data ]
    [ perf record: Captured and wrote 0.624 MB perf.data (15908 samples) ]
    # perf script | head
    :9031 9031 32001.826888: 1 cycles:ppp: ffffffff831aa30d event_function (/lib/modules/4.14.0-rc6+/build/vmlinux)
    :9031 9031 32001.826893: 1 cycles:ppp: ffffffff8300d1a0 intel_bts_enable_local (/lib/modules/4.14.0-rc6+/build/vmlinux)
    :9031 9031 32001.826895: 7 cycles:ppp: ffffffff83023870 sched_clock (/lib/modules/4.14.0-rc6+/build/vmlinux)
    :9031 9031 32001.826897: 103 cycles:ppp: ffffffff8300c331 intel_pmu_handle_irq (/lib/modules/4.14.0-rc6+/build/vmlinux)
    :9031 9031 32001.826899: 1615 cycles:ppp: ffffffff830231f8 native_sched_clock (/lib/modules/4.14.0-rc6+/build/vmlinux)
    :9031 9031 32001.826902: 26724 cycles:ppp: ffffffff8384c6a7 native_irq_return_iret (/lib/modules/4.14.0-rc6+/build/vmlinux)
    :9031 9031 32001.826913: 329739 cycles:ppp: 7fb2a5410932 [unknown] ([unknown])
    :9031 9031 32001.827033: 1225451 cycles:ppp: 7fb2a5410930 [unknown] ([unknown])
    :9031 9031 32001.827474: 1391725 cycles:ppp: 7fb2a5410930 [unknown] ([unknown])
    :9031 9031 32001.827978: 1233697 cycles:ppp: 7fb2a5410928 [unknown] ([unknown])
    #

    After:

    # perf record --delay 1000 stress --cpu 1 --timeout 5
    stress: info: [9741] dispatching hogs: 1 cpu, 0 io, 0 vm, 0 hdd
    stress: info: [9741] successful run completed in 5s
    [ perf record: Woken up 3 times to write data ]
    [ perf record: Captured and wrote 0.751 MB perf.data (15976 samples) ]
    # perf script | head
    stress 9742 32110.959106: 1 cycles:ppp: ffffffff831b26f6 __perf_event_task_sched_in (/lib/modules/4.14.0-rc6+/build/vmlinux)
    stress 9742 32110.959110: 1 cycles:ppp: ffffffff8300c2e9 intel_pmu_handle_irq (/lib/modules/4.14.0-rc6+/build/vmlinux)
    stress 9742 32110.959112: 7 cycles:ppp: ffffffff830231e0 native_sched_clock (/lib/modules/4.14.0-rc6+/build/vmlinux)
    stress 9742 32110.959115: 101 cycles:ppp: ffffffff83023870 sched_clock (/lib/modules/4.14.0-rc6+/build/vmlinux)
    stress 9742 32110.959117: 1533 cycles:ppp: ffffffff830231f8 native_sched_clock (/lib/modules/4.14.0-rc6+/build/vmlinux)
    stress 9742 32110.959119: 23992 cycles:ppp: ffffffff831b0900 ctx_sched_in (/lib/modules/4.14.0-rc6+/build/vmlinux)
    stress 9742 32110.959129: 329406 cycles:ppp: 7f4b1b661930 __random_r (/usr/lib64/libc-2.25.so)
    stress 9742 32110.959249: 1288322 cycles:ppp: 5566e1e7cbc9 hogcpu (/usr/bin/stress)
    stress 9742 32110.959712: 1464046 cycles:ppp: 7f4b1b66179e __random (/usr/lib64/libc-2.25.so)
    stress 9742 32110.960241: 1266918 cycles:ppp: 7f4b1b66195b __random_r (/usr/lib64/libc-2.25.so)
    #

    Reported-by: Bram Stolk
    Tested-by: Bram Stolk
    Cc: Adrian Hunter
    Cc: Andi Kleen
    Cc: David Ahern
    Cc: Jiri Olsa
    Cc: Namhyung Kim
    Cc: Wang Nan
    Fixes: 6619a53ef757 ("perf record: Add --initial-delay option")
    Link: http://lkml.kernel.org/n/tip-nrdfchshqxf7diszhxcecqb9@git.kernel.org
    Signed-off-by: Arnaldo Carvalho de Melo

    Arnaldo Carvalho de Melo
     
  • The evsel->idx field is used mainly to access the right bucket in
    per-event arrays such as the annotation ones, but also to set
    evsel->tracking, which in turn decides which of the events will ask
    for PERF_RECORD_{MMAP,COMM,EXEC} to be generated, i.e. which
    perf_event_attr will have its mmap, etc fields set.

    When we were adding the "dummy" event using perf_evlist__add_dummy() we
    were not setting it correctly, which could result in multiple tracking
    events.

    Now that I'll try using a dummy event to be the tracking one when using
    'perf record --delay', i.e. when we process the --delay
    setting we may already have the evlist set up, like with:

    perf record -e cycles,instructions --delay 1000 ./workload

    We will need to add a "dummy" event, then reset evsel->tracking for the
    first event, "cycles", and set it instead on the dummy one, also
    setting its attr.enable_on_exec, so that we get the PERF_RECORD_MMAP,
    etc. metadata events while waiting to enable the explicitly requested
    events. So let's get this straight and set the right evsel->idx.

    Cc: Adrian Hunter
    Cc: Bram Stolk
    Cc: David Ahern
    Cc: Jiri Olsa
    Cc: Namhyung Kim
    Cc: Wang Nan
    Link: http://lkml.kernel.org/n/tip-nrdfchshqxf7diszhxcecqb9@git.kernel.org
    Signed-off-by: Arnaldo Carvalho de Melo

    Arnaldo Carvalho de Melo
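
The relationship between an event's index and its tracking role, as described above, can be sketched with a toy event list. The Evsel and Evlist classes below are hypothetical Python models, not the perf structures; they only show why a correct idx matters when a dummy event is appended and then made the single tracking event.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Evsel:
    name: str
    idx: int = 0
    tracking: bool = False


@dataclass
class Evlist:
    entries: List[Evsel] = field(default_factory=list)

    def add(self, name):
        # idx is the position in the list; the first event tracks by default.
        evsel = Evsel(name, idx=len(self.entries), tracking=not self.entries)
        self.entries.append(evsel)
        return evsel

    def set_tracking(self, chosen):
        # Exactly one event may ask for the MMAP/COMM/EXEC metadata records.
        for evsel in self.entries:
            evsel.tracking = evsel is chosen


# Models: perf record -e cycles,instructions --delay 1000 ./workload
evlist = Evlist()
evlist.add("cycles")
evlist.add("instructions")
dummy = evlist.add("dummy")
evlist.set_tracking(dummy)
print([(e.name, e.idx, e.tracking) for e in evlist.entries])
```

If the appended dummy were given index 0 instead of its list position, it would look like a second "first" event, which is the kind of duplicate tracking state the commit message warns about.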
     

18 Nov, 2017

1 commit

  • Pull second round of s390 updates from Martin Schwidefsky:

    - rework of the vdso code to avoid the use of the access register mode

    - use perf AUX buffers for the transport of diagnostic sample data

    - add perf_regs and user stack dump support

    - enable perf call graphs for user space programs

    - add perf register support for floating-point registers

    - all remaining s390 related timer_setup conversions

    - bug fixes and cleanups

    * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux: (30 commits)
    s390: remove unused parameter from Makefile
    zfcp: purely mechanical update using timer API, plus blank lines
    s390/scsi: Convert timers to use timer_setup()
    s390/cpum_sf: correctly set the PID and TID in perf samples
    s390/cpum_sf: load program parameter at sampler enablement
    s390/perf: add perf register support for floating-point registers
    s390/perf: extend perf_regs support to include floating-point registers
    s390/perf: define common DWARF register string table
    s390/perf: add support for perf_regs and libdw
    s390/perf: add perf_regs support and user stack dump
    s390/cpum_sf: do not register PMU if no sampling mode is authorized
    s390/cpumf: remove raw event support in basic-only sampling mode
    s390/perf: add callback to perf to enable using AUX buffer
    s390/cpumf: enable using AUX buffer
    s390/cpumf: introduce AUX buffer for dump diagnostic sample data
    s390/disassembler: increase show_code buffer size
    s390: Remove CONFIG_HARDENED_USERCOPY
    s390: enable CPU alternatives unconditionally
    s390/nmi: remove unused code
    s390/mm: remove unused code
    ...

    Linus Torvalds
     

16 Nov, 2017

6 commits

  • For correct unwinding of user space processes, the floating-point
    register contents are required. For example, leaf functions might
    use fp registers to temporarily store the return address.

    Signed-off-by: Hendrik Brueckner
    Reviewed-and-tested-by: Thomas Richter
    Signed-off-by: Martin Schwidefsky

    Hendrik Brueckner
     
  • Instead of defining DWARF register to string table in dwarf-regs-table.h
    and dwarf-regs.c, use a common table in dwarf-regs-table.h.

    Ensure that the DWARF register table is up-to-date with
    http://refspecs.linuxfoundation.org/ELF/zSeries/lzsabi0_s390/x1542.html.

    For unwinding with libdw, also make sure to correctly set up the DWARF
    register frame according to the register mappings. Currently, libdw
    supports up to 32 registers only.

    Suggested-by: Thomas Richter
    Signed-off-by: Hendrik Brueckner
    Reviewed-and-tested-by: Thomas Richter
    Signed-off-by: Martin Schwidefsky

    Hendrik Brueckner
     
  • With support for perf_regs and libdw, you can record and report
    call graphs for user space programs. Simply invoke perf with
    the --call-graph=dwarf command line option.

    Signed-off-by: Heiko Carstens
    [brueckner: added dwfl_thread_state_register_pc() call]
    Signed-off-by: Hendrik Brueckner
    Reviewed-and-tested-by: Thomas Richter
    Signed-off-by: Martin Schwidefsky

    Heiko Carstens
     
  • The perf tool needs to implement a callback to enable use of the AUX
    buffer. If such a callback exists, perf will do another mmap() to trigger
    the setup of the AUX buffer in the kernel. The default size of the AUX
    buffer is set according to the sampling frequency to avoid overflow. It
    can also be set manually with the -m option of perf.

    The interface of perf is not changed. Diagnostic mode sampling
    could be started by `perf record -e rBD000` like before.

    Signed-off-by: Pu Hou
    Reviewed-by: Hendrik Brueckner
    Signed-off-by: Martin Schwidefsky

    Pu Hou
     
  • As the page free path makes no distinction between cache hot and cold
    pages, there is no real useful ordering of pages in the free list that
    allocation requests can take advantage of. Judging from the users of
    __GFP_COLD, it is likely that a number of them are the result of copying
    other sites instead of actually measuring the impact. Remove the
    __GFP_COLD parameter which simplifies a number of paths in the page
    allocator.

    This is potentially controversial but bear in mind that the size of the
    per-cpu pagelists versus modern cache sizes means that the whole per-cpu
    list can often fit in the L3 cache. Hence, there is only a potential
    benefit for microbenchmarks that alloc/free pages in a tight loop. It's
    even worse when THP is taken into account which has little or no chance
    of getting a cache-hot page as the per-cpu list is bypassed and the
    zeroing of multiple pages will thrash the cache anyway.

    The truncate microbenchmarks are not shown as this patch affects the
    allocation path and not the free path. A page fault microbenchmark was
    tested but it showed no significant difference which is not surprising
    given that the __GFP_COLD branches are a minuscule percentage of the
    fault path.

    Link: http://lkml.kernel.org/r/20171018075952.10627-9-mgorman@techsingularity.net
    Signed-off-by: Mel Gorman
    Acked-by: Vlastimil Babka
    Cc: Andi Kleen
    Cc: Dave Chinner
    Cc: Dave Hansen
    Cc: Jan Kara
    Cc: Johannes Weiner
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Mel Gorman
     
  • Now that kmemcheck is gone, we don't need the NOTRACK flags.

    Link: http://lkml.kernel.org/r/20171007030159.22241-5-alexander.levin@verizon.com
    Signed-off-by: Sasha Levin
    Cc: Alexander Potapenko
    Cc: Eric W. Biederman
    Cc: Michal Hocko
    Cc: Pekka Enberg
    Cc: Steven Rostedt
    Cc: Tim Hansen
    Cc: Vegard Nossum
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Levin, Alexander (Sasha Levin)
     

14 Nov, 2017

2 commits

  • Pull perf updates from Ingo Molnar:
    "The main changes in this cycle were:

    Kernel:

    - kprobes updates: use better W^X patterns for code modifications,
    improve optprobes, remove jprobes. (Masami Hiramatsu, Kees Cook)

    - core fixes: event timekeeping (enabled/running times statistics)
    fixes, perf_event_read() locking fixes and cleanups, etc. (Peter
    Zijlstra)

    - Extend x86 Intel free-running PEBS support and support x86
    user-register sampling in perf record and perf script. (Andi Kleen)

    Tooling:

    - Completely rework the way inline frames are handled. Instead of
    querying for the inline nodes on-demand in the individual tools, we
    now create proper callchain nodes for inlined frames. (Milian
    Wolff)

    - 'perf trace' updates (Arnaldo Carvalho de Melo)

    - Implement a way to print formatted output to per-event files in
    'perf script' to facilitate generating flamegraphs, eliminating the
    need to write scripts to do that separation (yuzhoujian, Arnaldo
    Carvalho de Melo)

    - Update vendor events JSON metrics for Intel's Broadwell, Broadwell
    Server, Haswell, Haswell Server, IvyBridge, IvyTown, JakeTown,
    Sandy Bridge, Skylake, SkyLake Server - and Goldmont Plus V1 (Andi
    Kleen, Kan Liang)

    - Multithread the synthesizing of PERF_RECORD_ events for
    pre-existing threads in 'perf top', speeding up that phase, greatly
    improving the user experience in systems such as Intel's Knights
    Mill (Kan Liang)

    - Introduce the concept of weak groups in 'perf stat': try to set up
    a group, but if it's not schedulable fall back to not using a group.
    That gives us the best of both worlds: groups if they work, but
    still a usable fallback if they don't. E.g: (Andi Kleen)

    - perf sched timehist enhancements (David Ahern)

    - ... various other enhancements, updates, cleanups and fixes"
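
    The weak-group fallback described in the list above can be sketched as
    follows. This is a minimal model, not perf's implementation: struct
    pmu_model, struct open_result, group_is_schedulable() and
    open_weak_group() are hypothetical names, and "schedulable" is assumed
    to mean simply that every member of the group fits on the available
    hardware counters at once.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a PMU with a fixed number of counters. */
struct pmu_model {
    size_t nr_counters;
};

/* Outcome of setting up a set of events. */
struct open_result {
    bool grouped;      /* true if the events run as a single group */
    size_t nr_opened;  /* number of events that were set up */
};

/*
 * Simplifying assumption: a group is schedulable only if all of its
 * members fit on the PMU at the same time.
 */
static bool group_is_schedulable(const struct pmu_model *pmu, size_t nr_events)
{
    return nr_events <= pmu->nr_counters;
}

/*
 * Weak group: try the group first; if it cannot be scheduled, still open
 * every event, just without the grouping constraint.
 */
static struct open_result open_weak_group(const struct pmu_model *pmu,
                                          size_t nr_events)
{
    struct open_result res = { .grouped = false, .nr_opened = nr_events };

    assert(pmu != NULL);

    if (group_is_schedulable(pmu, nr_events))
        res.grouped = true;

    return res;
}
```

    With a model of four counters, a three-event request stays grouped,
    while a six-event request is still opened but ungrouped, so the caller
    gets counts in both cases.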

    * 'perf-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (139 commits)
    kprobes: Don't spam the build log with deprecation warnings
    arm/kprobes: Remove jprobe test case
    arm/kprobes: Fix kretprobe test to check correct counter
    perf srcline: Show correct function name for srcline of callchains
    perf srcline: Fix memory leak in addr2inlines()
    perf trace beauty kcmp: Beautify arguments
    perf trace beauty: Implement pid_fd beautifier
    tools include uapi: Grab a copy of linux/kcmp.h
    perf callchain: Fix double mapping al->addr for children without self period
    perf stat: Make --per-thread update shadow stats to show metrics
    perf stat: Move the shadow stats scale computation in perf_stat__update_shadow_stats
    perf tools: Add perf_data_file__write function
    perf tools: Add struct perf_data_file
    perf tools: Rename struct perf_data_file to perf_data
    perf script: Print information about per-event-dump files
    perf trace beauty prctl: Generate 'option' string table from kernel headers
    tools include uapi: Grab a copy of linux/prctl.h
    perf script: Allow creating per-event dump files
    perf evsel: Restore evsel->priv as a tool private area
    perf script: Use event_format__fprintf()
    ...

    Linus Torvalds
     
  • Pull core locking updates from Ingo Molnar:
    "The main changes in this cycle are:

    - Another attempt at enabling cross-release lockdep dependency
    tracking (automatically part of CONFIG_PROVE_LOCKING=y), this time
    with better performance and fewer false positives. (Byungchul Park)

    - Introduce lockdep_assert_irqs_enabled()/disabled() and convert
    open-coded equivalents to lockdep variants. (Frederic Weisbecker)

    - Add down_read_killable() and use it in the VFS's iterate_dir()
    method. (Kirill Tkhai)

    - Convert remaining uses of ACCESS_ONCE() to
    READ_ONCE()/WRITE_ONCE(). Most of the conversion was Coccinelle
    driven. (Mark Rutland, Paul E. McKenney)

    - Get rid of lockless_dereference(), by strengthening Alpha atomics,
    strengthening READ_ONCE() with smp_read_barrier_depends() and thus
    being able to convert users of lockless_dereference() to
    READ_ONCE(). (Will Deacon)

    - Various micro-optimizations:

    - better PV qspinlocks (Waiman Long),
    - better x86 barriers (Michael S. Tsirkin)
    - better x86 refcounts (Kees Cook)

    - ... plus other fixes and enhancements. (Borislav Petkov, Juergen
    Gross, Miguel Bernal Marin)"
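
    As an illustration of the ACCESS_ONCE() conversion mentioned above, here
    is a simplified userspace sketch. The macros below are stand-ins written
    with the GCC/Clang __typeof__ extension, not the kernel's definitions
    (which handle more cases and, per the summary above, carry extra
    ordering on Alpha); shared_flag, publish_flag() and wait_for_flag() are
    hypothetical names used only for this example.

```c
#include <assert.h>

/*
 * Simplified stand-ins: a volatile access forces the compiler to emit
 * exactly one load or store and not to cache the value in a register.
 */
#define READ_ONCE(x)       (*(const volatile __typeof__(x) *)&(x))
#define WRITE_ONCE(x, val) (*(volatile __typeof__(x) *)&(x) = (val))

static int shared_flag;

/* Writer side: a single store that the compiler may not split or elide. */
static void publish_flag(int v)
{
    WRITE_ONCE(shared_flag, v);
}

/*
 * Reader side: re-read the flag on every iteration. Without READ_ONCE()
 * the compiler would be free to hoist the load out of the loop.
 * Returns the first non-zero value seen, or 0 after max_spins reads.
 */
static int wait_for_flag(int max_spins)
{
    assert(max_spins > 0);

    for (int i = 0; i < max_spins; i++) {
        int v = READ_ONCE(shared_flag);

        if (v)
            return v;
    }
    return 0;
}
```

    Splitting the old single macro into separate read and write forms makes
    the direction of each access explicit at the call site.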

    * 'locking-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (70 commits)
    locking/x86: Use LOCK ADD for smp_mb() instead of MFENCE
    rcu: Use lockdep to assert IRQs are disabled/enabled
    netpoll: Use lockdep to assert IRQs are disabled/enabled
    timers/posix-cpu-timers: Use lockdep to assert IRQs are disabled/enabled
    sched/clock, sched/cputime: Use lockdep to assert IRQs are disabled/enabled
    irq_work: Use lockdep to assert IRQs are disabled/enabled
    irq/timings: Use lockdep to assert IRQs are disabled/enabled
    perf/core: Use lockdep to assert IRQs are disabled/enabled
    x86: Use lockdep to assert IRQs are disabled/enabled
    smp/core: Use lockdep to assert IRQs are disabled/enabled
    timers/hrtimer: Use lockdep to assert IRQs are disabled/enabled
    timers/nohz: Use lockdep to assert IRQs are disabled/enabled
    workqueue: Use lockdep to assert IRQs are disabled/enabled
    irq/softirqs: Use lockdep to assert IRQs are disabled/enabled
    locking/lockdep: Add IRQs disabled/enabled assertion APIs: lockdep_assert_irqs_enabled()/disabled()
    locking/pvqspinlock: Implement hybrid PV queued/unfair locks
    locking/rwlocks: Fix comments
    x86/paravirt: Set up the virt_spin_lock_key after static keys get initialized
    block, locking/lockdep: Assign a lock_class per gendisk used for wait_for_completion()
    workqueue: Remove now redundant lock acquisitions wrt. workqueue flushes
    ...

    Linus Torvalds
     

09 Nov, 2017

3 commits

  • Otherwise 'perf trace' leaves a temporary file /tmp/perf-vdso.so-XXXXXX.

    $ perf trace -o log true
    $ ls -l /tmp/perf-vdso.*
    -rw------- 1 root root 8192 Nov 8 03:08 /tmp/perf-vdso.so-5bCpD0

    Signed-off-by: Andrei Vagin
    Reviewed-by: Jiri Olsa
    Cc: Alexander Shishkin
    Cc: Namhyung Kim
    Cc: Peter Zijlstra
    Cc: Vasily Averin
    Link: http://lkml.kernel.org/r/20171108002246.8924-1-avagin@openvz.org
    Signed-off-by: Arnaldo Carvalho de Melo

    Andrei Vagin
     
  • Looks like I've reached the new level of stupidity, adding missing braces.

    Committer testing:

    The following eBPF C filter, which adds a record when it returns true,
    i.e. when the tv_nsec variable is > 1000ns, should be built and
    installed via sys_bpf(), but fails to do so before this patch:

    # cat filter.c
    #include
    #define SEC(NAME) __attribute__((section(NAME), used))

    SEC("func=hrtimer_nanosleep rqtp->tv_nsec")
    int func(void *ctx, int err, long nsec)
    {
    return nsec > 1000;
    }
    char _license[] SEC("license") = "GPL";
    int _version SEC("version") = LINUX_VERSION_CODE;
    #

    # perf trace -e nanosleep,filter.c usleep 1
    invalid or unsupported event: 'filter.c'
    Run 'perf list' for a list of valid events

    Usage: perf trace [] []
    or: perf trace [] -- []
    or: perf trace record [] []
    or: perf trace record [] -- []

    -e, --event event/syscall selector. use 'perf list' to list available events
    #

    And works again after it is applied, the nothing is inserted when the co

    # perf trace -e *sleep,filter.c usleep 1
    0.000 ( 0.066 ms): usleep/23994 nanosleep(rqtp: 0x7ffead94a0d0) = 0
    # perf trace -e *sleep,filter.c usleep 2
    0.000 ( 0.008 ms): usleep/24378 nanosleep(rqtp: 0x7fffa021ba50) ...
    0.008 ( ): perf_bpf_probe:func:(ffffffffb410cb30) tv_nsec=2000)
    0.000 ( 0.066 ms): usleep/24378 ... [continued]: nanosleep()) = 0
    #

    The intent of 9445464bb831 is kept:

    # perf stat -e 'cpu/uops_executed.core,krava/' true
    event syntax error: '..cuted.core,krava/'
    \___ unknown term

    valid terms: cmask,pc,event,edge,in_tx,any,ldlat,inv,umask,in_tx_cp,offcore_rsp,config,config1,config2,name,period
    Run 'perf list' for a list of valid events

    Usage: perf stat [] []

    -e, --event event selector. use 'perf list' to list available events
    #
    # perf stat -e 'cpu/uops_executed.core,period=1/' true

    Performance counter stats for 'true':

    808,332 cpu/uops_executed.core,period=1/

    0.002997237 seconds time elapsed

    #

    Reported-by: Arnaldo Carvalho de Melo
    Signed-off-by: Jiri Olsa
    Cc: Andi Kleen
    Cc: Namhyung Kim
    Fixes: 9445464bb831 ("perf tools: Unwind properly location after REJECT")
    Link: http://lkml.kernel.org/n/tip-diea0ihbwpxfw6938huv3whj@git.kernel.org
    Signed-off-by: Arnaldo Carvalho de Melo

    Jiri Olsa
     
  • Arnaldo reported broken builds in some distros using a newer flex
    release, 2.6.4, found in Alpine Linux 3.6 and Edge, with flex not
    spotting the REJECT macro:

    CC /tmp/build/perf/util/parse-events-flex.o
    util/parse-events.l: In function 'parse_events_lex':
    /tmp/build/perf/util/parse-events-flex.c:4734:16: error: \
    'reject_used_but_not_detected' undeclared (first use in this function)

    It's happening because we put the REJECT under another USER_REJECT macro
    in the following commit:

    9445464bb831 perf tools: Unwind properly location after REJECT

    Fortunately flex provides an option to force it to use REJECT, so add it
    to parse-events.l.

    Reported-by: Arnaldo Carvalho de Melo
    Reported-by: Markus Trippelsdorf
    Signed-off-by: Jiri Olsa
    Reviewed-by: Andi Kleen
    Tested-by: Arnaldo Carvalho de Melo
    Cc: Namhyung Kim
    Fixes: 9445464bb831 ("perf tools: Unwind properly location after REJECT")
    Link: http://lkml.kernel.org/n/tip-7kdont984mw12ijk7rji6b8p@git.kernel.org
    Signed-off-by: Arnaldo Carvalho de Melo

    Jiri Olsa