28 Jan, 2017

1 commit

  • Pull networking fixes from David Miller:

    1) GTP fixes from Andreas Schultz (missing genl module alias, clear IP
    DF on transmit).

    2) Netfilter needs to reflect the fwmark when sending resets, from Pau
    Espin Pedrol.

    3) nftable dump OOPS fix from Liping Zhang.

    4) Fix erroneous setting of VIRTIO_NET_HDR_F_DATA_VALID on transmit,
    from Rolf Neugebauer.

    5) Fix build error of ipt_CLUSTERIP when procfs is disabled, from Arnd
    Bergmann.

    6) Fix regression in handling of NETIF_F_SG in harmonize_features(),
    from Eric Dumazet.

    7) Fix RTNL deadlock wrt. lwtunnel module loading, from David Ahern.

    8) tcp_fastopen_create_child() needs to setup tp->max_window, from
    Alexey Kodanev.

    9) Missing kmemdup() failure check in ipv6 segment routing code, from
    Eric Dumazet.

    10) Don't execute unix_bind() under the bindlock, otherwise we deadlock
    with splice. From WANG Cong.

    11) ip6_tnl_parse_tlv_enc_lim() potentially reallocates the skb buffer,
    therefore callers must reload cached header pointers into that skb.
    Fix from Eric Dumazet.

    12) Fix various bugs in legacy IRQ fallback handling in alx driver, from
    Tobias Regnery.

    13) Do not allow lwtunnel drivers to be unloaded while they are
    referenced by active instances, from Robert Shearman.

    14) Fix truncated PHY LED trigger names, from Geert Uytterhoeven.

    15) Fix a few regressions from virtio_net XDP support, from John
    Fastabend and Jakub Kicinski.
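    Item 11 describes a general rule: once a helper may reallocate a buffer, any pointer cached into the old storage is stale, so callers must recompute it. The following is a userspace analogy using realloc(), not the kernel's skb API; struct pkt, pkt_make_room() and read_hdr_byte_after_grow() are hypothetical names.

```c
#include <stdlib.h>
#include <string.h>

/* Userspace analogy (not the skb API): a buffer whose storage may move. */
struct pkt {
	unsigned char *data;
	size_t len;
};

/* Hypothetical helper that, like the function in item 11, may reallocate
 * the buffer. Returns 0 on success, -1 on allocation failure. */
static int pkt_make_room(struct pkt *p, size_t extra)
{
	unsigned char *n = realloc(p->data, p->len + extra);

	if (!n)
		return -1;
	memset(n + p->len, 0, extra);
	p->data = n;		/* pointers into the old storage are now stale */
	p->len += extra;
	return 0;
}

/* Caller pattern: keep the header's offset, not its address, and turn it
 * back into a pointer only after the call that may reallocate. */
static int read_hdr_byte_after_grow(struct pkt *p, size_t hdr_off,
				    unsigned char *out)
{
	unsigned char *hdr;

	if (pkt_make_room(p, 4096) < 0)
		return -1;
	/* Reload: a pointer computed before pkt_make_room() could point
	 * into freed memory if realloc() moved the buffer. */
	hdr = p->data + hdr_off;
	*out = hdr[0];
	return 0;
}
```

    The design choice is to carry offsets across calls that may reallocate and convert them to pointers only afterwards.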

    * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (102 commits)
    ISDN: eicon: silence misleading array-bounds warning
    net: phy: micrel: add support for KSZ8795
    gtp: fix cross netns recv on gtp socket
    gtp: clear DF bit on GTP packet tx
    gtp: add genl family modules alias
    tcp: don't annotate mark on control socket from tcp_v6_send_response()
    ravb: unmap descriptors when freeing rings
    virtio_net: reject XDP programs using header adjustment
    virtio_net: use dev_kfree_skb for small buffer XDP receive
    r8152: check rx after napi is enabled
    r8152: re-schedule napi for tx
    r8152: avoid start_xmit to schedule napi when napi is disabled
    r8152: avoid start_xmit to call napi_schedule during autosuspend
    net: dsa: Bring back device detaching in dsa_slave_suspend()
    net: phy: leds: Fix truncated LED trigger names
    net: phy: leds: Break dependency of phy.h on phy_led_triggers.h
    net: phy: leds: Clear phy_num_led_triggers on failure to avoid crash
    net-next: ethernet: mediatek: change the compatible string
    Documentation: devicetree: change the mediatek ethernet compatible string
    bnxt_en: Fix RTNL lock usage on bnxt_get_port_module_status().
    ...

    Linus Torvalds
     

23 Jan, 2017

1 commit

  • Pull virtio/vhost fixes from Michael Tsirkin:
    "Random fixes and cleanups that accumulated over the time"

    * tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost:
    virtio/s390: virtio: constify virtio_config_ops structures
    virtio/s390: add missing \n to end of dev_err message
    virtio/s390: support READ_STATUS command for virtio-ccw
    tools/virtio/ringtest: tweaks for s390
    tools/virtio/ringtest: fix run-on-all.sh for offline cpus
    virtio_console: fix a crash in config_work_handler
    vhost/scsi: silence uninitialized variable warning
    vhost: scsi: constify target_core_fabric_ops structures

    Linus Torvalds
     

22 Jan, 2017

1 commit

  • Pull powerpc fixes from Michael Ellerman:
    "Two fixes for fallout from the hugetlb changes we merged this cycle.

    Ten other fixes, four only affect Power9, and the rest are a bit of a
    mixture though nothing terrible.

    Thanks to: Aneesh Kumar K.V, Anton Blanchard, Benjamin Herrenschmidt,
    Dave Martin, Gavin Shan, Madhavan Srinivasan, Nicholas Piggin, Reza
    Arbab"

    * tag 'powerpc-4.10-2' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux:
    powerpc: Ignore reserved field in DCSR and PVR reads and writes
    powerpc/ptrace: Preserve previous TM fprs/vsrs on short regset write
    powerpc/ptrace: Preserve previous fprs/vsrs on short regset write
    powerpc/perf: Use MSR to report privilege level on P9 DD1
    selftest/powerpc: Wrong PMC initialized in pmc56_overflow test
    powerpc/eeh: Enable IO path on permanent error
    powerpc/perf: Fix PM_BRU_CMPL event code for power9
    powerpc/mm: Fix little-endian 4K hugetlb
    powerpc/mm/hugetlb: Don't panic when we don't find the default huge page size
    powerpc: Fix pgtable pmd cache init
    powerpc/icp-opal: Fix missing KVM case and harden replay
    powerpc/mm: Fix memory hotplug BUG() on radix

    Linus Torvalds
     

20 Jan, 2017

2 commits

  • Make ringtest work on s390 too.

    Signed-off-by: Halil Pasic
    Acked-by: Sascha Silbe
    Signed-off-by: Cornelia Huck

    Halil Pasic
     
  • Since ef1b144d ("tools/virtio/ringtest: fix run-on-all.sh to work
    without /dev/cpu") run-on-all.sh uses seq 0 $HOST_AFFINITY as the list
    of ids of the CPUs to run the command on (assuming ids of online CPUs
    are consecutive and start from 0), where $HOST_AFFINITY is the highest
    CPU id in the system previously determined using lscpu. This can fail
    on systems with offline CPUs.

    Instead let's use lscpu to determine the list of online CPUs.
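    A minimal C sketch of why the consecutive-id assumption breaks: with CPUs 2 and 3 offline on an 8-CPU system, `seq 0 7` still lists them. The fix itself uses lscpu; this example instead assumes the kernel's cpulist format, as found in /sys/devices/system/cpu/online, and parse_cpulist() is a hypothetical helper.

```c
#include <stdlib.h>

/* Parse a cpulist such as "0-1,4,6-7" (the format of
 * /sys/devices/system/cpu/online) into ids[]. Returns the number of ids
 * written, or -1 on malformed input or if more than max would result. */
static int parse_cpulist(const char *s, int *ids, int max)
{
	int n = 0;

	while (*s && *s != '\n') {
		char *end;
		long lo = strtol(s, &end, 10), hi = lo;

		if (end == s || lo < 0)
			return -1;
		s = end;
		if (*s == '-') {		/* a range "lo-hi" */
			hi = strtol(s + 1, &end, 10);
			if (end == s + 1 || hi < lo)
				return -1;
			s = end;
		}
		for (long c = lo; c <= hi; c++) {
			if (n == max)
				return -1;
			ids[n++] = (int)c;
		}
		if (*s == ',')
			s++;
		else if (*s && *s != '\n')
			return -1;
	}
	return n;
}
```

    For instance, "0-1,4-7" yields the six ids 0, 1, 4, 5, 6 and 7, rather than the eight that `seq 0 7` would produce.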

    Signed-off-by: Halil Pasic
    Fixes: ef1b144d ("tools/virtio/ringtest: fix run-on-all.sh to work without
    /dev/cpu")
    Reviewed-by: Sascha Silbe
    Signed-off-by: Cornelia Huck

    Halil Pasic
     

18 Jan, 2017

3 commits

    The test uses PMC2 to count the event, but PMC1 is being initialized.
    This patch fixes it.

    Fixes: 3752e453f6ba ('selftests/powerpc: Add tests of PMU EBBs')
    Signed-off-by: Madhavan Srinivasan
    Signed-off-by: Michael Ellerman

    Madhavan Srinivasan
     
  • test_lru_sanity5() fails when the number of online cpus
    is fewer than the number of possible cpus. It can be
    reproduced with qemu by using cmd args "--smp cpus=2,maxcpus=8".

    The problem is that the loop in test_lru_sanity5() is testing
    'i', which is incorrect.

    This patch:
    1. Make sched_next_online() always return -1 if it cannot
    find a next cpu to schedule the process.
    2. In test_lru_sanity5(), the parent process does
    sched_setaffinity() first (through sched_next_online())
    and the forked process will inherit it, as documented in
    the sched_setaffinity(2) man page.
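    The two changes above can be sketched in C as follows. This is an illustration, not the selftest's code: the nr_possible parameter and returning the chosen CPU number are assumptions. Each candidate CPU is tried with sched_setaffinity(), which fails for a mask holding only an offline CPU, and -1 is returned once the candidates are exhausted. Because the parent pins itself before fork(), the child starts with the same affinity mask.

```c
#define _GNU_SOURCE
#include <sched.h>
#include <sys/types.h>

/* Starting at *next_to_try, pin pid to the first CPU that
 * sched_setaffinity() accepts. Returns that CPU, or -1 if no candidate
 * below nr_possible remains (the "always return -1" rule). */
static int sched_next_online(pid_t pid, int *next_to_try, int nr_possible)
{
	cpu_set_t set;

	while (*next_to_try < nr_possible) {
		int cpu = (*next_to_try)++;

		CPU_ZERO(&set);
		CPU_SET(cpu, &set);
		if (sched_setaffinity(pid, sizeof(set), &set) == 0)
			return cpu;	/* pinned; a later fork() inherits it */
	}
	return -1;
}
```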

    Fixes: 5db58faf989f ("bpf: Add tests for the LRU bpf_htab")
    Reported-by: Daniel Borkmann
    Signed-off-by: Martin KaFai Lau
    Acked-by: Daniel Borkmann
    Acked-by: Alexei Starovoitov
    Signed-off-by: David S. Miller

    Martin KaFai Lau
     
  • …linux/kernel/git/acme/linux into perf/urgent

    Pull 'perf probe' fixes from Arnaldo Carvalho de Melo <acme@redhat.com>

    - Show correct locations for 'perf probe' on modules (Masami Hiramatsu)

    - Correctly handle 'perf probe's on GCC generated functions in modules (Masami Hiramatsu)

    Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
    Signed-off-by: Ingo Molnar <mingo@kernel.org>

    Ingo Molnar
     

17 Jan, 2017

3 commits

    Fix to probe on gcc generated functions on modules. Since
    probing on a module is based on its symbol name, the probe
    should be adjusted to the actual symbols.

    E.g. without this fix, perf probe shows a probe definition
    on a non-existent symbol, as below.

    $ perf probe -m build-x86_64/net/netfilter/nf_nat.ko -F in_range*
    in_range.isra.12
    $ perf probe -m build-x86_64/net/netfilter/nf_nat.ko -D in_range
    p:probe/in_range nf_nat:in_range+0

    With this fix, perf probe correctly shows a probe on
    gcc-generated symbol.

    $ perf probe -m build-x86_64/net/netfilter/nf_nat.ko -D in_range
    p:probe/in_range nf_nat:in_range.isra.12+0

    This also fixes the same problem on online modules, as below.

    $ perf probe -m i915 -D assert_plane
    p:probe/assert_plane i915:assert_plane.constprop.134+0
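    A minimal sketch of the matching rule implied above, assuming a gcc clone is named after its source function plus a '.'-separated suffix (".isra.N", ".constprop.N"). sym_matches_func() and find_probe_symbol() are hypothetical helpers, not perf's implementation, and the symbol list used with them is illustrative.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Does ELF symbol 'sym' implement source function 'func'? Accept the exact
 * name or the name followed by a '.'-separated gcc suffix. */
static bool sym_matches_func(const char *sym, const char *func)
{
	size_t n = strlen(func);

	return strncmp(sym, func, n) == 0 && (sym[n] == '\0' || sym[n] == '.');
}

/* Return the first symbol in syms[] that matches func, or NULL. */
static const char *find_probe_symbol(const char *const *syms, size_t nsyms,
				     const char *func)
{
	for (size_t i = 0; i < nsyms; i++)
		if (sym_matches_func(syms[i], func))
			return syms[i];
	return NULL;
}
```

    With a table containing "in_range.isra.12", a lookup for in_range returns the clone, while a hypothetical "in_range_helper" is not matched, because the character after the prefix must be '\0' or '.'.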

    Signed-off-by: Masami Hiramatsu
    Tested-by: Arnaldo Carvalho de Melo
    Cc: Jiri Olsa
    Cc: Namhyung Kim
    Cc: Peter Zijlstra
    Link: http://lkml.kernel.org/r/148411450673.9978.14905987549651656075.stgit@devbox
    Signed-off-by: Arnaldo Carvalho de Melo

    Masami Hiramatsu
     
    Add error checks to the post processing and improve it for offline
    probe events:

    - post processing fails if no matching symbol is found in the map
    (-ENOENT) or strdup() fails (-ENOMEM).

    - Even if the symbol name is the same, it updates the symbol address
    and offset.

    Signed-off-by: Masami Hiramatsu
    Cc: Jiri Olsa
    Cc: Namhyung Kim
    Cc: Peter Zijlstra
    Link: http://lkml.kernel.org/r/148411443738.9978.4617979132625405545.stgit@devbox
    Signed-off-by: Arnaldo Carvalho de Melo

    Masami Hiramatsu
     
  • Fix to show correct locations for events on modules by relocating given
    address instead of retrying after failure.

    This happens when the module text size is big enough, bigger than
    sh_addr, because the original code retries with given address + sh_addr
    if it failed to find CU DIE at the given address.

    Any address smaller than sh_addr always fails and it retries with the
    correct address, but addresses bigger than sh_addr will get a CU DIE
    which is on the given address (not adjusted by sh_addr).

    In my environment (x86-64), the sh_addr of the ".text" section is 0x10030.
    Since i915 is a huge kernel module, we can see this issue as below.

    $ grep "[Tt] .*\[i915\]" /proc/kallsyms | sort | head -n1
    ffffffffc0270000 t i915_switcheroo_can_switch [i915]

    ffffffffc0270000 + 0x10030 = ffffffffc0280030, so we'll check
    symbols that cross this boundary.

    $ grep "[Tt] .*\[i915\]" /proc/kallsyms | grep -B1 ^ffffffffc028\
    | head -n 2
    ffffffffc027ff80 t haswell_init_clock_gating [i915]
    ffffffffc0280110 t valleyview_init_clock_gating [i915]

    So set up probes on both functions and see what happens.

    $ sudo ./perf probe -m i915 -a haswell_init_clock_gating \
    -a valleyview_init_clock_gating
    Added new events:
    probe:haswell_init_clock_gating (on haswell_init_clock_gating in i915)
    probe:valleyview_init_clock_gating (on valleyview_init_clock_gating in i915)

    You can now use it in all perf tools, such as:

    perf record -e probe:valleyview_init_clock_gating -aR sleep 1

    $ sudo ./perf probe -l
    probe:haswell_init_clock_gating (on haswell_init_clock_gating@gpu/drm/i915/intel_pm.c in i915)
    probe:valleyview_init_clock_gating (on i915_vga_set_decode:4@gpu/drm/i915/i915_drv.c in i915)

    As you can see, haswell_init_clock_gating is correctly shown,
    but valleyview_init_clock_gating is not.

    With this patch, both events are shown correctly.

    $ sudo ./perf probe -l
    probe:haswell_init_clock_gating (on haswell_init_clock_gating@gpu/drm/i915/intel_pm.c in i915)
    probe:valleyview_init_clock_gating (on valleyview_init_clock_gating@gpu/drm/i915/intel_pm.c in i915)
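    The failure mode can be modeled with plain arithmetic. This sketch assumes DWARF covers the module text at [sh_addr, sh_addr + text_size) and that perf is handed an offset from the start of the module text; has_cu(), dwarf_addr_old() and dwarf_addr_new() are hypothetical. With sh_addr = 0x10030, haswell_init_clock_gating sits at offset 0xff80 (below sh_addr), so the old code's first lookup fails and the retry relocates it correctly; valleyview_init_clock_gating sits at offset 0x10110, so the first lookup already "succeeds" at the unrelocated address.

```c
#include <stdint.h>

/* Stand-in for "a CU DIE exists at this DWARF address". */
static int has_cu(uint64_t addr, uint64_t sh_addr, uint64_t text_size)
{
	return addr >= sh_addr && addr - sh_addr < text_size;
}

/* Old behaviour: try the raw offset, add sh_addr only if that fails. */
static uint64_t dwarf_addr_old(uint64_t off, uint64_t sh_addr,
			       uint64_t text_size)
{
	return has_cu(off, sh_addr, text_size) ? off : off + sh_addr;
}

/* Fixed behaviour: always relocate the given address by sh_addr. */
static uint64_t dwarf_addr_new(uint64_t off, uint64_t sh_addr)
{
	return off + sh_addr;
}
```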

    Committer notes:

    In my case:

    # perf probe -m i915 -a haswell_init_clock_gating -a valleyview_init_clock_gating
    Added new events:
    probe:haswell_init_clock_gating (on haswell_init_clock_gating in i915)
    probe:valleyview_init_clock_gating (on valleyview_init_clock_gating in i915)

    You can now use it in all perf tools, such as:

    perf record -e probe:valleyview_init_clock_gating -aR sleep 1

    # perf probe -l
    probe:haswell_init_clock_gating (on i915_getparam+432@gpu/drm/i915/i915_drv.c in i915)
    probe:valleyview_init_clock_gating (on __i915_printk+240@gpu/drm/i915/i915_drv.c in i915)
    #

    # readelf -SW /lib/modules/4.9.0+/build/vmlinux | egrep -w '.text|Name'
    [Nr] Name Type Address Off Size ES Flg Lk Inf Al
    [ 1] .text PROGBITS ffffffff81000000 200000 822fd3 00 AX 0 0 4096
    #

    So both are b0rked, now with the fix:

    # perf probe -m i915 -a haswell_init_clock_gating -a valleyview_init_clock_gating
    Added new events:
    probe:haswell_init_clock_gating (on haswell_init_clock_gating in i915)
    probe:valleyview_init_clock_gating (on valleyview_init_clock_gating in i915)

    You can now use it in all perf tools, such as:

    perf record -e probe:valleyview_init_clock_gating -aR sleep 1

    # perf probe -l
    probe:haswell_init_clock_gating (on haswell_init_clock_gating@gpu/drm/i915/intel_pm.c in i915)
    probe:valleyview_init_clock_gating (on valleyview_init_clock_gating@gpu/drm/i915/intel_pm.c in i915)
    #

    Both look correct.

    Signed-off-by: Masami Hiramatsu
    Tested-by: Arnaldo Carvalho de Melo
    Cc: Jiri Olsa
    Cc: Namhyung Kim
    Cc: Peter Zijlstra
    Link: http://lkml.kernel.org/r/148411436777.9978.1440275861947194930.stgit@devbox
    Signed-off-by: Arnaldo Carvalho de Melo

    Masami Hiramatsu
     

16 Jan, 2017

1 commit

  • Pull perf fixes from Ingo Molnar:
    "Misc race fixes uncovered by fuzzing efforts, a Sparse fix, two PMU
    driver fixes, plus miscellaneous tooling fixes"

    * 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
    perf/x86: Reject non sampling events with precise_ip
    perf/x86/intel: Account interrupts for PEBS errors
    perf/core: Fix concurrent sys_perf_event_open() vs. 'move_group' race
    perf/core: Fix sys_perf_event_open() vs. hotplug
    perf/x86/intel: Use ULL constant to prevent undefined shift behaviour
    perf/x86/intel/uncore: Fix hardcoded socket 0 assumption in the Haswell init code
    perf/x86: Set pmu->module in Intel PMU modules
    perf probe: Fix to probe on gcc generated symbols for offline kernel
    perf probe: Fix --funcs to show correct symbols for offline module
    perf symbols: Robustify reading of build-id from sysfs
    perf tools: Install tools/lib/traceevent plugins with install-bin
    tools lib traceevent: Fix prev/next_prio for deadline tasks
    perf record: Fix --switch-output documentation and comment
    perf record: Make __record_options static
    tools lib subcmd: Add OPT_STRING_OPTARG_SET option
    perf probe: Fix to get correct modname from elf header
    samples/bpf trace_output_user: Remove duplicate sys/ioctl.h include
    samples/bpf sock_example: Avoid getting ethhdr from two includes
    perf sched timehist: Show total scheduling time

    Linus Torvalds
     

12 Jan, 2017

1 commit

  • Merge fixes from Andrew Morton:
    "27 fixes.

    There are three patches that aren't actually fixes. They're simple
    function renamings which are nice-to-have in mainline as ongoing net
    development depends on them."

    * akpm: (27 commits)
    timerfd: export defines to userspace
    mm/hugetlb.c: fix reservation race when freeing surplus pages
    mm/slab.c: fix SLAB freelist randomization duplicate entries
    zram: support BDI_CAP_STABLE_WRITES
    zram: revalidate disk under init_lock
    mm: support anonymous stable page
    mm: add documentation for page fragment APIs
    mm: rename __page_frag functions to __page_frag_cache, drop order from drain
    mm: rename __alloc_page_frag to page_frag_alloc and __free_page_frag to page_frag_free
    mm, memcg: fix the active list aging for lowmem requests when memcg is enabled
    mm: don't dereference struct page fields of invalid pages
    mailmap: add codeaurora.org names for nameless email commits
    signal: protect SIGNAL_UNKILLABLE from unintentional clearing.
    mm: pmd dirty emulation in page fault handler
    ipc/sem.c: fix incorrect sem_lock pairing
    lib/Kconfig.debug: fix frv build failure
    mm: get rid of __GFP_OTHER_NODE
    mm: fix remote numa hits statistics
    mm: fix devm_memremap_pages crash, use mem_hotplug_{begin, done}
    ocfs2: fix crash caused by stale lvb with fsdlm plugin
    ...

    Linus Torvalds
     

11 Jan, 2017

1 commit

  • The flag was introduced by commit 78afd5612deb ("mm: add
    __GFP_OTHER_NODE flag") to allow proper accounting of remote node
    allocations done by kernel daemons on behalf of a process - e.g.
    khugepaged.

    After "mm: fix remote numa hits statistics" we neither need nor
    actually use the flag, so we can safely remove it: all allocations
    which are satisfied from their "home" node are accounted properly.

    [mhocko@suse.com: fix build]
    Link: http://lkml.kernel.org/r/20170106122225.GK5556@dhcp22.suse.cz
    Link: http://lkml.kernel.org/r/20170102153057.9451-3-mhocko@kernel.org
    Signed-off-by: Michal Hocko
    Acked-by: Mel Gorman
    Acked-by: Vlastimil Babka
    Cc: Michal Hocko
    Cc: Johannes Weiner
    Cc: Joonsoo Kim
    Cc: Taku Izumi
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Michal Hocko
     

06 Jan, 2017

4 commits


05 Jan, 2017

1 commit

  • …linux/kernel/git/acme/linux into perf/urgent

    Pull perf/urgent fixes and one improvement from Arnaldo Carvalho de Melo:

    Fixes:

    - Fix prev/next_prio formatting for deadline tasks in libtraceevent (Daniel Bristot de Oliveira)

    - Robustify reading of build-ids from /sys/kernel/note (Arnaldo Carvalho de Melo)

    - Fix building some sample/bpf in Alpine Linux 3.4 (Arnaldo Carvalho de Melo)

    - Fix 'make install-bin' to install libtraceevent plugins (Arnaldo Carvalho de Melo)

    - Fix 'perf record --switch-output' documentation and comment (Jiri Olsa)

    - Fix 'perf probe' for cross arch probing (Masami Hiramatsu)

    Improvement:

    - Show total scheduling time in 'perf sched timehist' (Namhyung Kim)

    Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>

    Ingo Molnar
     

04 Jan, 2017

5 commits

  • Fix perf-probe to show probe definition on gcc generated symbols for
    offline kernel (including cross-arch kernel image).

    gcc sometimes optimizes functions and generates new symbols with suffixes
    such as ".constprop.N" or ".isra.N". Since those symbol names are
    not recorded in DWARF, we have to find the correct generated symbols in
    the offline ELF binary to probe on them (kallsyms does not correct this
    for us). For an online kernel or uprobes we don't need this, because
    those probes are rebased on _text or on a section-relative address.

    E.g. Without this:

    $ perf probe -k build-arm/vmlinux -F __slab_alloc*
    __slab_alloc.constprop.9
    $ perf probe -k build-arm/vmlinux -D __slab_alloc
    p:probe/__slab_alloc __slab_alloc+0

    If you put the above definition on the target machine, it should fail
    because there is no __slab_alloc in kallsyms.

    With this fix, perf probe shows correct probe definition on
    __slab_alloc.constprop.9:

    $ perf probe -k build-arm/vmlinux -D __slab_alloc
    p:probe/__slab_alloc __slab_alloc.constprop.9+0

    Signed-off-by: Masami Hiramatsu
    Cc: Jiri Olsa
    Cc: Namhyung Kim
    Cc: Peter Zijlstra
    Link: http://lkml.kernel.org/r/148350060434.19001.11864836288580083501.stgit@devbox
    Signed-off-by: Arnaldo Carvalho de Melo

    Masami Hiramatsu
     
    Fix the --funcs (-F) option to show correct symbols for offline modules.
    Previously perf-probe used machine__findnew_module_map() for offline
    modules, so even if the user passed a module file (with full path)
    built for another architecture, perf-probe always tried to load the
    symbol map for the current kernel module.

    This fix uses dso__new_map() to load the map from the given binary, the
    same way as a map for user applications.

    Signed-off-by: Masami Hiramatsu
    Cc: Jiri Olsa
    Cc: Namhyung Kim
    Cc: Peter Zijlstra
    Link: http://lkml.kernel.org/r/148350053478.19001.15435255244512631545.stgit@devbox
    Signed-off-by: Arnaldo Carvalho de Melo

    Masami Hiramatsu
     
  • Markus reported that perf segfaults when reading /sys/kernel/notes from
    a kernel linked with GNU gold, due to what looks like a gold bug, so do
    some bounds checking to avoid crashing in that case.
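    A sketch of that kind of bounds checking, assuming the standard ELF note layout with 4-byte alignment: a 12-byte Elf32_Nhdr (same layout as Elf64_Nhdr), then the name and the descriptor, each padded to 4 bytes. find_build_id() is a hypothetical helper, not perf's code; it rejects any note whose declared sizes would run past the end of the buffer instead of trusting them.

```c
#include <elf.h>
#include <stddef.h>
#include <string.h>

#define NOTE_ALIGN(x) (((size_t)(x) + 3) & ~(size_t)3)

/* Walk the notes in buf[0..len) and return the GNU build-id descriptor
 * length (setting *id), or -1 if none is found or a note is truncated. */
static long find_build_id(const unsigned char *buf, size_t len,
			  const unsigned char **id)
{
	size_t pos = 0;

	while (len - pos >= sizeof(Elf32_Nhdr)) {
		Elf32_Nhdr nh;
		size_t namesz, descsz;

		memcpy(&nh, buf + pos, sizeof(nh));	/* avoid unaligned reads */
		pos += sizeof(nh);
		namesz = NOTE_ALIGN(nh.n_namesz);
		descsz = NOTE_ALIGN(nh.n_descsz);
		if (namesz > len - pos || descsz > len - pos - namesz)
			return -1;	/* sizes point past the buffer: bail out */
		if (nh.n_type == NT_GNU_BUILD_ID && nh.n_namesz == 4 &&
		    memcmp(buf + pos, "GNU", 4) == 0) {
			*id = buf + pos + namesz;
			return (long)nh.n_descsz;
		}
		pos += namesz + descsz;
	}
	return -1;
}
```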

    Reported-by: Markus Trippelsdorf
    Report-Link: http://lkml.kernel.org/r/20161219161821.GA294@x4
    Cc: Adrian Hunter
    Cc: David Ahern
    Cc: Jiri Olsa
    Cc: Namhyung Kim
    Cc: Wang Nan
    Link: http://lkml.kernel.org/n/tip-ryhgs6a6jxvz207j2636w31c@git.kernel.org
    Signed-off-by: Arnaldo Carvalho de Melo

    Arnaldo Carvalho de Melo
     
    The libtraceevent plugins are binaries as well, so they should be
    installed by:

    make -C tools/perf install-bin

    too.

    Cc: Alexander Shishkin
    Cc: Daniel Bristot de Oliveira
    Cc: Jiri Olsa
    Cc: Peter Zijlstra
    Cc: Steven Rostedt
    Link: http://lkml.kernel.org/n/tip-3841b37u05evxrs1igkyu6ks@git.kernel.org
    Signed-off-by: Arnaldo Carvalho de Melo

    Arnaldo Carvalho de Melo
     
  • Currently, the sched:sched_switch tracepoint reports deadline tasks with
    priority -1. But when reading the trace via perf script I've got the
    following output:

    # ./d & # (d is a deadline task, see [1])
    # perf record -e sched:sched_switch -a sleep 1
    # perf script
    ...
    swapper 0 [000] 2146.962441: sched:sched_switch: swapper/0:0 [120] R ==> d:2593 [4294967295]
    d 2593 [000] 2146.972472: sched:sched_switch: d:2593 [4294967295] R ==> g:2590 [4294967295]

    The task d reports the wrong priority [4294967295]. This happens because
    the "int prio" is stored in an unsigned long long val. Although it is
    printed with %lld, the int is shorter than unsigned long long and is not
    sign-extended into it, so trace_seq_printf prints it as a positive number.

    The fix is just to cast the val to an int and print it with %d,
    as in the sched:sched_switch tracepoint's "format".
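    The effect can be reproduced in a few lines of C. This is an illustration, not libtraceevent's code: widen_prio_field() assumes the 4-byte field is read as unsigned before being widened, which matches the 4294967295 seen above (2^32 - 1, the 32-bit pattern of -1 without sign extension).

```c
#include <stddef.h>
#include <stdio.h>

/* Assumed widening: the int field is read as unsigned, then stored in an
 * unsigned long long, so -1 becomes 0xffffffff. */
static unsigned long long widen_prio_field(int prio)
{
	return (unsigned int)prio;
}

/* Old formatting: %lld on the widened value prints "[4294967295]". */
static int format_prio_old(char *buf, size_t len, unsigned long long val)
{
	return snprintf(buf, len, "[%lld]", (long long)val);
}

/* Fixed formatting: cast back to int (the field's type) and use %d.
 * The out-of-range conversion is implementation-defined in C; GCC and
 * Clang on Linux wrap it to -1. */
static int format_prio_new(char *buf, size_t len, unsigned long long val)
{
	return snprintf(buf, len, "[%d]", (int)val);
}
```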

    The output with the fix is:

    # ./d &
    # perf record -e sched:sched_switch -a sleep 1
    # perf script
    ...
    swapper 0 [000] 4306.374037: sched:sched_switch: swapper/0:0 [120] R ==> d:10941 [-1]
    d 10941 [000] 4306.383823: sched:sched_switch: d:10941 [-1] R ==> swapper/0:0 [120]

    [1] d.c

    ---
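    #include <stdio.h>          /* perror() */
    #include <unistd.h>         /* syscall(), pid_t */
    #include <sys/syscall.h>    /* __NR_sched_setattr */
    #include <linux/types.h>    /* __u32, __s32, __u64 */
    #include <linux/sched.h>    /* SCHED_DEADLINE */

    /* Argument layout of the sched_setattr(2) system call */
    struct sched_attr {
        __u32 size, sched_policy;
        __u64 sched_flags;
        __s32 sched_nice;
        __u32 sched_priority;
        __u64 sched_runtime, sched_deadline, sched_period;
    };

    int sched_setattr(pid_t pid, const struct sched_attr *attr, unsigned int flags)
    {
        return syscall(__NR_sched_setattr, pid, attr, flags);
    }

    int main(void)
    {
        /* This creates a 10ms/30ms reservation (times are in ns).
         * Deadline and period are set directly: reading attr inside its
         * own initializer is not reliably ordered in C. */
        struct sched_attr attr = {
            .size           = sizeof(attr),
            .sched_policy   = SCHED_DEADLINE,
            .sched_runtime  = 10 * 1000 * 1000,
            .sched_deadline = 30 * 1000 * 1000,
            .sched_period   = 30 * 1000 * 1000,
        };

        if (sched_setattr(0, &attr, 0) < 0) {
            perror("sched_setattr");
            return -1;
        }

        for (;;)
            ;   /* spin so the deadline task keeps running */
    }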
    ---

    Committer notes:

    Got the program from the provided URL, http://bristot.me/lkml/d.c,
    trimmed it and included it in the cset log above, so that we have
    everything needed to test it in one place.

    Signed-off-by: Daniel Bristot de Oliveira
    Acked-by: Steven Rostedt
    Tested-by: Arnaldo Carvalho de Melo
    Cc: Alexander Shishkin
    Cc: Daniel Bristot de Oliveira
    Cc: Jiri Olsa
    Cc: Peter Zijlstra
    Link: http://lkml.kernel.org/r/866ef75bcebf670ae91c6a96daa63597ba981f0d.1483443552.git.bristot@redhat.com
    Signed-off-by: Arnaldo Carvalho de Melo

    Daniel Bristot de Oliveira
     

03 Jan, 2017

4 commits

    There's no --signal-trigger option, so fix the comment that mentions
    it, and also add the code comment to the record man page.

    Signed-off-by: Jiri Olsa
    Tested-by: Wang Nan
    Cc: David Ahern
    Cc: Namhyung Kim
    Cc: Peter Zijlstra
    Link: http://lkml.kernel.org/r/1483431600-19887-4-git-send-email-jolsa@kernel.org
    Signed-off-by: Arnaldo Carvalho de Melo

    Jiri Olsa
     
  • There's no need for this one to be global.

    Signed-off-by: Jiri Olsa
    Tested-by: Wang Nan
    Cc: David Ahern
    Cc: Namhyung Kim
    Cc: Peter Zijlstra
    Link: http://lkml.kernel.org/r/1483431600-19887-3-git-send-email-jolsa@kernel.org
    Signed-off-by: Arnaldo Carvalho de Melo

    Jiri Olsa
     
    This allows string options that take an optional argument, fall back
    to a default argument, and set a variable when the option is used.
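    The described semantics can be sketched with a hand-rolled matcher. struct optarg_set_opt and parse_optarg_set() are hypothetical and are not the tools/lib/subcmd API; the sketch only shows the three outcomes: option absent (set stays false), option given bare (default argument used), and option given with "=value".

```c
#include <stdbool.h>
#include <string.h>

struct optarg_set_opt {
	const char *long_name;		/* matched as "--long_name" */
	const char *default_arg;	/* used when no "=value" is given */
	const char *value;		/* resulting string, NULL if unused */
	bool set;			/* true once the option was seen */
};

/* Returns 1 if arg is this option (updating o), 0 otherwise. */
static int parse_optarg_set(struct optarg_set_opt *o, const char *arg)
{
	size_t n = strlen(o->long_name);
	const char *rest;

	if (strncmp(arg, "--", 2) != 0 || strncmp(arg + 2, o->long_name, n) != 0)
		return 0;
	rest = arg + 2 + n;
	if (*rest == '\0')
		o->value = o->default_arg;	/* bare option: default */
	else if (*rest == '=')
		o->value = rest + 1;		/* explicit argument */
	else
		return 0;			/* a longer, different option */
	o->set = true;
	return 1;
}
```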

    Signed-off-by: Jiri Olsa
    Tested-by: Wang Nan
    Cc: David Ahern
    Cc: Josh Poimboeuf
    Cc: Namhyung Kim
    Cc: Peter Zijlstra
    Link: http://lkml.kernel.org/r/1483431600-19887-2-git-send-email-jolsa@kernel.org
    Signed-off-by: Arnaldo Carvalho de Melo

    Jiri Olsa
     
    Since 'perf probe' supports cross-arch probes, it is possible to analyze
    a kernel image for a different arch, which may have a different
    bits-per-long.

    In that case, it fails to get the module name because it uses the
    MOD_NAME_OFFSET macro based on the host machine bits-per-long, instead
    of the target arch bits-per-long.

    This fixes the above issue by choosing the modname offset based on the
    target arch's bit width. This is OK because the Linux kernel uses the
    LP64 model on 64-bit archs.

    E.g. without this (on x86_64, and target module is arm32):

    $ perf probe -m build-arm/fs/configfs/configfs.ko -D configfs_lookup
    p:probe/configfs_lookup :configfs_lookup+0
    ^-Here is an empty module name.

    With this fix, you can see correct module name:

    $ perf probe -m build-arm/fs/configfs/configfs.ko -D configfs_lookup
    p:probe/configfs_lookup configfs:configfs_lookup+0
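    A sketch of the target-dependent offset, under an assumption about the section layout: .gnu.linkonce.this_module is taken to start with struct module's enum state and struct list_head list, immediately followed by the name. That gives 4 + 2 * 4 = 12 bytes on a 32-bit target and, with 8-byte alignment for the list_head, 8 + 2 * 8 = 24 bytes on a 64-bit one. mod_name_offset() and mod_name() are hypothetical helpers keyed on the ELF class of the module file rather than the host's sizeof(long).

```c
#include <elf.h>
#include <stddef.h>
#include <string.h>

/* Offset of the module name, chosen by the target's ELF class. */
static size_t mod_name_offset(unsigned char ei_class)
{
	return ei_class == ELFCLASS32 ? 12 : 24;
}

/* Return the NUL-terminated module name inside a this_module section
 * image, or NULL if the section is too short to contain it. */
static const char *mod_name(const unsigned char *sec, size_t len,
			    unsigned char ei_class)
{
	size_t off = mod_name_offset(ei_class);

	if (len <= off || !memchr(sec + off, '\0', len - off))
		return NULL;
	return (const char *)sec + off;
}
```

    Reading a 32-bit section image with the 64-bit offset lands on zero padding and returns an empty string, which matches the empty module name in the "without this" output.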

    Signed-off-by: Masami Hiramatsu
    Cc: Jiri Olsa
    Cc: Namhyung Kim
    Cc: Peter Zijlstra
    Link: http://lkml.kernel.org/r/148337043836.6752.383495516397005695.stgit@devbox
    Signed-off-by: Arnaldo Carvalho de Melo

    Masami Hiramatsu
     

28 Dec, 2016

1 commit

    Show the length of the analyzed sample time and the percentage of time
    the idle task was running. This also honours the time range given by
    the --time option.

    $ perf sched timehist -sI | tail
    Samples do not have callchains.
    Idle stats:
    CPU 0 idle for 930.316 msec ( 92.93%)
    CPU 1 idle for 963.614 msec ( 96.25%)
    CPU 2 idle for 885.482 msec ( 88.45%)
    CPU 3 idle for 938.635 msec ( 93.76%)

    Total number of unique tasks: 118
    Total number of context switches: 2337
    Total run time (msec): 3718.048
    Total scheduling time (msec): 1001.131 (x 4)
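    The percentages appear to be each CPU's idle time divided by the total scheduling time: for CPU 0, 930.316 / 1001.131 ≈ 0.9293, shown as 92.93%. A trivial C helper (hypothetical name idle_percent()) makes the arithmetic explicit.

```c
/* Idle share of the analyzed time, in percent (0 if no time analyzed). */
static double idle_percent(double idle_msec, double sched_msec)
{
	return sched_msec > 0.0 ? 100.0 * idle_msec / sched_msec : 0.0;
}
```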

    Suggested-by: David Ahern
    Signed-off-by: Namhyung Kim
    Cc: Jiri Olsa
    Cc: Peter Zijlstra
    Link: http://lkml.kernel.org/r/20161222060350.17655-3-namhyung@kernel.org
    Signed-off-by: Arnaldo Carvalho de Melo

    Namhyung Kim
     

26 Dec, 2016

1 commit

  • Pull turbostat updates from Len Brown.

    * 'turbostat' of git://git.kernel.org/pub/scm/linux/kernel/git/lenb/linux:
    tools/power turbostat: remove obsolete -M, -m, -C, -c options
    tools/power turbostat: Make extensible via the --add parameter
    tools/power turbostat: Denverton uses a 25 MHz crystal, not 19.2 MHz
    tools/power turbostat: line up headers when -M is used
    tools/power turbostat: fix SKX PKG_CSTATE_LIMIT decoding
    tools/power turbostat: Support Knights Mill (KNM)
    tools/power turbostat: Display HWP OOB status
    tools/power turbostat: fix Denverton BCLK
    tools/power turbostat: use intel-family.h model strings
    tools/power/turbostat: Add Denverton RAPL support
    tools/power/turbostat: Add Denverton support
    tools/power/turbostat: split core MSR support into status + limit
    tools/power turbostat: fix error case overflow read of slm_freq_table[]
    tools/power turbostat: Allocate correct amount of fd and irq entries
    tools/power turbostat: switch to tab delimited output
    tools/power turbostat: Gracefully handle ACPI S3
    tools/power turbostat: tidy up output on Joule counter overflow

    Linus Torvalds
     

25 Dec, 2016

2 commits

    The new --add option has replaced the -M, -m, -C, -c options, e.g.:

    -M 0x10 is now --add msr0x10,raw
    -m 0x10 is now --add msr0x10,raw,u32
    -C 0x10 is now --add msr0x10,delta
    -c 0x10 is now --add msr0x10,delta,u32

    The --add option can be repeated to add any number of counters,
    while the previous options were limited to adding one of each type.

    In addition, the --add option can accept a column label,
    and can also display a counter as a percentage of elapsed cycles.

    E.g. --add msr0x3fe,core,percent,MY_CC3

    Signed-off-by: Len Brown

    Len Brown
     
  • Create the "--add" parameter. This can be used to teach an existing
    turbostat binary about any number of any type of counter.

    turbostat(8) details the syntax for --add.

    Signed-off-by: Len Brown

    Len Brown
     

24 Dec, 2016

1 commit

  • Pull perf fixes from Ingo Molnar:
    "On the kernel side there's two x86 PMU driver fixes and a uprobes fix,
    plus on the tooling side there's a number of fixes and some late
    updates"

    * 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (36 commits)
    perf sched timehist: Fix invalid period calculation
    perf sched timehist: Remove hardcoded 'comm_width' check at print_summary
    perf sched timehist: Enlarge default 'comm_width'
    perf sched timehist: Honour 'comm_width' when aligning the headers
    perf/x86: Fix overlap counter scheduling bug
    perf/x86/pebs: Fix handling of PEBS buffer overflows
    samples/bpf: Move open_raw_sock to separate header
    samples/bpf: Remove perf_event_open() declaration
    samples/bpf: Be consistent with bpf_load_program bpf_insn parameter
    tools lib bpf: Add bpf_prog_{attach,detach}
    samples/bpf: Switch over to libbpf
    perf diff: Do not overwrite valid build id
    perf annotate: Don't throw error for zero length symbols
    perf bench futex: Fix lock-pi help string
    perf trace: Check if MAP_32BIT is defined (again)
    samples/bpf: Make perf_event_read() static
    uprobes: Fix uprobes on MIPS, allow for a cache flush after ixol breakpoint creation
    samples/bpf: Make samples more libbpf-centric
    tools lib bpf: Add flags to bpf_create_map()
    tools lib bpf: use __u32 from linux/types.h
    ...

    Linus Torvalds
     

23 Dec, 2016

4 commits

    When the --time option is given with a value outside the recorded time,
    the last sample time (tprev) was set to that value, and the run time
    calculation could be incorrect. This is a problem for the first sample
    on each CPU: the runtime update is normally skipped when tprev is 0, but
    with the --time option tprev had a non-zero (and invalid) value, so the
    calculation was incorrect.

    For example, let's see the following:

    $ perf sched timehist
               time    cpu task name                      wait time sch delay  run time
                           [tid/pid]                         (msec)    (msec)    (msec)
    --------------- ------ ------------------------------ --------- --------- ---------
        3195.968367 [0003] <idle>                             0.000     0.000     0.000
        3195.968386 [0002] Timer[4306/4277]                   0.000     0.000     0.018
        3195.968397 [0002] Web Content[4277]                  0.000     0.000     0.000
        3195.968595 [0001] JS Helper[4302/4277]               0.000     0.000     0.000
        3195.969217 [0000] <idle>                             0.000     0.000     0.621
        3195.969251 [0001] kworker/1:1H[291]                  0.000     0.000     0.033

    The sample starts at 3195.968367, but when I gave a time interval from
    3194 to 3196 (in sec), it calculated the whole 2 seconds as runtime.
    In the output below, 2 CPUs accounted it as runtime and the other 2
    CPUs accounted it as idle time.

    Before:

    $ perf sched timehist --time 3194,3196 -s | tail
    Idle stats:
    CPU 0 idle for 1995.991 msec
    CPU 1 idle for 20.793 msec
    CPU 2 idle for 30.191 msec
    CPU 3 idle for 1999.852 msec

    Total number of unique tasks: 23
    Total number of context switches: 128
    Total run time (msec): 3724.940

    After:

    $ perf sched timehist --time 3194,3196 -s | tail
    Idle stats:
    CPU 0 idle for 10.811 msec
    CPU 1 idle for 20.793 msec
    CPU 2 idle for 30.191 msec
    CPU 3 idle for 18.337 msec

    Total number of unique tasks: 23
    Total number of context switches: 128
    Total run time (msec): 18.139
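    A simplified model of the accounting, assuming run time is charged as the gap between consecutive samples on a CPU and that tprev == 0 means "no previous sample". struct cpu_acct and account_sample() are hypothetical names. Seeding tprev with the --time start (3194 s) makes the first sample at 3195.968367 s charge almost 2 seconds; leaving it at 0 charges nothing until the second sample.

```c
#include <stdint.h>

/* Per-CPU accounting state; times are in nanoseconds. */
struct cpu_acct {
	uint64_t tprev;		/* previous sample time, 0 = none yet */
	uint64_t runtime;	/* accumulated run time */
};

/* Charge the interval since the previous sample on this CPU. The first
 * sample has no starting point, so tprev == 0 skips the update. */
static void account_sample(struct cpu_acct *c, uint64_t t)
{
	if (c->tprev != 0 && t > c->tprev)
		c->runtime += t - c->tprev;
	c->tprev = t;
}
```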

    Committer notes:

    Further testing:

    Before:

    Idle stats:
    CPU 0 idle for 229.785 msec
    CPU 1 idle for 937.944 msec
    CPU 2 idle for 188.931 msec
    CPU 3 idle for 986.185 msec

    After:

    # perf sched timehist --time 40602,40603 -s | tail

    Idle stats:
    CPU 0 idle for 229.785 msec
    CPU 1 idle for 175.407 msec
    CPU 2 idle for 188.931 msec
    CPU 3 idle for 223.657 msec

    Total number of unique tasks: 68
    Total number of context switches: 814
    Total run time (msec): 97.688

    # for cpu in `seq 0 3` ; do echo -n "CPU $cpu idle for " ; perf sched timehist --time 40602,40603 | grep "\[000${cpu}\].*\<idle\>" | tr -s ' ' | cut -d' ' -f7 | awk '{entries++ ; s+=$1} END {print s " msec (entries: " entries ")"}' ; done
    CPU 0 idle for 229.721 msec (entries: 123)
    CPU 1 idle for 175.381 msec (entries: 65)
    CPU 2 idle for 188.903 msec (entries: 56)
    CPU 3 idle for 223.61 msec (entries: 102)

    The difference is due to the idle stats being accounted at nanosecond
    precision, while the entries in 'perf sched timehist' are truncated at
    msec.usec.
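
    The precision effect can be illustrated with a small C sketch; the
    helper names are hypothetical. Truncating a duration to whole
    microseconds loses less than 1 usec, so a sum over N truncated entries
    falls short of the full-precision sum by less than N usec. The numbers
    above are consistent with that bound: CPU 0 differs by 64 usec over 123
    entries.

```c
#include <stdint.h>

/* Sum durations (ns) at full precision; result in ns. */
static uint64_t sum_ns(const uint64_t *d, int n)
{
	uint64_t s = 0;

	for (int i = 0; i < n; i++)
		s += d[i];
	return s;
}

/*
 * Sum the same durations after truncating each one to whole microseconds,
 * as a msec.usec column would display it; result in ns.
 */
static uint64_t sum_truncated_us(const uint64_t *d, int n)
{
	uint64_t s = 0;

	for (int i = 0; i < n; i++)
		s += (d[i] / 1000) * 1000;
	return s;
}
```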

    Signed-off-by: Namhyung Kim
    Tested-by: Arnaldo Carvalho de Melo
    Cc: David Ahern
    Cc: Jiri Olsa
    Cc: Peter Zijlstra
    Fixes: 853b74071110 ("perf sched timehist: Add option to specify time window of interest")
    Link: http://lkml.kernel.org/r/20161222060350.17655-2-namhyung@kernel.org
    Signed-off-by: Arnaldo Carvalho de Melo

    Namhyung Kim
     
  • Now that the default 'comm_width' value is 30, there is no need to check
    it in print_summary().

    Signed-off-by: Namhyung Kim
    Cc: David Ahern
    Cc: Jiri Olsa
    Cc: Peter Zijlstra
    Link: http://lkml.kernel.org/r/20161222060350.17655-1-namhyung@kernel.org
    [ Split from a larger patch ]
    Signed-off-by: Arnaldo Carvalho de Melo

    Namhyung Kim
     
  • The current default value is 20, but the displayed name easily exceeds
    that width when a task has a long name and a tid different from its pid,
    which leaves the output misaligned. So change it to a larger value, like
    the one the summary uses.
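
    A small C sketch of why 20 columns is too narrow. A "%-*s" conversion
    pads short names but never truncates long ones. As a result,
    "qemu-system-x86[3593/3510]" (26 characters) pushes the following
    columns 6 characters to the right. fmt_comm() is a hypothetical helper,
    and the '|' it appends marks where the next column would start.

```c
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper: format a task name padded to 'width' columns,
 * followed by '|' to show where the next column begins. Returns how many
 * characters the name overflows the column by (0 if it fits).
 */
static int fmt_comm(char *buf, size_t len, const char *comm, int width)
{
	int n = (int)strlen(comm);

	snprintf(buf, len, "%-*s|", width, comm);
	return n > width ? n - width : 0;
}
```

    With a width of 20 the '|' lands after column 26 for the qemu thread but
    after column 20 for "rcuos/2[29]". With 30, both names fit.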

    Committer notes:

    Before:

    # perf sched record
    ^C
    # perf sched timehist

       40602.770537 [0001] rcuos/2[29]              7.970     0.002     0.020
       40602.771512 [0003] <idle>                   0.003     0.000     0.986
       40602.771586 [0001] <idle>                   0.020     0.000     1.049
       40602.771606 [0001] qemu-system-x86[3593/3510]     0.000     0.002     0.020
       40602.771629 [0003] qemu-system-x86[3510]     0.000     0.003     0.116
       40602.771776 [0000] <idle>                   0.001     0.000     1.892

    After:

    # perf sched timehist

       40602.770537 [0001] rcuos/2[29]                        7.970     0.002     0.020
       40602.771512 [0003] <idle>                             0.003     0.000     0.986
       40602.771586 [0001] <idle>                             0.020     0.000     1.049
       40602.771606 [0001] qemu-system-x86[3593/3510]         0.000     0.002     0.020
       40602.771629 [0003] qemu-system-x86[3510]              0.000     0.003     0.116

    Signed-off-by: Namhyung Kim
    Tested-by: Arnaldo Carvalho de Melo
    Cc: David Ahern
    Cc: Jiri Olsa
    Cc: Peter Zijlstra
    Link: http://lkml.kernel.org/r/20161222060350.17655-1-namhyung@kernel.org
    [ Split from a larger patch ]
    Signed-off-by: Arnaldo Carvalho de Melo

    Namhyung Kim
     
  • The current default value is 20, but that may change in the future, so
    make the places where 20 is hardcoded use 'comm_width' instead.
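
    A sketch of the pattern, assuming a file-scope comm_width like the one
    this commit refers to. comm_header() is hypothetical. It only shows that
    the header cell and its dashed rule follow comm_width instead of a
    literal 20.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-in for perf's comm_width setting (default 20 here). */
static int comm_width = 20;

/*
 * Build the "task name" header cell and the dashed rule beneath it.
 * Both use comm_width rather than a hardcoded 20, so they stay in step
 * with the data rows whenever the default changes. 'title' and 'rule'
 * must each hold at least 'len' bytes.
 */
static void comm_header(char *title, char *rule, size_t len)
{
	int w = comm_width < (int)len - 1 ? comm_width : (int)len - 1;

	snprintf(title, len, "%-*s", w, "task name");
	memset(rule, '-', (size_t)w);
	rule[w] = '\0';
}
```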

    Signed-off-by: Namhyung Kim
    Cc: David Ahern
    Cc: Jiri Olsa
    Cc: Peter Zijlstra
    Link: http://lkml.kernel.org/r/20161222060350.17655-1-namhyung@kernel.org
    [ Split from a larger patch ]
    Signed-off-by: Arnaldo Carvalho de Melo

    Namhyung Kim
     

20 Dec, 2016

3 commits

  • Commit d8c5b17f2bc0 ("samples: bpf: add userspace example for attaching
    eBPF programs to cgroups") added these functions to samples/libbpf, but
    during this merge all of the samples libbpf functionality is shifting to
    tools/lib/bpf. Shift these functions there.

    Committer notes:

    Use bzero + attr.FIELD = value instead of 'attr = { .FIELD = value }',
    just like the other wrapper calls to sys_bpf with bpf_attr, so that this
    builds with older toolchains, such as the ones in CentOS 5 and 6.
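
    A sketch of the two styles, using a hypothetical cut-down union in place
    of the real union bpf_attr from <linux/bpf.h>. The designated-initializer
    form would read 'union bpf_attr attr = { .target_fd = fd, ... };'. The
    form below zeroes the whole union first and then assigns fields.
    bzero() guarantees that every byte is zero, including bytes no named
    member covers, and the bpf(2) syscall expects unused attribute fields to
    be zero.

```c
#define _DEFAULT_SOURCE         /* expose bzero() in strict-standard builds */
#include <stdint.h>
#include <strings.h>            /* bzero() */

/*
 * Hypothetical, cut-down stand-in for union bpf_attr; the real union
 * has many more members.
 */
union demo_bpf_attr {
	struct {                /* anonymous struct, as in the real header */
		uint32_t target_fd;
		uint32_t attach_bpf_fd;
		uint32_t attach_type;
	};
	uint64_t raw[8];        /* makes the union larger than the fields set */
};

/* Zero the whole union, then assign only the fields this command uses. */
static void prep_attach_attr(union demo_bpf_attr *attr, uint32_t target_fd,
			     uint32_t prog_fd, uint32_t type)
{
	bzero(attr, sizeof(*attr));
	attr->target_fd = target_fd;
	attr->attach_bpf_fd = prog_fd;
	attr->attach_type = type;
}
```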

    Signed-off-by: Joe Stringer
    Cc: Alexei Starovoitov
    Cc: Daniel Borkmann
    Cc: Wang Nan
    Link: http://lkml.kernel.org/n/tip-au2zvtsh55vqeo3v3uw7jr4c@git.kernel.org
    Link: https://github.com/joestringer/linux/commit/353e6f298c3d0a92fa8bfa61ff898c5050261a12.patch
    Signed-off-by: Arnaldo Carvalho de Melo

    Joe Stringer
     
  • Fix a perf diff regression introduced by commit
    5baecbcd9c9a ("perf symbols: we can now read separate debug-info files
    based on a build ID").

    When perf diff compares different binaries, they can have the same
    name, so the build id is used to distinguish between them. However,
    that commit assumes that binaries with the same name have the same
    build id, so it overwrites the build id based on the binary name,
    regardless of whether the build id is already set.

    Check has_build_id in dso__load(). If the build id is already set, use
    it.
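
    A minimal sketch of the check, using a hypothetical, simplified struct
    demo_dso rather than perf's struct dso. A build id that is already set
    (for example, one recorded in the perf.data file) is kept. A
    name-derived one is only used when none is known.

```c
#include <stdbool.h>
#include <string.h>

#define BUILD_ID_SIZE 20

/* Hypothetical, simplified DSO record; perf's struct dso differs. */
struct demo_dso {
	const char *name;
	bool has_build_id;
	unsigned char build_id[BUILD_ID_SIZE];
};

/*
 * Use the build id found by name lookup only if none is set yet.
 * Without the has_build_id check, two different binaries sharing a
 * name would both end up with the same (name-derived) build id.
 */
static void dso_set_build_id_from_name(struct demo_dso *dso,
				       const unsigned char *by_name)
{
	if (dso->has_build_id)
		return;         /* already known: keep it */
	memcpy(dso->build_id, by_name, BUILD_ID_SIZE);
	dso->has_build_id = true;
}
```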

    Before the fix:

    $ perf diff 1.perf.data 2.perf.data
    # Event 'cycles'
    #
    # Baseline   Delta Shared Object    Symbol
    # ........ ....... ................ .............................
    #
        99.83% -99.80% tchain_edit      [.] f2
         0.12% +99.81% tchain_edit      [.] f3
         0.02%  -0.01% [ixgbe]          [k] ixgbe_read_reg

    After the fix:

    $ perf diff 1.perf.data 2.perf.data
    # Event 'cycles'
    #
    # Baseline   Delta Shared Object    Symbol
    # ........ ....... ................ .............................
    #
        99.83%  +0.10% tchain_edit      [.] f3
         0.12%  -0.08% tchain_edit      [.] f2

    Signed-off-by: Kan Liang
    Cc: Andi Kleen
    CC: Dima Kogan
    Cc: Jiri Olsa
    Cc: Namhyung Kim
    Fixes: 5baecbcd9c9a ("perf symbols: we can now read separate debug-info files based on a build ID")
    Link: http://lkml.kernel.org/r/1481642984-13593-1-git-send-email-kan.liang@intel.com
    Signed-off-by: Arnaldo Carvalho de Melo

    Kan Liang
     
  • 'perf report --tui' exits with an error when it finds a sample in a
    zero-length symbol (i.e. addr == sym->start == sym->end). These are
    actually valid samples, so don't exit the TUI; show the report with such
    symbols instead.
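
    A hedged sketch of the kind of range check involved, assuming a
    half-open [start, end) symbol range. The struct and function names are
    hypothetical, and perf's actual check may be written differently. A
    strict half-open test rejects every address of a zero-length symbol, so
    the case addr == start == end has to be accepted explicitly.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, simplified symbol covering [start, end). */
struct demo_sym {
	uint64_t start;
	uint64_t end;
};

/* Strict half-open check: never true for a zero-length symbol. */
static bool addr_in_sym_strict(const struct demo_sym *s, uint64_t addr)
{
	return addr >= s->start && addr < s->end;
}

/* Same check, but also accept a sample exactly at a zero-length symbol. */
static bool addr_in_sym(const struct demo_sym *s, uint64_t addr)
{
	if (s->start == s->end)
		return addr == s->start;
	return addr >= s->start && addr < s->end;
}
```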

    Reported-and-Tested-by: Anton Blanchard
    Link: https://lkml.org/lkml/2016/10/8/189
    Signed-off-by: Ravi Bangoria
    Cc: Alexander Shishkin
    Cc: Benjamin Herrenschmidt
    Cc: Chris Riyder
    Cc: linuxppc-dev@lists.ozlabs.org
    Cc: Masami Hiramatsu
    Cc: Michael Ellerman
    Cc: Nicholas Piggin
    Cc: Paul Mackerras
    Cc: Peter Zijlstra
    Cc: stable@kernel.org # v4.9+
    Link: http://lkml.kernel.org/r/1479804050-5028-1-git-send-email-ravi.bangoria@linux.vnet.ibm.com
    Signed-off-by: Arnaldo Carvalho de Melo

    Ravi Bangoria