26 Dec, 2016

1 commit

  • Pull turbostat updates from Len Brown.

    * 'turbostat' of git://git.kernel.org/pub/scm/linux/kernel/git/lenb/linux:
    tools/power turbostat: remove obsolete -M, -m, -C, -c options
    tools/power turbostat: Make extensible via the --add parameter
    tools/power turbostat: Denverton uses a 25 MHz crystal, not 19.2 MHz
    tools/power turbostat: line up headers when -M is used
    tools/power turbostat: fix SKX PKG_CSTATE_LIMIT decoding
    tools/power turbostat: Support Knights Mill (KNM)
    tools/power turbostat: Display HWP OOB status
    tools/power turbostat: fix Denverton BCLK
    tools/power turbostat: use intel-family.h model strings
    tools/power/turbostat: Add Denverton RAPL support
    tools/power/turbostat: Add Denverton support
    tools/power/turbostat: split core MSR support into status + limit
    tools/power turbostat: fix error case overflow read of slm_freq_table[]
    tools/power turbostat: Allocate correct amount of fd and irq entries
    tools/power turbostat: switch to tab delimited output
    tools/power turbostat: Gracefully handle ACPI S3
    tools/power turbostat: tidy up output on Joule counter overflow

    Linus Torvalds
     

25 Dec, 2016

2 commits

  • The new --add option has replaced the -M, -m, -C, -c options
    Eg.

    -M 0x10 is now --add msr0x10,raw
    -m 0x10 is now --add msr0x10,raw,u32
    -C 0x10 is now --add msr0x10,delta
    -c 0x10 is now --add msr0x10,delta,u32

    The --add option can be repeated to add any number of counters,
    while the previous options were limited to adding one of each type.

    In addition, the --add option can accept a column label,
    and can also display a counter as a percentage of elapsed cycles.

    Eg. --add msr0x3fe,core,percent,MY_CC3
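
    As a rough sketch of the four mappings listed above, the translation
    from a removed option to its --add form could look like the following.
    legacy_to_add() is a hypothetical helper written for illustration; it
    is not part of turbostat.

```c
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper (not turbostat code): translate one removed option
 * letter plus an MSR address into the equivalent --add argument, using
 * only the four mappings given in the commit message.
 * Returns 0 on success, -1 for an unknown option letter.
 */
int legacy_to_add(char opt, unsigned int msr, char *buf, size_t len)
{
    const char *suffix;

    switch (opt) {
    case 'M': suffix = "raw";       break;
    case 'm': suffix = "raw,u32";   break;
    case 'C': suffix = "delta";     break;
    case 'c': suffix = "delta,u32"; break;
    default:  return -1;
    }
    snprintf(buf, len, "--add msr0x%x,%s", msr, suffix);
    return 0;
}
```

    Calling legacy_to_add('c', 0x10, buf, sizeof(buf)) fills buf with the
    "--add msr0x10,delta,u32" form shown in the list.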

    Signed-off-by: Len Brown

    Len Brown
     
  • Create the "--add" parameter. This can be used to teach an existing
    turbostat binary about any number of any type of counter.

    turbostat(8) details the syntax for --add.

    Signed-off-by: Len Brown

    Len Brown
     

24 Dec, 2016

1 commit

  • Pull perf fixes from Ingo Molnar:
    "On the kernel side there's two x86 PMU driver fixes and a uprobes fix,
    plus on the tooling side there's a number of fixes and some late
    updates"

    * 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (36 commits)
    perf sched timehist: Fix invalid period calculation
    perf sched timehist: Remove hardcoded 'comm_width' check at print_summary
    perf sched timehist: Enlarge default 'comm_width'
    perf sched timehist: Honour 'comm_width' when aligning the headers
    perf/x86: Fix overlap counter scheduling bug
    perf/x86/pebs: Fix handling of PEBS buffer overflows
    samples/bpf: Move open_raw_sock to separate header
    samples/bpf: Remove perf_event_open() declaration
    samples/bpf: Be consistent with bpf_load_program bpf_insn parameter
    tools lib bpf: Add bpf_prog_{attach,detach}
    samples/bpf: Switch over to libbpf
    perf diff: Do not overwrite valid build id
    perf annotate: Don't throw error for zero length symbols
    perf bench futex: Fix lock-pi help string
    perf trace: Check if MAP_32BIT is defined (again)
    samples/bpf: Make perf_event_read() static
    uprobes: Fix uprobes on MIPS, allow for a cache flush after ixol breakpoint creation
    samples/bpf: Make samples more libbpf-centric
    tools lib bpf: Add flags to bpf_create_map()
    tools lib bpf: use __u32 from linux/types.h
    ...

    Linus Torvalds
     

23 Dec, 2016

4 commits

  • When the --time option is given a value outside the recorded time, the
    last sample time (tprev) was set to that value, so the run time
    calculation could be incorrect. This affects the first sample on each
    cpu: normally tprev is 0 there and the run time update is skipped, but
    with the --time option tprev held a non-zero (and invalid) value, so a
    bogus run time was calculated.
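
    A simplified model of the problem, assuming times in nanoseconds and
    tprev == 0 meaning "no earlier sample on this cpu". The function names
    and the exact form of the check are illustrative assumptions, not the
    perf source.

```c
#include <stdint.h>

/*
 * Illustrative model only. An unset tprev (0) is overwritten with the
 * start of the --time window, so the first sample on a cpu is charged
 * with the whole span from the window start.
 */
uint64_t run_time_buggy(uint64_t t, uint64_t tprev, uint64_t win_start)
{
    if (win_start > tprev)
        tprev = win_start;
    return tprev ? t - tprev : 0;
}

/*
 * Clamp to the window start only when tprev is a real sample time, so
 * the first sample on a cpu still skips the run time update.
 */
uint64_t run_time_fixed(uint64_t t, uint64_t tprev, uint64_t win_start)
{
    if (tprev && win_start > tprev)
        tprev = win_start;
    return tprev ? t - tprev : 0;
}
```

    With a first sample at 3195.968367 and a window starting at 3194, the
    first variant reports roughly 1.97 seconds of run time while the second
    reports none.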

    For example, let's see the following:

    $ perf sched timehist
               time    cpu  task name                       wait time  sch delay   run time
                            [tid/pid]                          (msec)     (msec)     (msec)
    --------------- ------  ------------------------------  ---------  ---------  ---------
        3195.968367 [0003]  <idle>                              0.000      0.000      0.000
        3195.968386 [0002]  Timer[4306/4277]                    0.000      0.000      0.018
        3195.968397 [0002]  Web Content[4277]                   0.000      0.000      0.000
        3195.968595 [0001]  JS Helper[4302/4277]                0.000      0.000      0.000
        3195.969217 [0000]  <idle>                              0.000      0.000      0.621
        3195.969251 [0001]  kworker/1:1H[291]                   0.000      0.000      0.033

    The samples start at 3195.968367, but when I gave a time interval from
    3194 to 3196 (in sec), the whole 2 seconds were counted as run time.
    Below, 2 cpus accounted it as run time and the other 2 cpus accounted
    it as idle time.

    Before:

    $ perf sched timehist --time 3194,3196 -s | tail
    Idle stats:
    CPU 0 idle for 1995.991 msec
    CPU 1 idle for 20.793 msec
    CPU 2 idle for 30.191 msec
    CPU 3 idle for 1999.852 msec

    Total number of unique tasks: 23
    Total number of context switches: 128
    Total run time (msec): 3724.940

    After:

    $ perf sched timehist --time 3194,3196 -s | tail
    Idle stats:
    CPU 0 idle for 10.811 msec
    CPU 1 idle for 20.793 msec
    CPU 2 idle for 30.191 msec
    CPU 3 idle for 18.337 msec

    Total number of unique tasks: 23
    Total number of context switches: 128
    Total run time (msec): 18.139

    Committer notes:

    Further testing:

    Before:

    Idle stats:
    CPU 0 idle for 229.785 msec
    CPU 1 idle for 937.944 msec
    CPU 2 idle for 188.931 msec
    CPU 3 idle for 986.185 msec

    After:

    # perf sched timehist --time 40602,40603 -s | tail

    Idle stats:
    CPU 0 idle for 229.785 msec
    CPU 1 idle for 175.407 msec
    CPU 2 idle for 188.931 msec
    CPU 3 idle for 223.657 msec

    Total number of unique tasks: 68
    Total number of context switches: 814
    Total run time (msec): 97.688

    # for cpu in `seq 0 3` ; do echo -n "CPU $cpu idle for " ; perf sched timehist --time 40602,40603 | grep "\[000${cpu}\].*\<idle\>" | tr -s ' ' | cut -d' ' -f7 | awk '{entries++ ; s+=$1} END {print s " msec (entries: " entries ")"}' ; done
    CPU 0 idle for 229.721 msec (entries: 123)
    CPU 1 idle for 175.381 msec (entries: 65)
    CPU 2 idle for 188.903 msec (entries: 56)
    CPU 3 idle for 223.61 msec (entries: 102)

    The difference is due to the idle stats being accounted at nanosecond
    precision while the entries in 'perf sched timehist' are truncated at
    msec.usec.
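
    The effect can be sketched as follows: summing per-entry values that
    were each truncated to whole microseconds can only come out equal to or
    lower than the sum kept at nanosecond precision. The function names are
    made up for this illustration.

```c
#include <stddef.h>
#include <stdint.h>

/* Sum durations (in ns) at full nanosecond precision. */
uint64_t sum_exact_ns(const uint64_t *d, size_t n)
{
    uint64_t s = 0;

    for (size_t i = 0; i < n; i++)
        s += d[i];
    return s;
}

/*
 * Sum the same durations after truncating each one to whole
 * microseconds, as a printed msec.usec field would. Result is in ns.
 */
uint64_t sum_truncated_ns(const uint64_t *d, size_t n)
{
    uint64_t s = 0;

    for (size_t i = 0; i < n; i++)
        s += (d[i] / 1000) * 1000;
    return s;
}
```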

    Signed-off-by: Namhyung Kim
    Tested-by: Arnaldo Carvalho de Melo
    Cc: David Ahern
    Cc: Jiri Olsa
    Cc: Peter Zijlstra
    Fixes: 853b74071110 ("perf sched timehist: Add option to specify time window of interest")
    Link: http://lkml.kernel.org/r/20161222060350.17655-2-namhyung@kernel.org
    Signed-off-by: Arnaldo Carvalho de Melo

    Namhyung Kim
     
  • Now that the default 'comm_width' value is 30, there is no need to
    check that at print_summary.

    Signed-off-by: Namhyung Kim
    Cc: David Ahern
    Cc: Jiri Olsa
    Cc: Peter Zijlstra
    Link: http://lkml.kernel.org/r/20161222060350.17655-1-namhyung@kernel.org
    [ Split from a larger patch ]
    Signed-off-by: Arnaldo Carvalho de Melo

    Namhyung Kim
     
  • The current default value is 20, but it is easily exceeded when a task
    has a long name and different tid and pid, which leaves the output
    misaligned. So change the default to a larger value, as the summary
    already uses.

    Committer notes:

    Before:

    # perf sched record
    ^C
    # perf sched timehist

    40602.770537 [0001] rcuos/2[29] 7.970 0.002 0.020
    40602.771512 [0003] 0.003 0.000 0.986
    40602.771586 [0001] 0.020 0.000 1.049
    40602.771606 [0001] qemu-system-x86[3593/3510] 0.000 0.002 0.020
    40602.771629 [0003] qemu-system-x86[3510] 0.000 0.003 0.116
    40602.771776 [0000] 0.001 0.000 1.892

    After:

    # perf sched timehist

    40602.770537 [0001] rcuos/2[29] 7.970 0.002 0.020
    40602.771512 [0003] 0.003 0.000 0.986
    40602.771586 [0001] 0.020 0.000 1.049
    40602.771606 [0001] qemu-system-x86[3593/3510] 0.000 0.002 0.020
    40602.771629 [0003] qemu-system-x86[3510] 0.000 0.003 0.116

    Signed-off-by: Namhyung Kim
    Tested-by: Arnaldo Carvalho de Melo
    Cc: David Ahern
    Cc: Jiri Olsa
    Cc: Peter Zijlstra
    Link: http://lkml.kernel.org/r/20161222060350.17655-1-namhyung@kernel.org
    [ Split from a larger patch ]
    Signed-off-by: Arnaldo Carvalho de Melo

    Namhyung Kim
     
  • Current default value is 20, but that may change in the future, so make
    places where we have 20 hardcoded use 'comm_width'.

    Signed-off-by: Namhyung Kim
    Cc: David Ahern
    Cc: Jiri Olsa
    Cc: Peter Zijlstra
    Link: http://lkml.kernel.org/r/20161222060350.17655-1-namhyung@kernel.org
    [ Split from a larger patch ]
    Signed-off-by: Arnaldo Carvalho de Melo

    Namhyung Kim
     

20 Dec, 2016

5 commits

  • Commit d8c5b17f2bc0 ("samples: bpf: add userspace example for attaching
    eBPF programs to cgroups") added these functions to samples/libbpf, but
    during this merge all of the samples libbpf functionality is shifting to
    tools/lib/bpf. Shift these functions there.

    Committer notes:

    Use bzero + attr.FIELD = value instead of 'attr = { .FIELD = value }',
    just like the other wrapper calls to sys_bpf with bpf_attr, to make
    this build in older toolchains, such as the ones in CentOS 5 and 6.
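
    A minimal sketch of the "zero, then assign fields" pattern described in
    the committer notes. struct demo_attr and its fields are hypothetical
    stand-ins rather than the real bpf_attr, and memset() is used here in
    place of bzero() with the same effect.

```c
#include <string.h>

/* Hypothetical stand-in for an attribute block passed to a syscall. */
struct demo_attr {
    unsigned int target_fd;
    unsigned int attach_bpf_fd;
    unsigned int attach_type;
    unsigned int flags;
};

/*
 * Clear the whole structure first, then set only the fields of interest,
 * instead of relying on a designated initializer.
 */
void demo_attr_init(struct demo_attr *attr, unsigned int target_fd,
                    unsigned int prog_fd, unsigned int type)
{
    memset(attr, 0, sizeof(*attr));
    attr->target_fd = target_fd;
    attr->attach_bpf_fd = prog_fd;
    attr->attach_type = type;
}
```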

    Signed-off-by: Joe Stringer
    Cc: Alexei Starovoitov
    Cc: Daniel Borkmann
    Cc: Wang Nan
    Link: http://lkml.kernel.org/n/tip-au2zvtsh55vqeo3v3uw7jr4c@git.kernel.org
    Link: https://github.com/joestringer/linux/commit/353e6f298c3d0a92fa8bfa61ff898c5050261a12.patch
    Signed-off-by: Arnaldo Carvalho de Melo

    Joe Stringer
     
  • Fixes a perf diff regression issue which was introduced by commit
    5baecbcd9c9a ("perf symbols: we can now read separate debug-info files
    based on a build ID")

    The binary name can be the same when perf diff compares different
    binaries, so the build id is used to distinguish between them.
    However, the previous patch assumes that the same binary name has the
    same build id, so it overwrites the build id according to the binary
    name, regardless of whether the build id is already set.

    Check the has_build_id in dso__load. If the build id is already set, use
    it.
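
    The guard can be sketched like this. struct demo_dso and
    demo_dso_set_build_id() are simplified, hypothetical stand-ins; only
    the has_build_id flag name is taken from the text above.

```c
#include <stdbool.h>
#include <string.h>

#define DEMO_BUILD_ID_SIZE 20

/* Simplified, hypothetical stand-in for a dso record. */
struct demo_dso {
    bool has_build_id;
    unsigned char build_id[DEMO_BUILD_ID_SIZE];
};

/*
 * Record a build id looked up by file name, unless the dso already
 * carries one. Returns true if the dso was updated.
 */
bool demo_dso_set_build_id(struct demo_dso *dso, const unsigned char *id)
{
    if (dso->has_build_id)
        return false;   /* keep the build id that is already set */
    memcpy(dso->build_id, id, DEMO_BUILD_ID_SIZE);
    dso->has_build_id = true;
    return true;
}
```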

    Before the fix:

    $ perf diff 1.perf.data 2.perf.data
    # Event 'cycles'
    #
    # Baseline    Delta  Shared Object     Symbol
    # ........  .......  ................  .............................
    #
        99.83%  -99.80%  tchain_edit       [.] f2
         0.12%  +99.81%  tchain_edit       [.] f3
         0.02%   -0.01%  [ixgbe]           [k] ixgbe_read_reg

    After the fix:

    $ perf diff 1.perf.data 2.perf.data
    # Event 'cycles'
    #
    # Baseline    Delta  Shared Object     Symbol
    # ........  .......  ................  .............................
    #
        99.83%   +0.10%  tchain_edit       [.] f3
         0.12%   -0.08%  tchain_edit       [.] f2

    Signed-off-by: Kan Liang
    Cc: Andi Kleen
    CC: Dima Kogan
    Cc: Jiri Olsa
    Cc: Namhyung Kim
    Fixes: 5baecbcd9c9a ("perf symbols: we can now read separate debug-info files based on a build ID")
    Link: http://lkml.kernel.org/r/1481642984-13593-1-git-send-email-kan.liang@intel.com
    Signed-off-by: Arnaldo Carvalho de Melo

    Kan Liang
     
  • 'perf report --tui' exits with error when it finds a sample of zero
    length symbol (i.e. addr == sym->start == sym->end). Actually these are
    valid samples. Don't exit TUI and show report with such symbols.
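
    One way to express "a zero length symbol still owns its start address"
    is sketched below. The function is hypothetical and assumes a
    half-open [start, end) range for ordinary symbols; it is not the perf
    annotate code.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical address check. Ordinary symbols cover [start, end); a
 * zero length symbol (start == end) accepts exactly its start address
 * instead of being rejected.
 */
bool demo_addr_in_symbol(uint64_t addr, uint64_t start, uint64_t end)
{
    if (start == end)
        return addr == start;
    return addr >= start && addr < end;
}
```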

    Reported-and-Tested-by: Anton Blanchard
    Link: https://lkml.org/lkml/2016/10/8/189
    Signed-off-by: Ravi Bangoria
    Cc: Alexander Shishkin
    Cc: Benjamin Herrenschmidt
    Cc: Chris Riyder
    Cc: linuxppc-dev@lists.ozlabs.org
    Cc: Masami Hiramatsu
    Cc: Michael Ellerman
    Cc: Nicholas Piggin
    Cc: Paul Mackerras
    Cc: Peter Zijlstra
    Cc: stable@kernel.org # v4.9+
    Link: http://lkml.kernel.org/r/1479804050-5028-1-git-send-email-ravi.bangoria@linux.vnet.ibm.com
    Signed-off-by: Arnaldo Carvalho de Melo

    Ravi Bangoria
     
  • Obvious copy/paste typo from the requeue program.

    Signed-off-by: Davidlohr Bueso
    Cc: Davidlohr Bueso
    Link: http://lkml.kernel.org/r/1481830584-30909-1-git-send-email-dave@stgolabs.net
    Signed-off-by: Arnaldo Carvalho de Melo

    Davidlohr Bueso
     
  • There might be systems where MAP_32BIT is not defined, like some
    RHEL7 powerpc versions.
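
    A common way to cope with a flag that some platforms lack is a
    preprocessor guard, sketched here. demo_has_map_32bit() is a made-up
    helper and this is not the perf trace patch itself.

```c
#include <sys/mman.h>

/*
 * Made-up helper: report whether MAP_32BIT is set in flags, compiling
 * cleanly on platforms whose headers do not define MAP_32BIT at all.
 */
int demo_has_map_32bit(int flags)
{
#ifdef MAP_32BIT
    return (flags & MAP_32BIT) != 0;
#else
    (void)flags;    /* flag unknown on this platform */
    return 0;
#endif
}
```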

    Signed-off-by: Jiri Olsa
    Cc: David Ahern
    Cc: Kyle McMartin
    Cc: Namhyung Kim
    Cc: Peter Zijlstra
    Fixes: 256763b01741 ("perf trace beauty mmap: Add more conditional defines")
    Link: http://lkml.kernel.org/r/1481831814-23683-1-git-send-email-jolsa@kernel.org
    [ Changed the Fixme cset to the one removing the conditional switch case for MAP_32BIT ]
    Signed-off-by: Arnaldo Carvalho de Melo

    Jiri Olsa
     

19 Dec, 2016

1 commit

  • Pull libnvdimm updates from Dan Williams:
    "The libnvdimm pull request is relatively small this time around due to
    some development topics being deferred to 4.11.

    As for this pull request the bulk of it has been in -next for several
    releases leading to one late fix being added (commit 868f036fee4b
    ("libnvdimm: fix mishandled nvdimm_clear_poison() return value")). It
    has received a build success notification from the 0day-kbuild robot
    and passes the latest libnvdimm unit tests.

    Summary:

    - Dynamic label support: To date namespace label support has been
    limited to disambiguating cases where PMEM (direct load/store) and
    BLK (mmio aperture) accessed-capacity alias on the same DIMM. Since
    4.9 added support for multiple namespaces per PMEM-region there is
    value to support namespace labels even in the non-aliasing case.
    The presence of a valid namespace index block force-enables label
    support when the kernel would otherwise rely on region boundaries,
    and permits the region to be sub-divided.

    - Handle media errors in namespace metadata: Complement the error
    handling for media errors in namespace data areas with support for
    clearing errors on writes, and downgrading potential machine-check
    exceptions to simple i/o errors on read.

    - Device-DAX region attributes: Add 'align', 'id', and 'size' as
    attributes for device-dax regions. In particular this enables
    userspace tooling to generically size memory mapping and i/o
    operations. Prevent userspace from growing assumptions /
    dependencies about the parent device topology for a dax region. A
    libnvdimm namespace may not always be the parent device of a dax
    region.

    - Various cleanups and small fixes"

    * tag 'libnvdimm-for-4.10' of git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm:
    dax: add region 'id', 'size', and 'align' attributes
    libnvdimm: fix mishandled nvdimm_clear_poison() return value
    libnvdimm: replace mutex_is_locked() warnings with lockdep_assert_held
    libnvdimm, pfn: fix align attribute
    libnvdimm, e820: use module_platform_driver
    libnvdimm, namespace: use octal for permissions
    libnvdimm, namespace: avoid multiple sector calculations
    libnvdimm: remove else after return in nsio_rw_bytes()
    libnvdimm, namespace: fix the type of name variable
    libnvdimm: use consistent naming for request_mem_region()
    nvdimm: use the right length of "pmem"
    libnvdimm: check and clear poison before writing to pmem
    tools/testing/nvdimm: dynamic label support
    libnvdimm: allow a platform to force enable label support
    libnvdimm: use generic iostat interfaces

    Linus Torvalds
     

18 Dec, 2016

3 commits

  • Pull networking fixes and cleanups from David Miller:

    1) Revert bogus nla_ok() change, from Alexey Dobriyan.

    2) Various bpf validator fixes from Daniel Borkmann.

    3) Add some necessary SET_NETDEV_DEV() calls to hisi_femac and hip04
    drivers, from Dongpo Li.

    4) Several ethtool ksettings conversions from Philippe Reynes.

    5) Fix bugs in inet port management wrt. soreuseport, from Tom Herbert.

    6) XDP support for virtio_net, from John Fastabend.

    7) Fix NAT handling within a vrf, from David Ahern.

    8) Endianness fixes in dpaa_eth driver, from Claudiu Manoil

    * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (63 commits)
    net: mv643xx_eth: fix build failure
    isdn: Constify some function parameters
    mlxsw: spectrum: Mark split ports as such
    cgroup: Fix CGROUP_BPF config
    qed: fix old-style function definition
    net: ipv6: check route protocol when deleting routes
    r6040: move spinlock in r6040_close as SOFTIRQ-unsafe lock order detected
    irda: w83977af_ir: cleanup an indent issue
    net: sfc: use new api ethtool_{get|set}_link_ksettings
    net: davicom: dm9000: use new api ethtool_{get|set}_link_ksettings
    net: cirrus: ep93xx: use new api ethtool_{get|set}_link_ksettings
    net: chelsio: cxgb3: use new api ethtool_{get|set}_link_ksettings
    net: chelsio: cxgb2: use new api ethtool_{get|set}_link_ksettings
    bpf: fix mark_reg_unknown_value for spilled regs on map value marking
    bpf: fix overflow in prog accounting
    bpf: dynamically allocate digest scratch buffer
    gtp: Fix initialization of Flags octet in GTPv1 header
    gtp: gtp_check_src_ms_ipv4() always return success
    net/x25: use designated initializers
    isdn: use designated initializers
    ...

    Linus Torvalds
     
  • Pull kbuild updates from Michal Marek:

    - prototypes for x86 asm-exported symbols (Adam Borowski) and a warning
    about missing CRCs (Nick Piggin)

    - asm-exports fix for LTO (Nicolas Pitre)

    - thin archives improvements (Nick Piggin)

    - linker script fix for CONFIG_LD_DEAD_CODE_DATA_ELIMINATION (Nick
    Piggin)

    - genksyms support for __builtin_va_list keyword

    - misc minor fixes

    * 'kbuild' of git://git.kernel.org/pub/scm/linux/kernel/git/mmarek/kbuild:
    x86/kbuild: enable modversions for symbols exported from asm
    kbuild: fix scripts/adjust_autoksyms.sh* for the no modules case
    scripts/kallsyms: remove last remnants of --page-offset option
    make use of make variable CURDIR instead of calling pwd
    kbuild: cmd_export_list: tighten the sed script
    kbuild: minor improvement for thin archives build
    kbuild: modpost warn if export version crc is missing
    kbuild: keep data tables through dead code elimination
    kbuild: improve linker compatibility with lib-ksyms.o build
    genksyms: Regenerate parser
    kbuild/genksyms: handle va_list type
    kbuild: thin archives for multi-y targets
    kbuild: kallsyms allow 3-pass generation if symbols size has changed

    Linus Torvalds
     
  • Dan Williams
     

17 Dec, 2016

3 commits

  • Running ./test_verifier as unprivileged lets 1 out of 98 tests fail:

    [...]
    #71 unpriv: check that printk is disallowed FAIL
    Unexpected error message!
    0: (7a) *(u64 *)(r10 -8) = 0
    1: (bf) r1 = r10
    2: (07) r1 += -8
    3: (b7) r2 = 8
    4: (bf) r3 = r1
    5: (85) call bpf_trace_printk#6
    unknown func bpf_trace_printk#6
    [...]

    The test case is correct, just that the error outcome changed with
    ebb676daa1a3 ("bpf: Print function name in addition to function id").
    Same as with e00c7b216f34 ("bpf: fix multiple issues in selftest suite
    and samples") issue 2), so just fix up the function name.

    Fixes: ebb676daa1a3 ("bpf: Print function name in addition to function id")
    Signed-off-by: Daniel Borkmann
    Acked-by: Alexei Starovoitov
    Signed-off-by: David S. Miller

    Daniel Borkmann
     
  • Commit 57a09bf0a416 ("bpf: Detect identical PTR_TO_MAP_VALUE_OR_NULL
    registers") introduced a regression where existing programs stopped
    loading due to reaching the verifier's maximum complexity limit,
    whereas prior to this commit they were loading just fine; the affected
    program has roughly 2k instructions.

    What was found is that state pruning couldn't be performed effectively
    anymore due to mismatches of the verifier's register state, in particular
    in the id tracking. It doesn't mean that 57a09bf0a416 is incorrect per
    se, but rather that verifier needs to perform a lot more work for the
    same program with regards to involved map lookups.

    Since commit 57a09bf0a416 is only about tracking registers with type
    PTR_TO_MAP_VALUE_OR_NULL, the id is only needed to follow registers
    until they are promoted through pattern matching with a NULL check to
    either PTR_TO_MAP_VALUE or UNKNOWN_VALUE type. After that point, the
    id becomes irrelevant for the transitioned types.

    For UNKNOWN_VALUE, id is already reset to 0 via mark_reg_unknown_value(),
    but not so for PTR_TO_MAP_VALUE where id is becoming stale. It's even
    transferred further into other types that don't make use of it. Among
    others, one example is where UNKNOWN_VALUE is set on function call
    return with RET_INTEGER return type.

    states_equal() will then fall through the memcmp() on register state;
    note that the second memcmp() uses offsetofend(), so the id is part of
    that since d2a4dd37f6b4 ("bpf: fix state equivalence"). But the bisect
    pointed already to 57a09bf0a416, where we really reach beyond complexity
    limit. What I found was that states_equal() often failed in this
    case due to id mismatches in spilled regs with registers in type
    PTR_TO_MAP_VALUE. Unlike non-spilled regs, spilled regs just perform
    a memcmp() on their reg state and don't have any other optimizations
    in place, therefore also id was relevant in this case for making a
    pruning decision.

    We can safely reset id to 0 as well when converting to PTR_TO_MAP_VALUE.
    For the affected program, it resulted in a ~17 fold reduction of
    complexity and let the program load fine again. Selftest suite also
    runs fine. The only other place where env->id_gen is used currently is
    through direct packet access, but for these cases id is long living, thus
    a different scenario.

    Also, the current logic in mark_map_regs() is not fully correct when
    marking NULL branch with UNKNOWN_VALUE. We need to cache the destination
    reg's id in any case. Otherwise, once we marked that reg as UNKNOWN_VALUE,
    its id is reset and any subsequent registers that hold the original id
    and are of type PTR_TO_MAP_VALUE_OR_NULL won't be marked UNKNOWN_VALUE
    anymore, since mark_map_reg() reuses the uncached regs[regno].id that
    was just overridden. Note, we don't need to cache it outside of
    mark_map_regs(), since it's called once on this_branch and the other
    time on other_branch, which are both two independent verifier states.
    A test case for this is added here, too.
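
    The caching issue in the last paragraph can be modelled with a toy
    register file. All types and function names below are simplified
    stand-ins for illustration, not the verifier's own structures.

```c
#include <stddef.h>

/* Toy register state; a stand-in for the verifier's register types. */
enum demo_type { DEMO_MAP_VALUE_OR_NULL, DEMO_MAP_VALUE, DEMO_UNKNOWN_VALUE };

struct demo_reg {
    enum demo_type type;
    unsigned int id;
};

/* Mark one register as unknown if it carries the given id; resets its id. */
static void demo_mark_reg(struct demo_reg *regs, size_t i, unsigned int id)
{
    if (regs[i].type == DEMO_MAP_VALUE_OR_NULL && regs[i].id == id) {
        regs[i].type = DEMO_UNKNOWN_VALUE;
        regs[i].id = 0;
    }
}

/* Re-reads regs[regno].id on every iteration, after it may have been reset. */
void demo_mark_regs_uncached(struct demo_reg *regs, size_t n, size_t regno)
{
    for (size_t i = 0; i < n; i++)
        demo_mark_reg(regs, i, regs[regno].id);
}

/* Caches the id once, so later registers with the same id still match. */
void demo_mark_regs_cached(struct demo_reg *regs, size_t n, size_t regno)
{
    unsigned int id = regs[regno].id;

    for (size_t i = 0; i < n; i++)
        demo_mark_reg(regs, i, id);
}
```

    With two registers sharing one id, the uncached variant marks only the
    first and leaves the second as DEMO_MAP_VALUE_OR_NULL, while the cached
    variant marks both.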

    Fixes: 57a09bf0a416 ("bpf: Detect identical PTR_TO_MAP_VALUE_OR_NULL registers")
    Signed-off-by: Daniel Borkmann
    Acked-by: Thomas Graf
    Acked-by: Alexei Starovoitov
    Signed-off-by: David S. Miller

    Daniel Borkmann
     
  • Pull powerpc updates from Michael Ellerman:
    "Highlights include:

    - Support for the kexec_file_load() syscall, which is a prereq for
    secure and trusted boot.

    - Prevent kernel execution of userspace on P9 Radix (similar to
    SMEP/PXN).

    - Sort the exception tables at build time, to save time at boot, and
    store them as relative offsets to save space in the kernel image &
    memory.

    - Allow building the kernel with thin archives, which should allow us
    to build an allyesconfig once some other fixes land.

    - Build fixes to allow us to correctly rebuild when changing the
    kernel endian from big to little or vice versa.

    - Plumbing so that we can avoid doing a full mm TLB flush on P9
    Radix.

    - Initial stack protector support (-fstack-protector).

    - Support for dumping the radix (aka. Linux) and hash page tables via
    debugfs.

    - Fix an oops in cxl coredump generation when cxl_get_fd() is used.

    - Freescale updates from Scott: "Highlights include 8xx hugepage
    support, qbman fixes/cleanup, device tree updates, and some misc
    cleanup."

    - Many and varied fixes and minor enhancements as always.

    Thanks to:
    Alexey Kardashevskiy, Andrew Donnellan, Aneesh Kumar K.V, Anshuman
    Khandual, Anton Blanchard, Balbir Singh, Bartlomiej Zolnierkiewicz,
    Christophe Jaillet, Christophe Leroy, Denis Kirjanov, Elimar
    Riesebieter, Frederic Barrat, Gautham R. Shenoy, Geliang Tang, Geoff
    Levand, Jack Miller, Johan Hovold, Lars-Peter Clausen, Libin,
    Madhavan Srinivasan, Michael Neuling, Nathan Fontenot, Naveen N.
    Rao, Nicholas Piggin, Pan Xinhui, Peter Senna Tschudin, Rashmica
    Gupta, Rui Teng, Russell Currey, Scott Wood, Simon Guo, Suraj
    Jitindar Singh, Thiago Jung Bauermann, Tobias Klauser, Vaibhav Jain"

    [ And thanks to Michael, who took time off from a new baby to get this
    pull request done. - Linus ]

    * tag 'powerpc-4.10-1' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux: (174 commits)
    powerpc/fsl/dts: add FMan node for t1042d4rdb
    powerpc/fsl/dts: add sg_2500_aqr105_phy4 alias on t1024rdb
    powerpc/fsl/dts: add QMan and BMan nodes on t1024
    powerpc/fsl/dts: add QMan and BMan nodes on t1023
    soc/fsl/qman: test: use DEFINE_SPINLOCK()
    powerpc/fsl-lbc: use DEFINE_SPINLOCK()
    powerpc/8xx: Implement support of hugepages
    powerpc: get hugetlbpage handling more generic
    powerpc: port 64 bits pgtable_cache to 32 bits
    powerpc/boot: Request no dynamic linker for boot wrapper
    soc/fsl/bman: Use resource_size instead of computation
    soc/fsl/qe: use builtin_platform_driver
    powerpc/fsl_pmc: use builtin_platform_driver
    powerpc/83xx/suspend: use builtin_platform_driver
    powerpc/ftrace: Fix the comments for ftrace_modify_code
    powerpc/perf: macros for power9 format encoding
    powerpc/perf: power9 raw event format encoding
    powerpc/perf: update attribute_group data structure
    powerpc/perf: factor out the event format field
    powerpc/mm/iommu, vfio/spapr: Put pages on VFIO container shutdown
    ...

    Linus Torvalds
     

16 Dec, 2016

20 commits

  • Pull virtio updates from Michael Tsirkin:
    "virtio, vhost: new device, fixes, speedups

    This includes the new virtio crypto device, and fixes all over the
    place. In particular enabling endian-ness checks for sparse builds
    found some bugs which this fixes. And it appears that everyone is in
    agreement that disabling endian-ness sparse checks shouldn't be
    necessary any longer.

    So this enables them for everyone, and drops the __CHECK_ENDIAN__ and
    __bitwise__ APIs.

    IRQ handling in virtio has been refactored somewhat, the larger switch
    to IRQ_SHARED will have to wait as it proved too aggressive"

    * tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost: (34 commits)
    Makefile: drop -D__CHECK_ENDIAN__ from cflags
    fs/logfs: drop __CHECK_ENDIAN__
    Documentation/sparse: drop __CHECK_ENDIAN__
    linux: drop __bitwise__ everywhere
    checkpatch: replace __bitwise__ with __bitwise
    Documentation/sparse: drop __bitwise__
    tools: enable endian checks for all sparse builds
    linux/types.h: enable endian checks for all sparse builds
    virtio_mmio: Set dev.release() to avoid warning
    vhost: remove unused feature bit
    virtio_ring: fix description of virtqueue_get_buf
    vhost/scsi: Remove unused but set variable
    tools/virtio: use {READ,WRITE}_ONCE() in uaccess.h
    vringh: kill off ACCESS_ONCE()
    tools/virtio: fix READ_ONCE()
    crypto: add virtio-crypto driver
    vhost: cache used event for better performance
    vsock: lookup and setup guest_cid inside vhost_vsock_lock
    virtio_pci: split vp_try_to_find_vqs into INTx and MSI-X variants
    virtio_pci: merge vp_free_vectors into vp_del_vqs
    ...

    Linus Torvalds
     
  • …x/kernel/git/shuah/linux-kselftest

    Pull kselftest updates from Shuah Khan:
    "This update consists of:

    - new tests to exercise the Sync Kernel Infrastructure. These tests
    are part of a battery of Android libsync tests and are re-written
    to test the new sync user-space interfaces from Emilio López, and
    Gustavo Padovan.

    - test to run hw-independent mock tests for i915.ko from Chris Wilson

    - a new gpio test case from Bamvor Jian Zhang

    - missing gitignore additions"

    * tag 'linux-kselftest-4.10-rc1-update' of git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest:
    selftest/gpio: add gpio test case
    selftest: sync: improve assert() failure message
    kselftests: Exercise hw-independent mock tests for i915.ko
    selftests: add missing gitignore files/dirs
    selftests: add missing set-tz to timers .gitignore
    selftest: sync: stress test for merges
    selftest: sync: stress consumer/producer test
    selftest: sync: stress test for parallelism
    selftest: sync: wait tests for sw_sync framework
    selftest: sync: merge tests for sw_sync framework
    selftest: sync: fence tests for sw_sync framework
    selftest: sync: basic tests for sw_sync framework

    Linus Torvalds
     
  • We dropped need for __CHECK_ENDIAN__ for linux,
    this mirrors this for tools.

    Signed-off-by: Michael S. Tsirkin

    Michael S. Tsirkin
     
  • As a step towards killing off ACCESS_ONCE, use {READ,WRITE}_ONCE() for the
    virtio tools uaccess primitives, pulling these in from <linux/compiler.h>.

    With this done, we can kill off the now-unused ACCESS_ONCE() definition.

    Signed-off-by: Mark Rutland
    Cc: Jason Wang
    Cc: Michael S. Tsirkin
    Cc: linux-kernel@vger.kernel.org
    Cc: virtualization@lists.linux-foundation.org
    Signed-off-by: Michael S. Tsirkin
    Reviewed-by: Cornelia Huck
    Reviewed-by: Jason Wang

    Mark Rutland
     
  • The virtio tools implementation of READ_ONCE() has a single parameter called
    'var', but erroneously refers to 'val' for its cast, and thus won't work unless
    there's a variable of the correct type that happens to be called 'val'.

    Fix this with s/var/val/, making READ_ONCE() work as expected regardless.
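
    For illustration, a READ_ONCE()-style macro whose parameter and cast
    use the same name might look like the sketch below. DEMO_READ_ONCE and
    demo_load() are illustrative names, and __typeof__ is a GCC/Clang
    extension; this is not the tools/virtio header itself.

```c
/*
 * Illustrative macro: read var through a volatile-qualified lvalue of
 * its own type. The cast names the macro parameter, so it does not
 * depend on any other identifier being in scope.
 */
#define DEMO_READ_ONCE(var) (*((volatile __typeof__(var) *)(&(var))))

/* Example use: load an int through a pointer exactly once. */
int demo_load(int *p)
{
    return DEMO_READ_ONCE(*p);
}
```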

    Fixes: a7c490333df3cff5 ("tools/virtio: use virt_xxx barriers")
    Signed-off-by: Mark Rutland
    Cc: Jason Wang
    Cc: Michael S. Tsirkin
    Cc: linux-kernel@vger.kernel.org
    Cc: virtualization@lists.linux-foundation.org
    Signed-off-by: Michael S. Tsirkin
    Reviewed-by: Cornelia Huck
    Reviewed-by: Jason Wang

    Mark Rutland
     
  • Pull tracing updates from Steven Rostedt:
    "This release has a few updates:

    - STM can hook into the function tracer
    - Function filtering now supports more advanced glob matching
    - Ftrace selftests updates and added tests
    - Softirq tag in traces now show only softirqs
    - ARM nop added to non traced locations at compile time
    - New trace_marker_raw file that allows for binary input
    - Optimizations to the ring buffer
    - Removal of kmap in trace_marker
    - Wakeup and irqsoff tracers now adhere to the set_graph_notrace file
    - Other various fixes and clean ups"

    * tag 'trace-v4.10' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace: (42 commits)
    selftests: ftrace: Shift down default message verbosity
    kprobes/trace: Fix kprobe selftest for newer gcc
    tracing/kprobes: Add a helper method to return number of probe hits
    tracing/rb: Init the CPU mask on allocation
    tracing: Use SOFTIRQ_OFFSET for softirq dectection for more accurate results
    tracing/fgraph: Have wakeup and irqsoff tracers ignore graph functions too
    fgraph: Handle a case where a tracer ignores set_graph_notrace
    tracing: Replace kmap with copy_from_user() in trace_marker writing
    ftrace/x86_32: Set ftrace_stub to weak to prevent gcc from using short jumps to it
    tracing: Allow benchmark to be enabled at early_initcall()
    tracing: Have system enable return error if one of the events fail
    tracing: Do not start benchmark on boot up
    tracing: Have the reg function allow to fail
    ring-buffer: Force rb_end_commit() and rb_set_commit_to_write() inline
    ring-buffer: Froce rb_update_write_stamp() to be inlined
    ring-buffer: Force inline of hotpath helper functions
    tracing: Make __buffer_unlock_commit() always_inline
    tracing: Make tracepoint_printk a static_key
    ring-buffer: Always inline rb_event_data()
    ring-buffer: Make rb_reserve_next_event() always inlined
    ...

    Linus Torvalds
     
  • Commit 6c905981743 ("bpf: pre-allocate hash map elements") introduces
    map_flags to bpf_attr for BPF_MAP_CREATE command. Expose this new
    parameter in libbpf.

    By exposing it, users can access flags such as whether or not to
    preallocate the map.

    Signed-off-by: Joe Stringer
    Acked-by: Wang Nan
    Cc: Alexei Starovoitov
    Cc: Daniel Borkmann
    Link: http://lkml.kernel.org/r/20161209024620.31660-4-joe@ovn.org
    [ Added clarifying comment made by Wang Nan ]
    Signed-off-by: Arnaldo Carvalho de Melo

    Joe Stringer
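
As a rough illustration of what carrying map_flags in a BPF_MAP_CREATE request involves, the sketch below fills a union bpf_attr from <linux/bpf.h>. The helper fill_map_create_attr is hypothetical and is not a libbpf function; it only prepares the attribute block and does not issue the bpf() syscall.

```c
#include <assert.h>
#include <string.h>
#include <linux/bpf.h>

/*
 * Hypothetical helper (not part of libbpf): fills the attribute block
 * that a BPF_MAP_CREATE request carries, including map_flags.
 */
static void fill_map_create_attr(union bpf_attr *attr,
				 enum bpf_map_type map_type,
				 __u32 key_size, __u32 value_size,
				 __u32 max_entries, __u32 map_flags)
{
	memset(attr, 0, sizeof(*attr));
	attr->map_type = map_type;
	attr->key_size = key_size;
	attr->value_size = value_size;
	attr->max_entries = max_entries;
	attr->map_flags = map_flags;	/* e.g. BPF_F_NO_PREALLOC */
}
```

A caller that wants a hash map without preallocated elements would pass BPF_F_NO_PREALLOC as map_flags and then hand the filled attribute to the bpf() syscall with the BPF_MAP_CREATE command.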
     
  • Fixes the following issue when building without access to 'u32' type:

    ./tools/lib/bpf/bpf.h:27:23: error: unknown type name ‘u32’

    Signed-off-by: Joe Stringer
    Acked-by: Wang Nan
    Cc: Alexei Starovoitov
    Cc: Daniel Borkmann
    Link: http://lkml.kernel.org/r/20161209024620.31660-3-joe@ovn.org
    Signed-off-by: Arnaldo Carvalho de Melo

    Joe Stringer
     
  • The tools version of this header is out of date; update it to the latest
    version from the kernel headers.

    Signed-off-by: Joe Stringer
    Acked-by: Wang Nan
    Cc: Alexei Starovoitov
    Cc: Daniel Borkmann
    Link: http://lkml.kernel.org/r/20161209024620.31660-2-joe@ovn.org
    [ Sync it harder, after merging with what was in net-next via perf/urgent via torvalds/master to get BPF_PROG_(AT|DE)TACH, etc ]
    Signed-off-by: Arnaldo Carvalho de Melo

    Joe Stringer
     
  • If a jump target lies outside the function's address range, perf does not
    handle it correctly. In particular, when the target address is lower than
    the function start address, the target offset is negative, but because the
    target address is declared unsigned, the negative value wraps around to its
    two's-complement representation. In the example below, the target of the
    'jmpq' instruction at 34cf8 is 34ac0, which is lower than the function
    start address (34cf0).

    34ac0 - 34cf0 = -0x230 = 0xfffffffffffffdd0

    Objdump output:

    0000000000034cf0 :
    __GI___sigaction():
    34cf0: lea -0x20(%rdi),%eax
    34cf3: cmp $0x1,%eax
    34cf6: jbe 34d00
    34cf8: jmpq 34ac0
    34cfd: nopl (%rax)
    34d00: mov 0x386161(%rip),%rax # 3bae68
    34d07: movl $0x16,%fs:(%rax)
    34d0e: mov $0xffffffff,%eax
    34d13: retq

    perf annotate before applying patch:

    __GI___sigaction /usr/lib64/libc-2.22.so
    lea -0x20(%rdi),%eax
    cmp $0x1,%eax
    v jbe 10
    v jmpq fffffffffffffdd0
    nop
    10: mov _DYNAMIC+0x2e8,%rax
    movl $0x16,%fs:(%rax)
    mov $0xffffffff,%eax
    retq

    perf annotate after applying patch:

    __GI___sigaction /usr/lib64/libc-2.22.so
    lea -0x20(%rdi),%eax
    cmp $0x1,%eax
    v jbe 10
    ^ jmpq 34ac0
    nop
    10: mov _DYNAMIC+0x2e8,%rax
    movl $0x16,%fs:(%rax)
    mov $0xffffffff,%eax
    retq

    Signed-off-by: Ravi Bangoria
    Cc: Alexander Shishkin
    Cc: Chris Riyder
    Cc: Kim Phillips
    Cc: Markus Trippelsdorf
    Cc: Masami Hiramatsu
    Cc: Naveen N. Rao
    Cc: Peter Zijlstra
    Cc: Taeung Song
    Cc: linuxppc-dev@lists.ozlabs.org
    Link: http://lkml.kernel.org/r/1480953407-7605-3-git-send-email-ravi.bangoria@linux.vnet.ibm.com
    Signed-off-by: Arnaldo Carvalho de Melo

    Ravi Bangoria
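
The wrap-around arithmetic in the entry above can be reproduced with a few lines of C. This is only a sketch of the arithmetic, not perf's annotate code; the helper names are hypothetical, and the function end address 0x34d14 used in the note below is inferred from the one-byte retq at 34d13.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Offset of a jump target from the function start, kept unsigned as in
 * the description above; a target below 'start' wraps modulo 2^64.
 */
static uint64_t raw_target_offset(uint64_t target, uint64_t start)
{
	return target - start;
}

/* Hypothetical check: does the target fall inside [start, end)? */
static bool target_in_function(uint64_t target, uint64_t start, uint64_t end)
{
	return target >= start && target < end;
}
```

With the values from the objdump listing, raw_target_offset(0x34ac0, 0x34cf0) yields 0xfffffffffffffdd0, matching the bogus operand shown before the patch, while target_in_function(0x34ac0, 0x34cf0, 0x34d14) is false, which is the condition under which printing the absolute address is the sensible choice.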
     
  • Architectures like PowerPC have jump instructions that include a target
    address as a second operand. For example, 'bne cr7,0xc0000000000f6154'.
    Add support for such instructions in perf annotate.

    objdump o/p:
    c0000000000f6140: ld r9,1032(r31)
    c0000000000f6144: cmpdi cr7,r9,0
    c0000000000f6148: bne cr7,0xc0000000000f6154
    c0000000000f614c: ld r9,2312(r30)
    c0000000000f6150: std r9,1032(r31)
    c0000000000f6154: ld r9,88(r31)

    Corresponding perf annotate o/p:

    Before patch:
    ld r9,1032(r31)
    cmpdi cr7,r9,0
    v bne 3ffffffffff09f2c
    ld r9,2312(r30)
    std r9,1032(r31)
    74: ld r9,88(r31)

    After patch:
    ld r9,1032(r31)
    cmpdi cr7,r9,0
    v bne 74
    ld r9,2312(r30)
    std r9,1032(r31)
    74: ld r9,88(r31)

    Signed-off-by: Ravi Bangoria
    Cc: Alexander Shishkin
    Cc: Chris Riyder
    Cc: Kim Phillips
    Cc: Markus Trippelsdorf
    Cc: Masami Hiramatsu
    Cc: Naveen N. Rao
    Cc: Peter Zijlstra
    Cc: Taeung Song
    Cc: linuxppc-dev@lists.ozlabs.org
    Link: http://lkml.kernel.org/r/1480953407-7605-2-git-send-email-ravi.bangoria@linux.vnet.ibm.com
    Signed-off-by: Arnaldo Carvalho de Melo

    Ravi Bangoria
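
One way to picture the operand handling described above is a parser that takes the text after the last comma as the target address. The sketch below is an illustration under that assumption, not perf's implementation; parse_jump_target is a hypothetical name.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical helper: returns the jump target from an operand string.
 * If the operands contain a comma ("cr7,0xc0000000000f6154"), the target
 * is taken from the text after the last comma; otherwise the whole string
 * is parsed. Addresses are read as hexadecimal.
 */
static uint64_t parse_jump_target(const char *operands)
{
	const char *comma = strrchr(operands, ',');
	const char *addr = comma ? comma + 1 : operands;

	return strtoull(addr, NULL, 16);
}
```

For "cr7,0xc0000000000f6154" this returns 0xc0000000000f6154. Subtracting an assumed function start of 0xc0000000000f60e0 (inferred from the "74:" label in the output, not stated in the commit) gives 0x74, the offset shown after the patch.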
     
  • Enable perf_evsel::ignore_missing_thread for the -u option, so that perf
    does not fail completely if any of the user's processes die between their
    enumeration and the time we open the event.

    Committer notes:

    While doing a 'make -j4 allmodconfig' we sometimes get into the race:

    Before:

    # perf record -u acme
    Error:
    The sys_perf_event_open() syscall returned with 3 (No such process) for event (cycles:ppp).
    /bin/dmesg may provide additional information.
    No CONFIG_PERF_EVENTS=y kernel support configured?
    #

    After:

    [root@jouet ~]# perf record -u acme
    WARNING: Ignored open failure for pid 9888
    WARNING: Ignored open failure for pid 18059
    [root@jouet ~]#

    Which is an improvement, with the races not preventing the remaining threads
    for the specified user from being monitored, but the message probably needs
    further clarification.

    Signed-off-by: Jiri Olsa
    Tested-by: Arnaldo Carvalho de Melo
    Cc: David Ahern
    Cc: Namhyung Kim
    Cc: Peter Zijlstra
    Link: http://lkml.kernel.org/r/1481538943-21874-6-git-send-email-jolsa@kernel.org
    Signed-off-by: Arnaldo Carvalho de Melo

    Jiri Olsa
     
  • Add the perf_evsel::ignore_missing_thread bool.

    When set to true, it allows perf to ignore a 'missing pid' error from the
    perf event syscall.

    We remove the missing thread id from the thread_map, so the rest of the
    processing, like ioctl and mmap, won't be disturbed by a -1 fd.

    The reason for supporting this is to ease monitoring a group of pids
    that 'disappear' before perf opens their event. This currently leads
    perf to report an error and exit, and makes perf record's -u option
    unusable under certain setups.

    With this change we allow this race and ignore such a failure with the
    following warning:

    WARNING: Ignored open failure for pid 8605

    Signed-off-by: Jiri Olsa
    Cc: David Ahern
    Cc: Jiri Olsa
    Cc: Namhyung Kim
    Cc: Peter Zijlstra
    Link: http://lkml.kernel.org/r/20161213074622.GA3084@krava
    Signed-off-by: Arnaldo Carvalho de Melo

    Jiri Olsa
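
The behaviour described above, skipping a pid whose open fails because the process has gone, can be sketched as a loop that drops ESRCH failures and keeps going. This is a simplified stand-in, not perf's code: the opener callback, open_all_ignoring_missing and fake_open are all hypothetical, and no real perf_event_open() call is made.

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>

/* Hypothetical opener type: returns an fd >= 0, or -1 with errno set. */
typedef int (*open_fn)(int pid);

/*
 * Tries to open an event for each pid. A pid that fails with ESRCH is
 * dropped from the array with a warning; any other failure aborts.
 * Returns the number of pids kept, or -1 on a hard error.
 */
static int open_all_ignoring_missing(int *pids, int nr, int *fds, open_fn opener)
{
	int kept = 0;

	for (int i = 0; i < nr; i++) {
		int fd = opener(pids[i]);

		if (fd < 0) {
			if (errno != ESRCH)
				return -1;
			fprintf(stderr, "WARNING: Ignored open failure for pid %d\n",
				pids[i]);
			continue;
		}
		pids[kept] = pids[i];
		fds[kept] = fd;
		kept++;
	}
	return kept;
}

/* Fake opener for demonstration: pretends pid 9888 has already exited. */
static int fake_open(int pid)
{
	if (pid == 9888) {
		errno = ESRCH;
		return -1;
	}
	return 100 + pid % 10;
}
```

Compacting the pid array in place means later steps only ever see pids that have a valid fd, which is the same goal the commit states for removing the missing thread from the thread_map.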
     
  • Add the thread_map__remove() function to remove a thread from a thread map.

    Also add an automated test.

    Committer notes:

    Testing it:

    # perf test "Remove thread map"
    39: Remove thread map : Ok
    # perf test -v "Remove thread map"
    39: Remove thread map :
    --- start ---
    test child forked, pid 4483
    2 threads: 4482, 4483
    1 thread: 4483
    0 thread:
    test child finished with 0
    ---- end ----
    Remove thread map: Ok
    #

    Signed-off-by: Jiri Olsa
    Tested-by: Arnaldo Carvalho de Melo
    Cc: David Ahern
    Cc: Namhyung Kim
    Cc: Peter Zijlstra
    Link: http://lkml.kernel.org/r/1481538943-21874-4-git-send-email-jolsa@kernel.org
    [ Added stdlib.h, to get the free() declaration ]
    Signed-off-by: Arnaldo Carvalho de Melo

    Jiri Olsa
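
The removal exercised by the test output above (2 threads, then 1, then 0) amounts to deleting one entry from an array and shifting the rest down. The sketch below shows that idea with a hypothetical, fixed-size struct pid_map; it is not the thread_map structure or the thread_map__remove() implementation from perf.

```c
#include <assert.h>
#include <string.h>

#define SKETCH_MAX_THREADS 8

/* Hypothetical, simplified stand-in for a thread map. */
struct pid_map {
	int nr;
	int pids[SKETCH_MAX_THREADS];
};

/*
 * Removes the entry at 'idx', shifting later entries down by one.
 * Returns 0 on success, -1 if idx is out of range.
 */
static int pid_map_remove(struct pid_map *map, int idx)
{
	if (idx < 0 || idx >= map->nr)
		return -1;

	memmove(&map->pids[idx], &map->pids[idx + 1],
		(size_t)(map->nr - idx - 1) * sizeof(map->pids[0]));
	map->nr--;
	return 0;
}
```

Starting from the two pids in the test log, removing index 0 twice leaves first one entry (4483) and then an empty map; a third removal reports an error because no entry is left.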
     
  • It's more readable and will ease the following patches.

    Signed-off-by: Jiri Olsa
    Cc: David Ahern
    Cc: Namhyung Kim
    Cc: Peter Zijlstra
    Link: http://lkml.kernel.org/r/1481538943-21874-3-git-send-email-jolsa@kernel.org
    Signed-off-by: Arnaldo Carvalho de Melo

    Jiri Olsa
     
  • Removing extra '--' prefix.

    Signed-off-by: Jiri Olsa
    Cc: David Ahern
    Cc: Namhyung Kim
    Cc: Peter Zijlstra
    Fixes: ad16511b0e40 ("perf mem: Add -U/-K (--all-user/--all-kernel) options")
    Link: http://lkml.kernel.org/r/1481538943-21874-2-git-send-email-jolsa@kernel.org
    Signed-off-by: Arnaldo Carvalho de Melo

    Jiri Olsa
     
  • I.e. those parameters/functions _are_ used, so ditch that misleading attribute.

    Cc: Adrian Hunter
    Cc: David Ahern
    Cc: Jiri Olsa
    Cc: Namhyung Kim
    Cc: Wang Nan
    Link: http://lkml.kernel.org/n/tip-13cqtjh0yojg5gzvpq1zzpl0@git.kernel.org
    Signed-off-by: Arnaldo Carvalho de Melo

    Arnaldo Carvalho de Melo
     
  • When the --idle-hist option is used with --summary, it now shows idle stats
    with callchains, as below:

    Idle stats by callchain:
    CPU 0: 902.195 msec
    Idle time (msec) Count Callchains
    ---------------- ------- --------------------------------------------------
    370.589 69 futex_wait_queue_me
    Idle stats by callchain:
    CPU 0: 13456.840 msec
    Idle time (msec) Count Callchains
    ---------------- ----- --------------------------------------------------
    5386.637 3283 schedule_hrtimeout_range_clock

    Signed-off-by: Namhyung Kim
    Tested-by: Arnaldo Carvalho de Melo
    Acked-by: David Ahern
    Cc: Andi Kleen
    Cc: Jiri Olsa
    Cc: Minchan Kim
    Cc: Peter Zijlstra
    Link: http://lkml.kernel.org/r/20161208144755.16673-7-namhyung@kernel.org
    Signed-off-by: Arnaldo Carvalho de Melo

    Namhyung Kim
     
  • The --idle-hist option analyzes system idle state, showing which process
    makes a cpu go idle. If this option is specified, non-idle events are
    skipped and processes switching to/from idle are shown.

    This option is most useful together with the --summary(-only) option. In
    the idle-time summary view, idle time is accounted to the previous thread,
    i.e. the one that ran just before the idle task.

    The example output looks like the following:

    Idle-time summary
    comm                  parent  sched-out  idle-time  min-idle  avg-idle  max-idle  stddev  migrations
                                    (count)     (msec)    (msec)    (msec)    (msec)       %
    ----------------------------------------------------------------------------------------------------
    rcu_preempt[7]             2         95    550.872     0.011     5.798    23.146    7.63           0
    migration/1[16]            2          1     15.558    15.558    15.558    15.558    0.00           0
    khugepaged[39]             2          1      3.062     3.062     3.062     3.062    0.00           0
    kworker/0:1H[124]          2          2      4.728     0.611     2.364     4.116   74.12           0
    systemd-journal[167]       1          1      4.510     4.510     4.510     4.510    0.00           0
    kworker/u16:3[558]         2         13     74.737     0.080     5.749    12.960   21.96           0
    irq/34-iwlwifi[628]        2         21    118.403     0.032     5.638    23.990   24.00           0
    kworker/u17:0[673]         2          1      3.523     3.523     3.523     3.523    0.00           0
    dbus-daemon[722]           1          1      6.743     6.743     6.743     6.743    0.00           0
    ifplugd[741]               1          1     58.826    58.826    58.826    58.826    0.00           0
    wpa_supplicant[1490]       1          1     13.302    13.302    13.302    13.302    0.00           0
    wpa_actiond[1492]          1          2      4.064     0.168     2.032     3.896   91.72           0
    dockerd[1500]              1          1      0.055     0.055     0.055     0.055    0.00           0
    ...

    Signed-off-by: Namhyung Kim
    Tested-by: Arnaldo Carvalho de Melo
    Acked-by: David Ahern
    Cc: Andi Kleen
    Cc: Jiri Olsa
    Cc: Minchan Kim
    Cc: Peter Zijlstra
    Link: http://lkml.kernel.org/r/20161208144755.16673-6-namhyung@kernel.org
    Link: http://lkml.kernel.org/r/20161213080632.19099-2-namhyung@kernel.org
    [ Merged fix sent by Namhyung, as posted in the second Link: tag ]
    Signed-off-by: Arnaldo Carvalho de Melo

    Namhyung Kim
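
The accounting rule stated above, where idle time is credited to the thread that ran just before the idle task, can be sketched over a list of context switches. Everything here is an assumption made for illustration: struct switch_event, the pid 0 idle convention, the microsecond timestamps and the fixed-size table are hypothetical and do not reflect how perf sched stores its data.

```c
#include <assert.h>
#include <stdint.h>

#define IDLE_PID 0
#define SKETCH_MAX_PID 64

/* Hypothetical, simplified context-switch record. */
struct switch_event {
	uint64_t time_us;	/* timestamp in microseconds */
	int prev_pid;		/* task being switched out */
	int next_pid;		/* task being switched in */
};

/*
 * Walks the events in time order. When a task switches out to the idle
 * task, that task is remembered; when idle switches out again, the
 * elapsed idle time is credited to the remembered task.
 */
static void account_idle(const struct switch_event *ev, int nr,
			 uint64_t idle_us[SKETCH_MAX_PID])
{
	int last_task = -1;
	uint64_t idle_start = 0;

	for (int i = 0; i < nr; i++) {
		if (ev[i].next_pid == IDLE_PID && ev[i].prev_pid != IDLE_PID) {
			last_task = ev[i].prev_pid;
			idle_start = ev[i].time_us;
		} else if (ev[i].prev_pid == IDLE_PID && last_task >= 0) {
			if (last_task < SKETCH_MAX_PID)
				idle_us[last_task] += ev[i].time_us - idle_start;
			last_task = -1;
		}
	}
}
```

Switches between two non-idle tasks contribute nothing, which corresponds to the non-idle events being skipped when --idle-hist is given.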
     
  • Sometimes only idle-related events are of interest, as with the upcoming
    idle-hist feature. In that case we don't want to show other events, to
    reduce noise.

    Signed-off-by: Namhyung Kim
    Acked-by: David Ahern
    Cc: Andi Kleen
    Cc: Jiri Olsa
    Cc: Minchan Kim
    Cc: Peter Zijlstra
    Link: http://lkml.kernel.org/r/20161208144755.16673-5-namhyung@kernel.org
    Signed-off-by: Arnaldo Carvalho de Melo

    Namhyung Kim