06 Jul, 2017

1 commit

  • Pull networking updates from David Miller:
    "Reasonably busy this cycle, but perhaps not as busy as in the 4.12
    merge window:

    1) Several optimizations for UDP processing under high load from
    Paolo Abeni.

    2) Support pacing internally in TCP for cases where using the sch_fq
    packet scheduler is not practical. From Eric Dumazet.

    3) Support multiple filter chains per qdisc, from Jiri Pirko.

    4) Move to 1ms TCP timestamp clock, from Eric Dumazet.

    5) Add batch dequeueing to vhost_net, from Jason Wang.

    6) Flesh out SCTP checksum offload support more completely, from
    Davide Caratti.

    7) More plumbing of extended netlink ACKs, from David Ahern, Pablo
    Neira Ayuso, and Matthias Schiffer.

    8) Add devlink support to nfp driver, from Simon Horman.

    9) Add RTM_F_FIB_MATCH flag to RTM_GETROUTE queries, from Roopa
    Prabhu.

    10) Add stack depth tracking to BPF verifier and use this information
    in the various eBPF JITs. From Alexei Starovoitov.

    11) Support XDP on qed device VFs, from Yuval Mintz.

    12) Introduce BPF PROG ID for better introspection of installed BPF
    programs. From Martin KaFai Lau.

    13) Add bpf_set_hash helper for TC bpf programs, from Daniel Borkmann.

    14) For loads, allow narrower accesses in bpf verifier checking, from
    Yonghong Song.

    15) Support MIPS in the BPF selftests and samples infrastructure; the
    MIPS eBPF JIT will be merged via the MIPS GIT tree. From David
    Daney.

    16) Support kernel based TLS, from Dave Watson and others.

    17) Completely remove DST garbage collection, from Wei Wang.

    18) Allow installing TCP MD5 rules using prefixes, from Ivan
    Delalande.

    19) Add XDP support to Intel i40e driver, from Björn Töpel.

    20) Add support for TC flower offload in nfp driver, from Simon
    Horman, Pieter Jansen van Vuuren, Benjamin LaHaise, Jakub
    Kicinski, and Bert van Leeuwen.

    21) IPSEC offloading support in mlx5, from Ilan Tayari.

    22) Add HW PTP support to macb driver, from Rafal Ozieblo.

    23) Networking refcount_t conversions, from Elena Reshetova.

    24) Add sock_ops support to BPF, from Lawrence Brakmo. This is useful
    for tuning the TCP sockopt settings of a group of applications,
    currently via CGROUPs"

    * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next: (1899 commits)
    net: phy: dp83867: add workaround for incorrect RX_CTRL pin strap
    dt-bindings: phy: dp83867: provide a workaround for incorrect RX_CTRL pin strap
    cxgb4: Support for get_ts_info ethtool method
    cxgb4: Add PTP Hardware Clock (PHC) support
    cxgb4: time stamping interface for PTP
    nfp: default to chained metadata prepend format
    nfp: remove legacy MAC address lookup
    nfp: improve order of interfaces in breakout mode
    net: macb: remove extraneous return when MACB_EXT_DESC is defined
    bpf: add missing break in for the TCP_BPF_SNDCWND_CLAMP case
    bpf: fix return in load_bpf_file
    mpls: fix rtm policy in mpls_getroute
    net, ax25: convert ax25_cb.refcount from atomic_t to refcount_t
    net, ax25: convert ax25_route.refcount from atomic_t to refcount_t
    net, ax25: convert ax25_uid_assoc.refcount from atomic_t to refcount_t
    net, sctp: convert sctp_ep_common.refcnt from atomic_t to refcount_t
    net, sctp: convert sctp_transport.refcnt from atomic_t to refcount_t
    net, sctp: convert sctp_chunk.refcnt from atomic_t to refcount_t
    net, sctp: convert sctp_datamsg.refcnt from atomic_t to refcount_t
    net, sctp: convert sctp_auth_bytes.refcnt from atomic_t to refcount_t
    ...

    Linus Torvalds
     

04 Jul, 2017

2 commits

  • Pull SMP hotplug updates from Thomas Gleixner:
    "This update is primarily a cleanup of the CPU hotplug locking code.

    The hotplug locking mechanism is an open coded RWSEM, which allows
    recursive locking. The main problem with that is its recursive nature:
    it evades full lockdep coverage and hides potential deadlocks.

    The rework replaces the open coded RWSEM with a percpu RWSEM and
    establishes full lockdep coverage that way.

    The bulk of the changes fix up recursive locking issues and address
    the now fully reported potential deadlocks all over the place. Some of
    these deadlocks have been observed in the RT tree, but on mainline the
    probability was low enough to hide them away."

    * 'smp-hotplug-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (37 commits)
    cpu/hotplug: Constify attribute_group structures
    powerpc: Only obtain cpu_hotplug_lock if called by rtasd
    ARM/hw_breakpoint: Fix possible recursive locking for arch_hw_breakpoint_init
    cpu/hotplug: Remove unused check_for_tasks() function
    perf/core: Don't release cred_guard_mutex if not taken
    cpuhotplug: Link lock stacks for hotplug callbacks
    acpi/processor: Prevent cpu hotplug deadlock
    sched: Provide is_percpu_thread() helper
    cpu/hotplug: Convert hotplug locking to percpu rwsem
    s390: Prevent hotplug rwsem recursion
    arm: Prevent hotplug rwsem recursion
    arm64: Prevent cpu hotplug rwsem recursion
    kprobes: Cure hotplug lock ordering issues
    jump_label: Reorder hotplug lock and jump_label_lock
    perf/tracing/cpuhotplug: Fix locking order
    ACPI/processor: Use cpu_hotplug_disable() instead of get_online_cpus()
    PCI: Replace the racy recursion prevention
    PCI: Use cpu_hotplug_disable() instead of get_online_cpus()
    perf/x86/intel: Drop get_online_cpus() in intel_snb_check_microcode()
    x86/perf: Drop EXPORT of perf_check_microcode
    ...

    Linus Torvalds
     
  • Pull perf updates from Ingo Molnar:
    "Most of the changes are for tooling; the main changes in this cycle were:

    - Improve Intel-PT hardware tracing support, both on the kernel and
    on the tooling side: PTWRITE instruction support, power events for
    C-state tracing, etc. (Adrian Hunter)

    - Add support to measure SMI cost to the x86 architecture, with
    tooling support in 'perf stat' (Kan Liang)

    - Support function filtering in 'perf ftrace', plus related
    improvements (Namhyung Kim)

    - Allow adding and removing fields to the default 'perf script'
    columns, using + or - as field prefixes to do so (Andi Kleen)

    - Allow resolving the DSO name with 'perf script -F brstack{sym,off},dso'
    (Mark Santaniello)

    - Add perf tooling unwind support for PowerPC (Paolo Bonzini)

    - ... and various other improvements as well"

    * 'perf-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (84 commits)
    perf auxtrace: Add CPU filter support
    perf intel-pt: Do not use TSC packets for calculating CPU cycles to TSC
    perf intel-pt: Update documentation to include new ptwrite and power events
    perf intel-pt: Add example script for power events and PTWRITE
    perf intel-pt: Synthesize new power and "ptwrite" events
    perf intel-pt: Move code in intel_pt_synth_events() to simplify attr setting
    perf intel-pt: Factor out intel_pt_set_event_name()
    perf intel-pt: Tidy messages into called function intel_pt_synth_event()
    perf intel-pt: Tidy Intel PT evsel lookup into separate function
    perf intel-pt: Join needlessly wrapped lines
    perf intel-pt: Remove unused instructions_sample_period
    perf intel-pt: Factor out common code synthesizing event samples
    perf script: Add synthesized Intel PT power and ptwrite events
    perf/x86/intel: Constify the 'lbr_desc[]' array and make a function static
    perf script: Add 'synth' field for synthesized event payloads
    perf auxtrace: Add itrace option to output power events
    perf auxtrace: Add itrace option to output ptwrite events
    tools include: Add byte-swapping macros to kernel.h
    perf script: Add 'synth' event type for synthesized events
    x86/insn: perf tools: Add new ptwrite instruction
    ...

    Linus Torvalds
     

01 Jul, 2017

1 commit


21 Jun, 2017

1 commit

  • If the event for which an AUX area is about to be allocated does
    not support setting up an AUX area, rb_alloc_aux() returns -ENOTSUPP.

    This error condition is returned unfiltered to user space,
    and, for example, the perf tool fails with:

    failed to mmap with 524 (INTERNAL ERROR: strerror_r(524, 0x3fff497a1c8, 512)=22)

    This error can be easily seen with "perf record -m 128,256 -e cpu-clock".

    The 524 error code maps to -ENOTSUPP (in rb_alloc_aux()). The -ENOTSUPP
    error code should only be used within the kernel, so the correct error
    code is -EOPNOTSUPP.

    With this commit, the perf tool then reports:

    failed to mmap with 95 (Operation not supported)

    which is clearer.

    Signed-off-by: Hendrik Brueckner
    Acked-by: Alexander Shishkin
    Cc: Linus Torvalds
    Cc: Peter Zijlstra
    Cc: Pu Hou
    Cc: Thomas Gleixner
    Cc: Thomas-Mich Richter
    Cc: acme@kernel.org
    Cc: linux-s390@vger.kernel.org
    Link: http://lkml.kernel.org/r/1497954399-6355-1-git-send-email-brueckner@linux.vnet.ibm.com
    Signed-off-by: Ingo Molnar

    Hendrik Brueckner
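    The errno mapping can be illustrated with a minimal userspace sketch.
    aux_alloc_sketch() is a hypothetical stand-in for rb_alloc_aux(), and the
    value 524 for the kernel-internal ENOTSUPP is taken from the message
    above, since userspace headers do not export it.

```c
#include <errno.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* Kernel-internal code quoted in the commit message; userspace headers
 * do not define ENOTSUPP, so strerror() has no proper text for it. */
#define KERNEL_ENOTSUPP_SKETCH 524

/* Hypothetical stand-in for rb_alloc_aux(): 0 on success, else a
 * negative errno. After the fix the "no AUX support" case reports
 * -EOPNOTSUPP, which userspace can decode. */
static int aux_alloc_sketch(bool event_supports_aux)
{
	if (!event_supports_aux)
		return -EOPNOTSUPP;
	return 0;
}

/* Mimics how a tool prints the errno it received from mmap(). */
static void report_mmap_error(int err)
{
	printf("failed to mmap with %d (%s)\n", -err, strerror(-err));
}
```

    On glibc, report_mmap_error(aux_alloc_sketch(false)) prints a line of the
    same shape as the "Operation not supported" output quoted above.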
     

15 Jun, 2017

1 commit


08 Jun, 2017

4 commits

  • The function was added by commit e5d1367f17ba ("perf: Add cgroup
    support") in 2011 and hasn't been used since then. Removing it fixes the
    following warning when building with Clang:

    kernel/events/core.c:696:19: error: unused function 'perf_cgroup_event_cgrp_time' [-Werror,-Wunused-function]

    Signed-off-by: Matthias Kaehlcke
    Signed-off-by: Peter Zijlstra (Intel)
    Cc: Alexander Shishkin
    Cc: Arnaldo Carvalho de Melo
    Cc: Douglas Anderson
    Cc: Linus Torvalds
    Cc: Peter Zijlstra
    Cc: Thomas Gleixner
    Link: http://lkml.kernel.org/r/20170523215132.189049-1-mka@chromium.org
    Signed-off-by: Ingo Molnar

    Matthias Kaehlcke
     
  • Andi was asking about PERF_FORMAT_GROUP vs inherited events, which led
    to the discovery of a bug from commit:

    3dab77fb1bf8 ("perf: Rework/fix the whole read vs group stuff")

    - PERF_SAMPLE_GROUP = 1U << 4,
    + PERF_SAMPLE_READ = 1U << 4,

    - if (attr->inherit && (attr->sample_type & PERF_SAMPLE_GROUP))
    + if (attr->inherit && (attr->read_format & PERF_FORMAT_GROUP))

    is a clear fail :/

    This changes user-visible behaviour: it was previously possible to
    create an inherited event with PERF_SAMPLE_READ. This is deemed
    acceptable because the results were always incorrect.

    Reported-by: Andi Kleen
    Signed-off-by: Peter Zijlstra (Intel)
    Cc: Alexander Shishkin
    Cc: Arnaldo Carvalho de Melo
    Cc: Jiri Olsa
    Cc: Linus Torvalds
    Cc: Peter Zijlstra
    Cc: Stephane Eranian
    Cc: Thomas Gleixner
    Cc: Vince Weaver
    Fixes: 3dab77fb1bf8 ("perf: Rework/fix the whole read vs group stuff")
    Link: http://lkml.kernel.org/r/20170530094512.dy2nljns2uq7qa3j@hirez.programming.kicks-ass.net
    Signed-off-by: Ingo Molnar

    Peter Zijlstra
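    The quoted hunks come from 3dab77fb1bf8, which renamed the bit and then
    tested read_format instead. A minimal sketch of the corrected check,
    assuming (as the last paragraph implies) that it again rejects inherit
    combined with PERF_SAMPLE_READ. The struct and constants are local copies
    for illustration; the bit values mirror PERF_SAMPLE_READ (1U << 4) and
    PERF_FORMAT_GROUP (1U << 3) from the uapi header.

```c
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

#define SAMPLE_READ_SKETCH  (1U << 4)	/* mirrors PERF_SAMPLE_READ */
#define FORMAT_GROUP_SKETCH (1U << 3)	/* mirrors PERF_FORMAT_GROUP */

/* Hypothetical subset of struct perf_event_attr. */
struct attr_sketch {
	bool inherit;
	uint64_t sample_type;
	uint64_t read_format;
};

/* Buggy check: looks at read_format although the renamed bit lives in
 * sample_type. */
static int check_buggy(const struct attr_sketch *attr)
{
	if (attr->inherit && (attr->read_format & FORMAT_GROUP_SKETCH))
		return -EINVAL;
	return 0;
}

/* Corrected check: inherited events may not use PERF_SAMPLE_READ. */
static int check_fixed(const struct attr_sketch *attr)
{
	if (attr->inherit && (attr->sample_type & SAMPLE_READ_SKETCH))
		return -EINVAL;
	return 0;
}
```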
     
  • Signed-off-by: Ingo Molnar

    Ingo Molnar
     
  • When doing sampling, for example:

    perf record -e cycles:u ...

    On workloads that do a lot of kernel entries/exits we see kernel
    samples, even though :u is specified. This is due to skid.

    This might be a security issue because it can leak kernel addresses even
    though kernel sampling support is disabled.

    This patch drops kernel samples if exclude_kernel is specified.

    For example, test on Haswell desktop:

    perf record -e cycles:u
    perf report --stdio

    Before patch applied:

    99.77% mgen mgen [.] buf_read
    0.20% mgen mgen [.] rand_buf_init
    0.01% mgen [kernel.vmlinux] [k] apic_timer_interrupt
    0.00% mgen mgen [.] last_free_elem
    0.00% mgen libc-2.23.so [.] __random_r
    0.00% mgen libc-2.23.so [.] _int_malloc
    0.00% mgen mgen [.] rand_array_init
    0.00% mgen [kernel.vmlinux] [k] page_fault
    0.00% mgen libc-2.23.so [.] __random
    0.00% mgen libc-2.23.so [.] __strcasestr
    0.00% mgen ld-2.23.so [.] strcmp
    0.00% mgen ld-2.23.so [.] _dl_start
    0.00% mgen libc-2.23.so [.] sched_setaffinity@@GLIBC_2.3.4
    0.00% mgen ld-2.23.so [.] _start

    We can see kernel symbols apic_timer_interrupt and page_fault.

    After patch applied:

    99.79% mgen mgen [.] buf_read
    0.19% mgen mgen [.] rand_buf_init
    0.00% mgen libc-2.23.so [.] __random_r
    0.00% mgen mgen [.] rand_array_init
    0.00% mgen mgen [.] last_free_elem
    0.00% mgen libc-2.23.so [.] vfprintf
    0.00% mgen libc-2.23.so [.] rand
    0.00% mgen libc-2.23.so [.] __random
    0.00% mgen libc-2.23.so [.] _int_malloc
    0.00% mgen libc-2.23.so [.] _IO_doallocbuf
    0.00% mgen ld-2.23.so [.] do_lookup_x
    0.00% mgen ld-2.23.so [.] open_verify.constprop.7
    0.00% mgen ld-2.23.so [.] _dl_important_hwcaps
    0.00% mgen libc-2.23.so [.] sched_setaffinity@@GLIBC_2.3.4
    0.00% mgen ld-2.23.so [.] _start

    There are only userspace symbols.

    Signed-off-by: Jin Yao
    Signed-off-by: Peter Zijlstra (Intel)
    Cc:
    Cc: Alexander Shishkin
    Cc: Arnaldo Carvalho de Melo
    Cc: Jiri Olsa
    Cc: Linus Torvalds
    Cc: Namhyung Kim
    Cc: Peter Zijlstra
    Cc: Stephane Eranian
    Cc: Thomas Gleixner
    Cc: Vince Weaver
    Cc: acme@kernel.org
    Cc: jolsa@kernel.org
    Cc: kan.liang@intel.com
    Cc: mark.rutland@arm.com
    Cc: will.deacon@arm.com
    Cc: yao.jin@intel.com
    Link: http://lkml.kernel.org/r/1495706947-3744-1-git-send-email-yao.jin@linux.intel.com
    Signed-off-by: Ingo Molnar

    Jin Yao
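    The filtering rule can be sketched in userspace C. The sample struct and
    its from_user field are hypothetical; in the kernel the decision is
    presumably made from the interrupted register state, which this sketch
    reduces to a boolean.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical sample: instruction pointer plus the mode it was taken in. */
struct sample_sketch {
	uint64_t ip;
	bool from_user;
};

/* Drop a sample if kernel samples were excluded but skid still landed
 * it in the kernel. */
static bool drop_sample(bool exclude_kernel, const struct sample_sketch *s)
{
	return exclude_kernel && !s->from_user;
}

/* Compact the array in place, keeping only samples that pass the
 * filter; returns the new count. */
static size_t filter_samples(bool exclude_kernel, struct sample_sketch *s,
			     size_t n)
{
	size_t kept = 0;

	for (size_t i = 0; i < n; i++)
		if (!drop_sample(exclude_kernel, &s[i]))
			s[kept++] = s[i];
	return kept;
}
```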
     

05 Jun, 2017

1 commit

  • Allow BPF_PROG_TYPE_PERF_EVENT program types to attach to all
    perf_event types, including HW_CACHE, RAW, and dynamic pmu events.
    Only tracepoint and kprobe events are treated differently: they require
    BPF_PROG_TYPE_TRACEPOINT and BPF_PROG_TYPE_KPROBE program types, respectively.

    Also add support for reading all event counters using
    bpf_perf_event_read() helper.

    Signed-off-by: Alexei Starovoitov
    Signed-off-by: David S. Miller

    Alexei Starovoitov
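    The attach rule described above amounts to a small compatibility check.
    A sketch with hypothetical local enums; the kernel uses its own
    BPF_PROG_TYPE_* constants and perf event types, and its check is
    structured differently.

```c
#include <stdbool.h>

/* Hypothetical, simplified event kinds and program types. */
enum event_kind_sketch {
	EV_HARDWARE, EV_HW_CACHE, EV_RAW, EV_DYNAMIC_PMU,
	EV_TRACEPOINT, EV_KPROBE,
};

enum prog_type_sketch {
	PROG_PERF_EVENT, PROG_TRACEPOINT, PROG_KPROBE,
};

/* Tracepoint and kprobe events need their dedicated program types;
 * every other perf event accepts a PERF_EVENT program. */
static bool can_attach(enum prog_type_sketch prog, enum event_kind_sketch ev)
{
	switch (ev) {
	case EV_TRACEPOINT:
		return prog == PROG_TRACEPOINT;
	case EV_KPROBE:
		return prog == PROG_KPROBE;
	default:
		return prog == PROG_PERF_EVENT;
	}
}
```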
     

03 Jun, 2017

1 commit

  • If we failed to acquire the task's cred_guard_mutex, we shouldn't proceed
    to release it in the error path.

    Fixes: a63fbed776c ("perf/tracing/cpuhotplug: Fix locking order")
    Signed-off-by: Alexander Levin
    Cc: peterz@infradead.org
    Cc: rostedt@goodmis.org
    Cc: mathieu.desnoyers@efficios.com
    Cc: mhiramat@kernel.org
    Cc: paulmck@linux.vnet.ibm.com
    Cc: bigeasy@linutronix.de
    Link: http://lkml.kernel.org/r/20170603033903.12056-1-alexander.levin@verizon.com
    Signed-off-by: Thomas Gleixner

    Alexander Levin
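    The error-path layout this fix restores can be sketched with a
    hypothetical lock type. lock_interruptible_sketch() stands in for an
    interruptible mutex acquisition (assumed to be how cred_guard_mutex is
    taken here); each label undoes only what was actually done.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical lock that counts how often it is released. */
struct lock_sketch {
	bool held;
	int unlock_calls;
};

/* Fails with -EINTR when 'interrupted' is set, e.g. a pending signal. */
static int lock_interruptible_sketch(struct lock_sketch *l, bool interrupted)
{
	if (interrupted)
		return -EINTR;
	l->held = true;
	return 0;
}

static void unlock_sketch(struct lock_sketch *l)
{
	l->held = false;
	l->unlock_calls++;
}

static int open_event_sketch(struct lock_sketch *cred_guard,
			     bool interrupted, bool later_step_fails)
{
	int err;

	err = lock_interruptible_sketch(cred_guard, interrupted);
	if (err)
		goto err_out;		/* lock not taken: must not unlock */

	if (later_step_fails) {
		err = -ENOMEM;
		goto err_cred;		/* lock taken: release it */
	}

	unlock_sketch(cred_guard);
	return 0;

err_cred:
	unlock_sketch(cred_guard);
err_out:
	return err;
}
```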
     

26 May, 2017

1 commit

  • perf, tracing, kprobes and jump_labels have a gazillion ways to create
    dependency lock chains. Some of those involve nested invocations of
    get_online_cpus().

    The conversion of the hotplug locking to a percpu rwsem requires avoiding
    such nested calls. sys_perf_event_open() protects most of the syscall logic
    against cpu hotplug. This causes nested calls and lock inversions versus
    ftrace and kprobes in various interesting ways.

    It's impossible to move the hotplug locking to the outer end of all call
    chains in the involved facilities, so the hotplug protection in
    sys_perf_event_open() needs to be solved differently.

    Introduce 'pmus_mutex' which protects a perf private online cpumask. This
    mutex is taken when the mask is updated in the cpu hotplug callbacks and
    can be taken in sys_perf_event_open() to protect the swhash setup/teardown
    code and when the final judgement about a valid event has to be made.

    [ tglx: Produced changelog and fixed the swhash interaction ]

    Signed-off-by: Peter Zijlstra (Intel)
    Signed-off-by: Thomas Gleixner
    Acked-by: Ingo Molnar
    Cc: Paul E. McKenney
    Cc: Sebastian Siewior
    Cc: Steven Rostedt
    Cc: Mathieu Desnoyers
    Cc: Masami Hiramatsu
    Link: http://lkml.kernel.org/r/20170524081548.930941109@linutronix.de

    Thomas Gleixner
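    The idea of a perf-private online mask can be sketched in userspace C.
    All names are hypothetical (a pthread mutex plays the role of
    'pmus_mutex', a 64-bit word the cpumask), and returning -ENODEV for an
    offline CPU is an assumption made for illustration.

```c
#include <errno.h>
#include <pthread.h>
#include <stdint.h>

static pthread_mutex_t pmus_lock_sketch = PTHREAD_MUTEX_INITIALIZER;
static uint64_t perf_online_mask_sketch;	/* bit n set: CPU n online */

/* Modeled hotplug callbacks: update the private mask under the mutex. */
static void perf_cpu_online_sketch(unsigned int cpu)
{
	if (cpu >= 64)
		return;
	pthread_mutex_lock(&pmus_lock_sketch);
	perf_online_mask_sketch |= UINT64_C(1) << cpu;
	pthread_mutex_unlock(&pmus_lock_sketch);
}

static void perf_cpu_offline_sketch(unsigned int cpu)
{
	if (cpu >= 64)
		return;
	pthread_mutex_lock(&pmus_lock_sketch);
	perf_online_mask_sketch &= ~(UINT64_C(1) << cpu);
	pthread_mutex_unlock(&pmus_lock_sketch);
}

/* Final validity check in the event-open path: only the private mask is
 * consulted, under the private mutex, so no hotplug lock is needed. */
static int perf_event_open_check_sketch(unsigned int cpu)
{
	int err = 0;

	pthread_mutex_lock(&pmus_lock_sketch);
	if (cpu >= 64 || !(perf_online_mask_sketch & (UINT64_C(1) << cpu)))
		err = -ENODEV;
	pthread_mutex_unlock(&pmus_lock_sketch);
	return err;
}
```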
     

23 May, 2017

2 commits

  • We don't set an error code here which means that perf_event_alloc()
    returns ERR_PTR(0) (in other words NULL). The callers are not expecting
    that and would Oops.

    Signed-off-by: Dan Carpenter
    Signed-off-by: Peter Zijlstra (Intel)
    Cc: Alexander Shishkin
    Cc: Arnaldo Carvalho de Melo
    Cc: Arnaldo Carvalho de Melo
    Cc: Jiri Olsa
    Cc: Linus Torvalds
    Cc: Peter Zijlstra
    Cc: Stephane Eranian
    Cc: Thomas Gleixner
    Cc: Vince Weaver
    Fixes: 375637bc5249 ("perf/core: Introduce address range filtering")
    Link: http://lkml.kernel.org/r/20170522090418.hvs6icgpdo53wkn5@mwanda
    Signed-off-by: Ingo Molnar

    Dan Carpenter
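    Why ERR_PTR(0) is a problem can be shown with userspace re-creations of
    the kernel's <linux/err.h> helpers. alloc_event_sketch() is a
    hypothetical allocator with one failure path, not perf_event_alloc().

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_ERRNO 4095

/* Re-creations of the err.h helpers: an error is encoded as a pointer in
 * the top MAX_ERRNO bytes of the address space. */
static inline void *ERR_PTR(long error) { return (void *)error; }
static inline long PTR_ERR(const void *ptr) { return (long)ptr; }
static inline bool IS_ERR(const void *ptr)
{
	return (unsigned long)ptr >= (unsigned long)-MAX_ERRNO;
}

/* With set_err == false this reproduces the bug: err stays 0, and
 * ERR_PTR(0) is simply NULL, which IS_ERR() does not flag. */
static void *alloc_event_sketch(bool fail, bool set_err)
{
	static int object;
	long err = 0;

	if (fail) {
		if (set_err)
			err = -ENOMEM;	/* the fix: set an error code */
		goto err_out;
	}
	return &object;

err_out:
	return ERR_PTR(err);
}
```

    A caller that only checks IS_ERR() would then go on to dereference NULL,
    which is the Oops described above.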
     
  • perf_init_event() can't return NULL. If it did, the error handling is
    incomplete and we would crash. I have removed this confusing dead code.

    Signed-off-by: Dan Carpenter
    Signed-off-by: Peter Zijlstra (Intel)
    Cc: Alexander Shishkin
    Cc: Arnaldo Carvalho de Melo
    Cc: Arnaldo Carvalho de Melo
    Cc: Jiri Olsa
    Cc: Linus Torvalds
    Cc: Peter Zijlstra
    Cc: Stephane Eranian
    Cc: Thomas Gleixner
    Cc: Vince Weaver
    Link: http://lkml.kernel.org/r/20170522090348.5g7yyld5en3yeky4@mwanda
    Signed-off-by: Ingo Molnar

    Dan Carpenter
     

10 May, 2017

1 commit

  • Perf can generate and record a user callchain in response to a synchronous
    request, such as a tracepoint firing. If this happens under set_fs(KERNEL_DS),
    then we can end up walking the user stack (and dereferencing/saving whatever we
    find there) without the protections usually afforded by checks such as
    access_ok.

    Rather than play whack-a-mole with each architecture's stack unwinding
    implementation, fix the root of the problem by ensuring that we force USER_DS
    when invoking perf_callchain_user from the perf core.

    Reported-by: Al Viro
    Signed-off-by: Will Deacon
    Acked-by: Peter Zijlstra
    Cc: Alexander Shishkin
    Cc: Arnaldo Carvalho de Melo
    Cc: Jiri Olsa
    Cc: Linus Torvalds
    Cc: Thomas Gleixner
    Signed-off-by: Ingo Molnar

    Will Deacon
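    The "force USER_DS around the unwinder" idea can be modeled in userspace.
    The address-limit variable, both limit values, and all functions are
    hypothetical stand-ins for get_fs()/set_fs() and access_ok(); they are
    not any architecture's real constants.

```c
#include <stdbool.h>
#include <stdint.h>

#define USER_DS_SKETCH   UINT64_C(0x00007ffffffff000)	/* illustrative */
#define KERNEL_DS_SKETCH UINT64_MAX

static uint64_t addr_limit_sketch = USER_DS_SKETCH;

/* Stand-in for access_ok(): the range must end below the current limit. */
static bool access_ok_sketch(uint64_t addr, uint64_t size)
{
	return size <= addr_limit_sketch && addr <= addr_limit_sketch - size;
}

/* Stand-in for reading one frame while walking the user stack. */
static bool callchain_user_step(uint64_t frame)
{
	return access_ok_sketch(frame, 16);
}

/* The fix: whatever limit the caller runs under, walk the user stack
 * with USER_DS and restore the previous limit afterwards. */
static bool perf_callchain_user_sketch(uint64_t frame)
{
	uint64_t old = addr_limit_sketch;
	bool ok;

	addr_limit_sketch = USER_DS_SKETCH;
	ok = callchain_user_step(frame);
	addr_limit_sketch = old;
	return ok;
}
```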
     

28 Mar, 2017

1 commit


16 Mar, 2017

7 commits

  • While going through the event inheritance code Oleg got confused.

    Add some comments to better explain the silent disappearance of
    orphaned events.

    So what happens is that at perf_event_release_kernel() time, when an
    event loses its connection to userspace (and ceases to exist from the
    user's perspective), we can still have an arbitrary number of inherited
    copies of the event. We want to synchronously find and remove all
    these child events.

    Since that requires a bit of lock juggling, there is the possibility
    that concurrent clone()s will create new child events. Therefore we
    first mark the parent event as DEAD, which marks all the extant child
    events as orphaned.

    We then avoid copying orphaned events, so that no new ones are
    created.

    Signed-off-by: Peter Zijlstra (Intel)
    Cc: Alexander Shishkin
    Cc: Arnaldo Carvalho de Melo
    Cc: Arnaldo Carvalho de Melo
    Cc: Dmitry Vyukov
    Cc: Jiri Olsa
    Cc: Linus Torvalds
    Cc: Mathieu Desnoyers
    Cc: Oleg Nesterov
    Cc: Peter Zijlstra
    Cc: Stephane Eranian
    Cc: Thomas Gleixner
    Cc: Vince Weaver
    Cc: fweisbec@gmail.com
    Link: http://lkml.kernel.org/r/20170316125823.289567442@infradead.org
    Signed-off-by: Ingo Molnar

    Peter Zijlstra
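    The DEAD/orphan protocol described above can be sketched with a
    hypothetical event struct; the real code also deals with reference
    counts and locking, which this single-threaded model omits.

```c
#include <stdbool.h>
#include <stddef.h>

enum state_sketch { STATE_ACTIVE, STATE_DEAD };

/* Hypothetical event: a child points at the event it was copied from. */
struct event_sketch {
	enum state_sketch state;
	struct event_sketch *parent;
};

/* A child is orphaned once its parent has been marked DEAD. */
static bool is_orphaned_child(const struct event_sketch *child)
{
	return child->parent && child->parent->state == STATE_DEAD;
}

/* Release path, first step: marking the parent DEAD orphans every
 * existing child at once. */
static void release_parent_sketch(struct event_sketch *parent)
{
	parent->state = STATE_DEAD;
}

/* clone() path: refuse to copy from a dead parent, so no new orphans
 * appear while the release path is cleaning up. */
static bool inherit_event_sketch(struct event_sketch *parent,
				 struct event_sketch *child)
{
	if (parent->state == STATE_DEAD)
		return false;
	child->state = STATE_ACTIVE;
	child->parent = parent;
	return true;
}
```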
     
  • We have ctx->event_list that contains all events; no need to
    repeatedly iterate the group lists to find them all.

    Signed-off-by: Peter Zijlstra (Intel)
    Cc: Alexander Shishkin
    Cc: Arnaldo Carvalho de Melo
    Cc: Arnaldo Carvalho de Melo
    Cc: Dmitry Vyukov
    Cc: Jiri Olsa
    Cc: Linus Torvalds
    Cc: Mathieu Desnoyers
    Cc: Oleg Nesterov
    Cc: Peter Zijlstra
    Cc: Stephane Eranian
    Cc: Thomas Gleixner
    Cc: Vince Weaver
    Cc: fweisbec@gmail.com
    Link: http://lkml.kernel.org/r/20170316125823.239678244@infradead.org
    Signed-off-by: Ingo Molnar

    Peter Zijlstra
     
  • While hunting for clues to a use-after-free, Oleg spotted that
    perf_event_init_context() can lose an error value, with the result
    that fork() can succeed even though we did not fully inherit the perf
    event context.

    Spotted-by: Oleg Nesterov
    Signed-off-by: Peter Zijlstra (Intel)
    Cc: Alexander Shishkin
    Cc: Arnaldo Carvalho de Melo
    Cc: Arnaldo Carvalho de Melo
    Cc: Dmitry Vyukov
    Cc: Frederic Weisbecker
    Cc: Jiri Olsa
    Cc: Linus Torvalds
    Cc: Mathieu Desnoyers
    Cc: Peter Zijlstra
    Cc: Stephane Eranian
    Cc: Thomas Gleixner
    Cc: Vince Weaver
    Cc: oleg@redhat.com
    Cc: stable@vger.kernel.org
    Fixes: 889ff0150661 ("perf/core: Split context's event group list into pinned and non-pinned lists")
    Link: http://lkml.kernel.org/r/20170316125823.190342547@infradead.org
    Signed-off-by: Ingo Molnar

    Peter Zijlstra
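    The general shape of this bug class can be shown with a hypothetical
    loop; this is an illustration, not the actual perf_event_init_context()
    code. Breaking out on error and then falling through to code that
    reassigns the result discards the error, whereas jumping to the exit
    keeps it.

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical per-group inherit step that fails at index fail_at. */
static int inherit_group_sketch(size_t i, size_t fail_at)
{
	return i == fail_at ? -ENOMEM : 0;
}

/* Hypothetical later phase that always succeeds. */
static int second_phase_sketch(void)
{
	return 0;
}

static int init_context_buggy(size_t n, size_t fail_at)
{
	int ret = 0;

	for (size_t i = 0; i < n; i++) {
		ret = inherit_group_sketch(i, fail_at);
		if (ret)
			break;			/* error recorded ... */
	}
	ret = second_phase_sketch();		/* ... and overwritten here */
	return ret;
}

static int init_context_fixed(size_t n, size_t fail_at)
{
	int ret = 0;

	for (size_t i = 0; i < n; i++) {
		ret = inherit_group_sketch(i, fail_at);
		if (ret)
			goto out;		/* keep the error for the caller */
	}
	ret = second_phase_sketch();
out:
	return ret;
}
```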
     
  • Dmitry reported that syzkaller tripped a use-after-free in perf_release().

    After much puzzlement Oleg spotted the below scenario:

    Task1                                   Task2

    fork()
      perf_event_init_task()
      /* ... */
      goto bad_fork_$foo;
      /* ... */
      perf_event_free_task()
        mutex_lock(ctx->lock)
        perf_free_event(B)

                                            perf_event_release_kernel(A)
                                              mutex_lock(A->child_mutex)
                                              list_for_each_entry(child, ...) {
                                                /* child == B */
                                                ctx = B->ctx;
                                                get_ctx(ctx);
                                                mutex_unlock(A->child_mutex);

          mutex_lock(A->child_mutex)
          list_del_init(B->child_list)
          mutex_unlock(A->child_mutex)

          /* ... */

        mutex_unlock(ctx->lock);
        put_ctx() /* >0 */
      free_task();
                                                mutex_lock(ctx->lock);
                                                mutex_lock(A->child_mutex);
                                                /* ... */
                                                mutex_unlock(A->child_mutex);
                                                mutex_unlock(ctx->lock)
                                                put_ctx() /* 0 */
                                                  ctx->task && !TOMBSTONE
                                                    put_task_struct() /* UAF */

    This patch closes the hole by making perf_event_free_task() destroy the
    task ctx relation such that perf_event_release_kernel() will no longer
    observe the now dead task.

    Spotted-by: Oleg Nesterov
    Reported-by: Dmitry Vyukov
    Signed-off-by: Peter Zijlstra (Intel)
    Cc: Alexander Shishkin
    Cc: Arnaldo Carvalho de Melo
    Cc: Arnaldo Carvalho de Melo
    Cc: Jiri Olsa
    Cc: Linus Torvalds
    Cc: Mathieu Desnoyers
    Cc: Peter Zijlstra
    Cc: Stephane Eranian
    Cc: Thomas Gleixner
    Cc: Vince Weaver
    Cc: fweisbec@gmail.com
    Cc: oleg@redhat.com
    Cc: stable@vger.kernel.org
    Fixes: c6e5b73242d2 ("perf: Synchronously clean up child events")
    Link: http://lkml.kernel.org/r/20170314155949.GE32474@worktop
    Link: http://lkml.kernel.org/r/20170316125823.140295131@infradead.org
    Signed-off-by: Ingo Molnar

    Peter Zijlstra
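    The "destroy the task ctx relation" fix can be modeled with a tombstone
    marker. TOMBSTONE_SKETCH and the structs are hypothetical; the kernel is
    assumed to use a similar sentinel value, as the "!TOMBSTONE" check in the
    trace above suggests.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical marker meaning "this context no longer owns a task". */
#define TOMBSTONE_SKETCH ((void *)-1L)

struct task_sketch { int refs; };
struct ctx_sketch { void *task; };

static void put_task_sketch(struct task_sketch *t)
{
	t->refs--;
}

/* Fixed free path: detach the context from the dying task. */
static void perf_event_free_task_sketch(struct ctx_sketch *ctx)
{
	ctx->task = TOMBSTONE_SKETCH;
}

/* Last put of the context: drop a task reference only if the context
 * still refers to a live task. */
static void put_ctx_sketch(struct ctx_sketch *ctx)
{
	if (ctx->task && ctx->task != TOMBSTONE_SKETCH)
		put_task_sketch(ctx->task);
}
```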
     
  • The Intel PT driver needs to be able to communicate partial AUX transactions,
    that is, transactions with gaps in the data for reasons other than running
    out of room in the buffer (which is what a truncated transaction means).
    Unlike truncation, this condition does not imply a wakeup for the consumer.

    To this end, add a new "partial" AUX flag.

    Signed-off-by: Alexander Shishkin
    Signed-off-by: Peter Zijlstra (Intel)
    Cc: Arnaldo Carvalho de Melo
    Cc: Arnaldo Carvalho de Melo
    Cc: Jiri Olsa
    Cc: Linus Torvalds
    Cc: Mathieu Poirier
    Cc: Peter Zijlstra
    Cc: Stephane Eranian
    Cc: Thomas Gleixner
    Cc: Vince Weaver
    Cc: vince@deater.net
    Link: http://lkml.kernel.org/r/20170220133352.17995-4-alexander.shishkin@linux.intel.com
    Signed-off-by: Ingo Molnar

    Alexander Shishkin
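    The wakeup distinction can be sketched as a flag test. The constants are
    local copies mirroring the uapi PERF_AUX_FLAG_* values, and the sketch
    ignores any other wakeup triggers (such as watermarks) the kernel may
    apply.

```c
#include <stdbool.h>
#include <stdint.h>

#define AUX_FLAG_TRUNCATED_SKETCH 0x01	/* mirrors PERF_AUX_FLAG_TRUNCATED */
#define AUX_FLAG_OVERWRITE_SKETCH 0x02	/* mirrors PERF_AUX_FLAG_OVERWRITE */
#define AUX_FLAG_PARTIAL_SKETCH   0x04	/* mirrors PERF_AUX_FLAG_PARTIAL */

/* Running out of buffer space wakes the consumer; a partial record
 * (gaps for other reasons) is only reported through the flag. */
static bool aux_needs_wakeup_sketch(uint64_t flags)
{
	return flags & AUX_FLAG_TRUNCATED_SKETCH;
}
```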
     
  • In preparation for adding more flags to perf AUX records, introduce a
    separate API for setting the flags for a session, rather than appending
    more bool arguments to perf_aux_output_end(). This allows each flag to be
    set at the time the corresponding condition is detected, instead of
    tracking it in each driver's private state.

    Signed-off-by: Will Deacon
    Signed-off-by: Alexander Shishkin
    Signed-off-by: Peter Zijlstra (Intel)
    Cc: Arnaldo Carvalho de Melo
    Cc: Arnaldo Carvalho de Melo
    Cc: Jiri Olsa
    Cc: Linus Torvalds
    Cc: Mathieu Poirier
    Cc: Peter Zijlstra
    Cc: Stephane Eranian
    Cc: Thomas Gleixner
    Cc: Vince Weaver
    Cc: vince@deater.net
    Link: http://lkml.kernel.org/r/20170220133352.17995-3-alexander.shishkin@linux.intel.com
    Signed-off-by: Ingo Molnar

    Will Deacon
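    The difference between passing bools to the end call and setting flags
    as they occur can be sketched with a hypothetical handle. The functions
    are modeled on perf_aux_output_flag() and perf_aux_output_end() but are
    not their kernel implementations.

```c
#include <stdint.h>

/* Hypothetical output handle carrying the flags for one AUX session. */
struct aux_handle_sketch {
	uint64_t aux_flags;
	uint64_t last_record_flags;	/* what the emitted record carried */
};

/* OR in a flag as soon as the driver notices the condition. */
static void aux_output_flag_sketch(struct aux_handle_sketch *h, uint64_t flags)
{
	h->aux_flags |= flags;
}

/* End of session: the flags come from the handle instead of extra bool
 * arguments, then reset for the next session. */
static void aux_output_end_sketch(struct aux_handle_sketch *h)
{
	h->last_record_flags = h->aux_flags;
	h->aux_flags = 0;
}
```

    Adding a new flag then only needs a new constant and a call site in the
    driver; the end-of-session signature stays unchanged.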
     
  • Signed-off-by: Ingo Molnar

    Ingo Molnar
     

14 Mar, 2017

1 commit

  • With the advent of container technologies like docker, which depend on
    namespaces for isolation, there is a need for tracing support for
    namespaces. This patch introduces a new PERF_RECORD_NAMESPACES event for
    recording namespace-related info. Because info is recorded for every
    namespace, userspace is left to decide what constitutes a container
    and can trace containers by updating the perf tool accordingly.

    Each namespace has a combination of device and inode numbers. Though
    every namespace currently has the same device number, that may change in
    the future to avoid the need for a namespace of namespaces. Considering
    that possibility, both device and inode numbers are recorded separately
    for each namespace.

    Signed-off-by: Hari Bathini
    Acked-by: Jiri Olsa
    Acked-by: Peter Zijlstra
    Cc: Alexander Shishkin
    Cc: Alexei Starovoitov
    Cc: Ananth N Mavinakayanahalli
    Cc: Aravinda Prasad
    Cc: Brendan Gregg
    Cc: Daniel Borkmann
    Cc: Eric Biederman
    Cc: Sargun Dhillon
    Cc: Steven Rostedt
    Link: http://lkml.kernel.org/r/148891929686.25309.2827618988917007768.stgit@hbathini.in.ibm.com
    Signed-off-by: Arnaldo Carvalho de Melo

    Hari Bathini
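    The device/inode pair that identifies a namespace can be read from
    userspace by stat()ing the /proc/self/ns links. The struct and helper are
    hypothetical illustrations of the "record both numbers separately" idea,
    not the PERF_RECORD_NAMESPACES layout itself.

```c
#include <stdint.h>
#include <stdio.h>
#include <sys/stat.h>

/* Hypothetical per-namespace identity: device and inode kept apart. */
struct ns_link_info_sketch {
	uint64_t dev;
	uint64_t ino;
};

/* Fill 'out' for one namespace of the calling process (e.g. "net",
 * "mnt", "pid") by following its /proc/self/ns link; 0 or -1. */
static int ns_link_info_for(const char *ns, struct ns_link_info_sketch *out)
{
	char path[64];
	struct stat st;

	if (snprintf(path, sizeof(path), "/proc/self/ns/%s", ns) >= (int)sizeof(path))
		return -1;
	if (stat(path, &st) != 0)
		return -1;
	out->dev = (uint64_t)st.st_dev;
	out->ino = (uint64_t)st.st_ino;
	return 0;
}
```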
     

10 Mar, 2017

1 commit

  • Fix typos and add the following to the scripts/spelling.txt:

    disble||disable
    disbled||disabled

    I left TSL2563_INT_DISBLED in /drivers/iio/light/tsl2563.c
    untouched. The macro is not referenced at all, but this commit
    touches only comment blocks, just in case.

    Link: http://lkml.kernel.org/r/1481573103-11329-20-git-send-email-yamada.masahiro@socionext.com
    Signed-off-by: Masahiro Yamada
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Masahiro Yamada
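    Each scripts/spelling.txt entry has the form "typo||correction", as in the
    two lines added above. A hypothetical helper that splits one such line
    (assumed to have no trailing newline) might look like this:

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Split "typo||correction" into its halves. Returns false if the
 * separator is missing or either output buffer is too small. */
static bool parse_spelling_line(const char *line, char *typo, size_t typo_len,
				char *fix, size_t fix_len)
{
	const char *sep = strstr(line, "||");
	size_t n;

	if (!sep)
		return false;
	n = (size_t)(sep - line);
	if (n + 1 > typo_len || strlen(sep + 2) + 1 > fix_len)
		return false;
	memcpy(typo, line, n);
	typo[n] = '\0';
	strcpy(fix, sep + 2);
	return true;
}
```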
     

02 Mar, 2017

4 commits

  • We are going to split out of , which
    will have to be picked up from other headers and a couple of .c files.

    Create a trivial placeholder file that just
    maps to to make this patch obviously correct and
    bisectable.

    Include the new header in the files that are going to need it.

    Acked-by: Linus Torvalds
    Cc: Mike Galbraith
    Cc: Peter Zijlstra
    Cc: Thomas Gleixner
    Cc: linux-kernel@vger.kernel.org
    Signed-off-by: Ingo Molnar

    Ingo Molnar
     
  • We are going to split out of , which
    will have to be picked up from other headers and a couple of .c files.

    Create a trivial placeholder file that just
    maps to to make this patch obviously correct and
    bisectable.

    Include the new header in the files that are going to need it.

    Acked-by: Linus Torvalds
    Cc: Mike Galbraith
    Cc: Peter Zijlstra
    Cc: Thomas Gleixner
    Cc: linux-kernel@vger.kernel.org
    Signed-off-by: Ingo Molnar

    Ingo Molnar
     
  • We are going to split out of , which
    will have to be picked up from other headers and a couple of .c files.

    Create a trivial placeholder file that just
    maps to to make this patch obviously correct and
    bisectable.

    The APIs that are going to be moved first are:

    mm_alloc()
    __mmdrop()
    mmdrop()
    mmdrop_async_fn()
    mmdrop_async()
    mmget_not_zero()
    mmput()
    mmput_async()
    get_task_mm()
    mm_access()
    mm_release()

    Include the new header in the files that are going to need it.

    Acked-by: Linus Torvalds
    Cc: Mike Galbraith
    Cc: Peter Zijlstra
    Cc: Thomas Gleixner
    Cc: linux-kernel@vger.kernel.org
    Signed-off-by: Ingo Molnar

    Ingo Molnar
     
  • We are going to split out of , which
    will have to be picked up from other headers and .c files.

    Create a trivial placeholder file that just
    maps to to make this patch obviously correct and
    bisectable.

    Include the new header in the files that are going to need it.

    Acked-by: Linus Torvalds
    Cc: Mike Galbraith
    Cc: Peter Zijlstra
    Cc: Thomas Gleixner
    Cc: linux-kernel@vger.kernel.org
    Signed-off-by: Ingo Molnar

    Ingo Molnar
     

01 Mar, 2017

1 commit

  • Pull perf fixes from Ingo Molnar:
    "Misc fixes on the kernel and tooling side - nothing in particular
    stands out"

    * 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (21 commits)
    perf/core: Fix the perf_cpu_time_max_percent check
    perf/core: Fix perf_event_enable_on_exec() timekeeping (again)
    perf/core: Remove confusing comment and move put_ctx()
    perf record: Honor --quiet option properly
    perf annotate: Add -q/--quiet option
    perf diff: Add -q/--quiet option
    perf report: Add -q/--quiet option
    perf utils: Check verbose flag properly
    perf utils: Add perf_quiet_option()
    perf record: Add -a as default target
    perf stat: Add -a as default target
    perf tools: Fail on using multiple bits long terms without value
    perf tools: Move new_term arguments into struct parse_events_term template
    perf build: Add special fixdep cleaning rule
    perf tools: Replace _SC_NPROCESSORS_CONF with max_present_cpu in cpu_topology_map
    perf header: Make build_cpu_topology skip offline/absent CPUs
    perf cpumap: Add cpu__max_present_cpu()
    perf session: Fix DEBUG=1 build with clang
    tools lib traceevent: It's preempt not prempt
    perf python: Filter out -specs=/a/b/c from the python binding cc options
    ...

    Linus Torvalds
     

28 Feb, 2017

3 commits

  • Merge yet more updates from Andrew Morton:

    - a few MM remainders

    - misc things

    - autofs updates

    - signals

    - affs updates

    - ipc

    - nilfs2

    - spelling.txt updates

    * emailed patches from Andrew Morton : (78 commits)
    mm, x86: fix HIGHMEM64 && PARAVIRT build config for native_pud_clear()
    mm: add arch-independent testcases for RODATA
    hfs: atomically read inode size
    mm: clarify mm_struct.mm_{users,count} documentation
    mm: use mmget_not_zero() helper
    mm: add new mmget() helper
    mm: add new mmgrab() helper
    checkpatch: warn when formats use %Z and suggest %z
    lib/vsprintf.c: remove %Z support
    scripts/spelling.txt: add some typo-words
    scripts/spelling.txt: add "followings" pattern and fix typo instances
    scripts/spelling.txt: add "therfore" pattern and fix typo instances
    scripts/spelling.txt: add "overwriten" pattern and fix typo instances
    scripts/spelling.txt: add "overwritting" pattern and fix typo instances
    scripts/spelling.txt: add "deintialize(d)" pattern and fix typo instances
    scripts/spelling.txt: add "disassocation" pattern and fix typo instances
    scripts/spelling.txt: add "omited" pattern and fix typo instances
    scripts/spelling.txt: add "explictely" pattern and fix typo instances
    scripts/spelling.txt: add "applys" pattern and fix typo instances
    scripts/spelling.txt: add "configuartion" pattern and fix typo instances
    ...

    Linus Torvalds
     
  • Pull cgroup updates from Tejun Heo:
    "Several noteworthy changes.

    - Parav's rdma controller is finally merged. It is very
    straightforward and can limit the absolute numbers of common rdma
    constructs used by different cgroups.

    - kernel/cgroup.c got too chubby and disorganized. Created
    kernel/cgroup/ subdirectory and moved all cgroup related files
    under kernel/ there and reorganized the core code. This hurts for
    backporting patches but was long overdue.

    - cgroup v2 process listing reimplemented so that it no longer
    depends on allocating a buffer large enough to cache the entire
    result to sort and uniq the output. v2 has always mangled the sort
    order to ensure that users don't depend on the sorted output, so
    this shouldn't surprise anybody. This makes the pid listing
    functions use the same iterators that are used internally, which
    have to have the same iterating capabilities anyway.

    - perf cgroup filtering now works automatically on cgroup v2. This
    patch was posted a long time ago but somehow fell through the
    cracks.

    - misc fixes and documentation updates"

    * 'for-4.11' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup: (27 commits)
    kernfs: fix locking around kernfs_ops->release() callback
    cgroup: drop the matching uid requirement on migration for cgroup v2
    cgroup, perf_event: make perf_event controller work on cgroup2 hierarchy
    cgroup: misc cleanups
    cgroup: call subsys->*attach() only for subsystems which are actually affected by migration
    cgroup: track migration context in cgroup_mgctx
    cgroup: cosmetic update to cgroup_taskset_add()
    rdmacg: Fixed uninitialized current resource usage
    cgroup: Add missing cgroup-v2 PID controller documentation.
    rdmacg: Added documentation for rdmacg
    IB/core: added support to use rdma cgroup controller
    rdmacg: Added rdma cgroup controller
    cgroup: fix a comment typo
    cgroup: fix RCU related sparse warnings
    cgroup: move namespace code to kernel/cgroup/namespace.c
    cgroup: rename functions for consistency
    cgroup: move v1 mount functions to kernel/cgroup/cgroup-v1.c
    cgroup: separate out cgroup1_kf_syscall_ops
    cgroup: refactor mount path and clearly distinguish v1 and v2 paths
    cgroup: move cgroup v1 specific code to kernel/cgroup/cgroup-v1.c
    ...

    Linus Torvalds
     
  • We already have the helper, so we can convert the rest of the kernel
    mechanically using:

    git grep -l 'atomic_inc_not_zero.*mm_users' | xargs sed -i 's/atomic_inc_not_zero(&\(.*\)->mm_users)/mmget_not_zero\(\1\)/'

    This is needed for a later patch that hooks into the helper, but might
    be a worthwhile cleanup on its own.

    Link: http://lkml.kernel.org/r/20161218123229.22952-3-vegard.nossum@oracle.com
    Signed-off-by: Vegard Nossum
    Acked-by: Michal Hocko
    Acked-by: Peter Zijlstra (Intel)
    Acked-by: David Rientjes
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Vegard Nossum
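The sed script in Vegard Nossum's commit rewrites atomic_inc_not_zero(&mm->mm_users) into mmget_not_zero(mm), which is only a safe mechanical conversion because the helper is a thin wrapper around that same increment-if-nonzero. The following userspace model is a sketch, not kernel code: it assumes C11 stdatomic.h in place of the kernel's atomic_t, a trimmed hypothetical struct mm_struct holding only mm_users, and a hypothetical mmput_model() for dropping the reference again.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for the kernel's struct mm_struct: only the
 * user count matters for this sketch. */
struct mm_struct {
	atomic_int mm_users;
};

/* Userspace model of atomic_inc_not_zero(): increment *v unless it is
 * zero, and report whether the increment happened. */
static bool atomic_inc_not_zero(atomic_int *v)
{
	int old = atomic_load(v);

	while (old != 0) {
		/* On failure, old is reloaded with the current value. */
		if (atomic_compare_exchange_weak(v, &old, old + 1))
			return true;
	}
	return false;
}

/* The helper the sed script converts callers to: take a user reference
 * only if the mm is still live (mm_users has not dropped to zero). */
static inline bool mmget_not_zero(struct mm_struct *mm)
{
	return atomic_inc_not_zero(&mm->mm_users);
}

/* Hypothetical counterpart: drop a reference the caller holds. */
static void mmput_model(struct mm_struct *mm)
{
	int old = atomic_fetch_sub(&mm->mm_users, 1);

	assert(old > 0);
}
```

A caller that gets false must treat the mm as already being torn down and not touch it; a caller that gets true owns one mm_users reference and must drop it later.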
     

25 Feb, 2017

3 commits

  • For consistency, it is worth converting all page_check_address() users to
    page_vma_mapped_walk(), so that we can drop the former.

    Link: http://lkml.kernel.org/r/20170129173858.45174-10-kirill.shutemov@linux.intel.com
    Signed-off-by: Kirill A. Shutemov
    Reviewed-by: Srikar Dronamraju
    Cc: Andrea Arcangeli
    Cc: Hillf Danton
    Cc: Hugh Dickins
    Cc: Johannes Weiner
    Cc: Oleg Nesterov
    Cc: Peter Zijlstra
    Cc: Rik van Riel
    Cc: Vladimir Davydov
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Kirill A. Shutemov
     
  • Patch series "Fix few rmap-related THP bugs", v3.

    The patchset fixes handling of PTE-mapped THPs in page_referenced() and
    page_idle_clear_pte_refs().

    To achieve that, I've introduced a new helper, page_vma_mapped_walk(),
    which replaces all page_check_address{,_transhuge}() and covers all THP
    cases.

    Patchset overview:
    - First patch fixes one uprobe bug (unrelated to the rest of the
    patchset, just spotted it at the same time);

    - Patches 2-5 fix handling PTE-mapped THPs in page_referenced(),
    page_idle_clear_pte_refs() and rmap core;

    - Patches 6-12 convert all page_check_address{,_transhuge}() users
    (plus remove_migration_pte()) to page_vma_mapped_walk() and drop
    unused helpers.

    I think the fixes are not critical enough for stable@ as they don't lead
    to crashes or hangs, only suboptimal behaviour.

    This patch (of 12):

    For THPs, page_check_address() always fails, which leads to an endless
    loop in uprobe_write_opcode().

    Testcase with huge-tmpfs (uprobes cannot probe anonymous memory):

    mount -t debugfs none /sys/kernel/debug
    mount -t tmpfs -o huge=always none /mnt
    gcc -Wall -O2 -o /mnt/test -x c - < /sys/kernel/debug/tracing/uprobe_events
    echo 1 > /sys/kernel/debug/tracing/events/uprobes/enable
    /mnt/test

    Let's split THPs before trying to replace.

    Link: http://lkml.kernel.org/r/20170129173858.45174-2-kirill.shutemov@linux.intel.com
    Signed-off-by: Kirill A. Shutemov
    Acked-by: Rik van Riel
    Acked-by: Johannes Weiner
    Cc: Oleg Nesterov
    Cc: Peter Zijlstra
    Cc: Andrea Arcangeli
    Cc: Hugh Dickins
    Cc: Hillf Danton
    Cc: Srikar Dronamraju
    Cc: Vladimir Davydov
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Kirill A. Shutemov
     
  • ->fault(), ->page_mkwrite(), and ->pfn_mkwrite() calls do not need to
    take both a vma and a vmf parameter when the vma already resides in vmf.

    Remove the vma parameter to simplify things.

    [arnd@arndb.de: fix ARM build]
    Link: http://lkml.kernel.org/r/20170125223558.1451224-1-arnd@arndb.de
    Link: http://lkml.kernel.org/r/148521301778.19116.10840599906674778980.stgit@djiang5-desk3.ch.intel.com
    Signed-off-by: Dave Jiang
    Signed-off-by: Arnd Bergmann
    Reviewed-by: Ross Zwisler
    Cc: Theodore Ts'o
    Cc: Darrick J. Wong
    Cc: Matthew Wilcox
    Cc: Dave Hansen
    Cc: Christoph Hellwig
    Cc: Jan Kara
    Cc: Dan Williams
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Dave Jiang
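Dave Jiang's change removes the separate vma argument from the ->fault(), ->page_mkwrite(), and ->pfn_mkwrite() callbacks because struct vm_fault already carries it. The sketch below shows the shape of a ->fault() handler after the change. The structures are hypothetical, heavily trimmed stand-ins for the kernel's, and the 0/-1 return convention is an assumption for illustration only (real handlers return VM_FAULT_* codes).

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, heavily trimmed stand-ins for the kernel structures. */
struct vm_area_struct {
	unsigned long vm_start;
	unsigned long vm_end;
};

struct vm_fault {
	struct vm_area_struct *vma;	/* the vma travels inside vmf */
	unsigned long address;
};

/* Before: int (*fault)(struct vm_area_struct *vma, struct vm_fault *vmf);
 * After:  the redundant vma argument is dropped. */
struct vm_operations_struct {
	int (*fault)(struct vm_fault *vmf);
};

/* Hypothetical handler written against the new signature: it recovers
 * the vma from vmf instead of receiving it separately. Returns 0 when
 * the faulting address lies inside the vma, -1 otherwise. */
static int example_fault(struct vm_fault *vmf)
{
	struct vm_area_struct *vma = vmf->vma;

	assert(vma != NULL);
	if (vmf->address < vma->vm_start || vmf->address >= vma->vm_end)
		return -1;
	return 0;
}

static const struct vm_operations_struct example_vm_ops = {
	.fault = example_fault,
};
```

Keeping a single source of truth for the vma removes the possibility of a caller passing a vma that disagrees with vmf->vma.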
     

24 Feb, 2017

3 commits

  • Use "proc_dointvec_minmax" instead of "proc_dointvec" to check the input
    value from user-space.

    Otherwise, a user can set a large value that makes some variables, such
    as "sysctl_perf_event_sample_rate", overflow, which causes a lot of
    unexpected problems.

    Signed-off-by: Tan Xiaojun
    Signed-off-by: Peter Zijlstra (Intel)
    Cc:
    Cc:
    Cc: Alexander Shishkin
    Cc: Arnaldo Carvalho de Melo
    Cc: Jiri Olsa
    Cc: Linus Torvalds
    Cc: Peter Zijlstra
    Cc: Stephane Eranian
    Cc: Thomas Gleixner
    Cc: Vince Weaver
    Link: http://lkml.kernel.org/r/1487829879-56237-1-git-send-email-tanxiaojun@huawei.com
    Signed-off-by: Ingo Molnar

    Tan Xiaojun
     
  • Where commit:

    7fce250915ef ("perf: Fix scaling vs. perf_event_enable_on_exec()")

    disabled the ctx-time a priori, such that all events get enabled and
    scheduled at the same point in time, there is one hole in that patch:
    when no events get enabled, nothing re-enables the ctx-time.

    Reported-by: Ravi Bangoria
    Reported-by: Anton Blanchard
    Signed-off-by: Peter Zijlstra (Intel)
    Cc: Alexander Shishkin
    Cc: Arnaldo Carvalho de Melo
    Cc: Jiri Olsa
    Cc: Linus Torvalds
    Cc: Oleg Nesterov
    Cc: Peter Zijlstra
    Cc: Stephane Eranian
    Cc: Thomas Gleixner
    Cc: Vince Weaver
    Fixes: 7fce250915ef ("perf: Fix scaling vs. perf_event_enable_on_exec()")
    Signed-off-by: Ingo Molnar

    Peter Zijlstra
     
  • Since commit:

    321027c1fe77 ("perf/core: Fix concurrent sys_perf_event_open() vs. 'move_group' race")

    ... the code looks like (assuming move_group==1):

    gctx = __perf_event_ctx_lock_double(group_leader, ctx);

    perf_remove_from_context(group_leader, 0);
    list_for_each_entry(sibling, &group_leader->sibling_list, group_entry) {
        perf_remove_from_context(sibling, 0);
        put_ctx(gctx);
    }

    /* ... */

    /* misleading comment about how this is the last reference */
    put_ctx(gctx);

    perf_event_ctx_unlock(group_leader, gctx);

    What that 'last' put_ctx() does is drop @group_leader's reference on
    gctx after having dropped all its potential sibling references.

    But the thing is that __perf_event_ctx_lock_double() returns with a
    reference _and_ a held lock, and perf_event_ctx_unlock() unlocks that
    lock and drops that reference. Therefore that put_ctx() cannot be the
    'last' of anything, nor is there an imbalance in puts.

    To reduce confusion, remove the comment and place the put_ctx() next
    to the remove_from_context() call.

    Reported-by: Ben Hutchings
    Signed-off-by: Peter Zijlstra (Intel)
    Cc: Alexander Shishkin
    Cc: Arnaldo Carvalho de Melo
    Cc: Jiri Olsa
    Cc: Linus Torvalds
    Cc: Peter Zijlstra
    Cc: Stephane Eranian
    Cc: Thomas Gleixner
    Cc: Vince Weaver
    Signed-off-by: Ingo Molnar

    Peter Zijlstra
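Tan Xiaojun's commit switches the perf sysctl handling to proc_dointvec_minmax(), which rejects values outside the bounds given by the ctl_table entry's extra1 and extra2 fields instead of storing them. The userspace sketch below models only that range check; parse_int_minmax() and its error convention are hypothetical and are not the kernel's implementation.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

/* Hypothetical model of a minmax-checked integer sysctl write: parse buf
 * as a decimal int and store it in *valp only if min <= value <= max.
 * Returns 0 on success or -EINVAL, leaving *valp unchanged on error. */
static int parse_int_minmax(const char *buf, int *valp, int min, int max)
{
	char *end;
	long val;

	assert(buf != NULL && valp != NULL && min <= max);
	errno = 0;
	val = strtol(buf, &end, 10);
	/* Reject empty input, trailing garbage, and overflow of long. */
	if (end == buf || *end != '\0' || errno == ERANGE)
		return -EINVAL;
	/* The minmax part: out-of-range values are refused, not stored. */
	if (val < min || val > max)
		return -EINVAL;
	*valp = (int)val;
	return 0;
}
```

In the model, an out-of-range write leaves the stored value untouched, so a later computation on it can never see the oversized input.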
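Peter Zijlstra's put_ctx() cleanup argues that the reference taken by __perf_event_ctx_lock_double() is balanced by perf_event_ctx_unlock(), so the put_ctx() after each perf_remove_from_context() only drops that event's own reference. The toy model below checks that bookkeeping with a plain counter; struct ctx, ctx_lock(), ctx_unlock() and move_group() are hypothetical stand-ins rather than perf code, and the mutex is modeled as a flag.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for a perf_event_context: a plain counter models
 * the refcount and a flag models the mutex. */
struct ctx {
	int refcount;
	bool locked;
};

static void get_ctx(struct ctx *c)
{
	c->refcount++;
}

static void put_ctx(struct ctx *c)
{
	assert(c->refcount > 0);
	c->refcount--;
}

/* Models __perf_event_ctx_lock_double(): returns with a reference
 * _and_ the lock held. */
static struct ctx *ctx_lock(struct ctx *c)
{
	get_ctx(c);
	c->locked = true;
	return c;
}

/* Models perf_event_ctx_unlock(): drops the lock and that reference. */
static void ctx_unlock(struct ctx *c)
{
	assert(c->locked);
	c->locked = false;
	put_ctx(c);
}

/* Moves nr_events events (leader plus siblings) off gctx. Each event
 * holds its own reference on gctx, so each removal is paired with one
 * put_ctx() placed right next to it. */
static void move_group(struct ctx *gctx, int nr_events)
{
	struct ctx *c = ctx_lock(gctx);

	for (int i = 0; i < nr_events; i++) {
		/* perf_remove_from_context(event, 0) would go here. */
		put_ctx(c);
	}
	ctx_unlock(c);
}
```

Starting from one base reference plus one per event, move_group() ends with only the base reference left and the lock released, so no single put_ctx() inside it is the "last" one.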