06 Jul, 2017
1 commit
-
Pull networking updates from David Miller:
"Reasonably busy this cycle, but perhaps not as busy as in the 4.12
merge window:1) Several optimizations for UDP processing under high load from
Paolo Abeni.2) Support pacing internally in TCP when using the sch_fq packet
scheduler for this is not practical. From Eric Dumazet.3) Support mutliple filter chains per qdisc, from Jiri Pirko.
4) Move to 1ms TCP timestamp clock, from Eric Dumazet.
5) Add batch dequeueing to vhost_net, from Jason Wang.
6) Flesh out more completely SCTP checksum offload support, from
Davide Caratti.7) More plumbing of extended netlink ACKs, from David Ahern, Pablo
Neira Ayuso, and Matthias Schiffer.8) Add devlink support to nfp driver, from Simon Horman.
9) Add RTM_F_FIB_MATCH flag to RTM_GETROUTE queries, from Roopa
Prabhu.10) Add stack depth tracking to BPF verifier and use this information
in the various eBPF JITs. From Alexei Starovoitov.11) Support XDP on qed device VFs, from Yuval Mintz.
12) Introduce BPF PROG ID for better introspection of installed BPF
programs. From Martin KaFai Lau.13) Add bpf_set_hash helper for TC bpf programs, from Daniel Borkmann.
14) For loads, allow narrower accesses in bpf verifier checking, from
Yonghong Song.15) Support MIPS in the BPF selftests and samples infrastructure, the
MIPS eBPF JIT will be merged in via the MIPS GIT tree. From David
Daney.16) Support kernel based TLS, from Dave Watson and others.
17) Remove completely DST garbage collection, from Wei Wang.
18) Allow installing TCP MD5 rules using prefixes, from Ivan
Delalande.19) Add XDP support to Intel i40e driver, from Björn Töpel
20) Add support for TC flower offload in nfp driver, from Simon
Horman, Pieter Jansen van Vuuren, Benjamin LaHaise, Jakub
Kicinski, and Bert van Leeuwen.21) IPSEC offloading support in mlx5, from Ilan Tayari.
22) Add HW PTP support to macb driver, from Rafal Ozieblo.
23) Networking refcount_t conversions, From Elena Reshetova.
24) Add sock_ops support to BPF, from Lawrence Brako. This is useful
for tuning the TCP sockopt settings of a group of applications,
currently via CGROUPs"* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next: (1899 commits)
net: phy: dp83867: add workaround for incorrect RX_CTRL pin strap
dt-bindings: phy: dp83867: provide a workaround for incorrect RX_CTRL pin strap
cxgb4: Support for get_ts_info ethtool method
cxgb4: Add PTP Hardware Clock (PHC) support
cxgb4: time stamping interface for PTP
nfp: default to chained metadata prepend format
nfp: remove legacy MAC address lookup
nfp: improve order of interfaces in breakout mode
net: macb: remove extraneous return when MACB_EXT_DESC is defined
bpf: add missing break in for the TCP_BPF_SNDCWND_CLAMP case
bpf: fix return in load_bpf_file
mpls: fix rtm policy in mpls_getroute
net, ax25: convert ax25_cb.refcount from atomic_t to refcount_t
net, ax25: convert ax25_route.refcount from atomic_t to refcount_t
net, ax25: convert ax25_uid_assoc.refcount from atomic_t to refcount_t
net, sctp: convert sctp_ep_common.refcnt from atomic_t to refcount_t
net, sctp: convert sctp_transport.refcnt from atomic_t to refcount_t
net, sctp: convert sctp_chunk.refcnt from atomic_t to refcount_t
net, sctp: convert sctp_datamsg.refcnt from atomic_t to refcount_t
net, sctp: convert sctp_auth_bytes.refcnt from atomic_t to refcount_t
...
04 Jul, 2017
2 commits
-
Pull SMP hotplug updates from Thomas Gleixner:
"This update is primarily a cleanup of the CPU hotplug locking code.The hotplug locking mechanism is an open coded RWSEM, which allows
recursive locking. The main problem with that is the recursive nature
as it evades the full lockdep coverage and hides potential deadlocks.The rework replaces the open coded RWSEM with a percpu RWSEM and
establishes full lockdep coverage that way.The bulk of the changes fix up recursive locking issues and address
the now fully reported potential deadlocks all over the place. Some of
these deadlocks have been observed in the RT tree, but on mainline the
probability was low enough to hide them away."* 'smp-hotplug-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (37 commits)
cpu/hotplug: Constify attribute_group structures
powerpc: Only obtain cpu_hotplug_lock if called by rtasd
ARM/hw_breakpoint: Fix possible recursive locking for arch_hw_breakpoint_init
cpu/hotplug: Remove unused check_for_tasks() function
perf/core: Don't release cred_guard_mutex if not taken
cpuhotplug: Link lock stacks for hotplug callbacks
acpi/processor: Prevent cpu hotplug deadlock
sched: Provide is_percpu_thread() helper
cpu/hotplug: Convert hotplug locking to percpu rwsem
s390: Prevent hotplug rwsem recursion
arm: Prevent hotplug rwsem recursion
arm64: Prevent cpu hotplug rwsem recursion
kprobes: Cure hotplug lock ordering issues
jump_label: Reorder hotplug lock and jump_label_lock
perf/tracing/cpuhotplug: Fix locking order
ACPI/processor: Use cpu_hotplug_disable() instead of get_online_cpus()
PCI: Replace the racy recursion prevention
PCI: Use cpu_hotplug_disable() instead of get_online_cpus()
perf/x86/intel: Drop get_online_cpus() in intel_snb_check_microcode()
x86/perf: Drop EXPORT of perf_check_microcode
... -
Pull perf updates from Ingo Molnar:
"Most of the changes are for tooling, the main changes in this cycle were:- Improve Intel-PT hardware tracing support, both on the kernel and
on the tooling side: PTWRITE instruction support, power events for
C-state tracing, etc. (Adrian Hunter)- Add support to measure SMI cost to the x86 architecture, with
tooling support in 'perf stat' (Kan Liang)- Support function filtering in 'perf ftrace', plus related
improvements (Namhyung Kim)- Allow adding and removing fields to the default 'perf script'
columns, using + or - as field prefixes to do so (Andi Kleen)- Allow resolving the DSO name with 'perf script -F brstack{sym,off},dso'
(Mark Santaniello)- Add perf tooling unwind support for PowerPC (Paolo Bonzini)
- ... and various other improvements as well"
* 'perf-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (84 commits)
perf auxtrace: Add CPU filter support
perf intel-pt: Do not use TSC packets for calculating CPU cycles to TSC
perf intel-pt: Update documentation to include new ptwrite and power events
perf intel-pt: Add example script for power events and PTWRITE
perf intel-pt: Synthesize new power and "ptwrite" events
perf intel-pt: Move code in intel_pt_synth_events() to simplify attr setting
perf intel-pt: Factor out intel_pt_set_event_name()
perf intel-pt: Tidy messages into called function intel_pt_synth_event()
perf intel-pt: Tidy Intel PT evsel lookup into separate function
perf intel-pt: Join needlessly wrapped lines
perf intel-pt: Remove unused instructions_sample_period
perf intel-pt: Factor out common code synthesizing event samples
perf script: Add synthesized Intel PT power and ptwrite events
perf/x86/intel: Constify the 'lbr_desc[]' array and make a function static
perf script: Add 'synth' field for synthesized event payloads
perf auxtrace: Add itrace option to output power events
perf auxtrace: Add itrace option to output ptwrite events
tools include: Add byte-swapping macros to kernel.h
perf script: Add 'synth' event type for synthesized events
x86/insn: perf tools: Add new ptwrite instruction
...
01 Jul, 2017
1 commit
-
A set of overlapping changes in macvlan and the rocker
driver, nothing serious.Signed-off-by: David S. Miller
21 Jun, 2017
1 commit
-
If the event for which an AUX area is about to be allocated, does
not support setting up an AUX area, rb_alloc_aux() return -ENOTSUPP.This error condition is being returned unfiltered to the user space,
and, for example, the perf tools fails with:failed to mmap with 524 (INTERNAL ERROR: strerror_r(524, 0x3fff497a1c8, 512)=22)
This error can be easily seen with "perf record -m 128,256 -e cpu-clock".
The 524 error code maps to -ENOTSUPP (in rb_alloc_aux()). The -ENOTSUPP
error code shall be only used within the kernel. So the correct error
code would then be -EOPNOTSUPP.With this commit, the perf tool then reports:
failed to mmap with 95 (Operation not supported)
which is more clear.
Signed-off-by: Hendrik Brueckner
Acked-by: Alexander Shishkin
Cc: Linus Torvalds
Cc: Peter Zijlstra
Cc: Pu Hou
Cc: Thomas Gleixner
Cc: Thomas-Mich Richter
Cc: acme@kernel.org
Cc: linux-s390@vger.kernel.org
Link: http://lkml.kernel.org/r/1497954399-6355-1-git-send-email-brueckner@linux.vnet.ibm.com
Signed-off-by: Ingo Molnar
15 Jun, 2017
1 commit
-
The conflicts were two cases of overlapping changes in
batman-adv and the qed driver.Signed-off-by: David S. Miller
08 Jun, 2017
4 commits
-
The function was added by commit e5d1367f17ba ("perf: Add cgroup
support") in 2011 and hasn't been used since then. Removing it fixes the
following warning when building with Clang:kernel/events/core.c:696:19: error: unused function 'perf_cgroup_event_cgrp_time' [-Werror,-Wunused-function]
Signed-off-by: Matthias Kaehlcke
Signed-off-by: Peter Zijlstra (Intel)
Cc: Alexander Shishkin
Cc: Arnaldo Carvalho de Melo
Cc: Douglas Anderson
Cc: Linus Torvalds
Cc: Peter Zijlstra
Cc: Thomas Gleixner
Link: http://lkml.kernel.org/r/20170523215132.189049-1-mka@chromium.org
Signed-off-by: Ingo Molnar -
Andi was asking about PERF_FORMAT_GROUP vs inherited events, which led
to the discovery of a bug from commit:3dab77fb1bf8 ("perf: Rework/fix the whole read vs group stuff")
- PERF_SAMPLE_GROUP = 1U << 4,
+ PERF_SAMPLE_READ = 1U << 4,- if (attr->inherit && (attr->sample_type & PERF_SAMPLE_GROUP))
+ if (attr->inherit && (attr->read_format & PERF_FORMAT_GROUP))is a clear fail :/
While this changes user visible behaviour; it was previously possible
to create an inherited event with PERF_SAMPLE_READ; this is deemed
acceptible because its results were always incorrect.Reported-by: Andi Kleen
Signed-off-by: Peter Zijlstra (Intel)
Cc: Alexander Shishkin
Cc: Arnaldo Carvalho de Melo
Cc: Jiri Olsa
Cc: Linus Torvalds
Cc: Peter Zijlstra
Cc: Stephane Eranian
Cc: Thomas Gleixner
Cc: Vince Weaver
Fixes: 3dab77fb1bf8 ("perf: Rework/fix the whole read vs group stuff")
Link: http://lkml.kernel.org/r/20170530094512.dy2nljns2uq7qa3j@hirez.programming.kicks-ass.net
Signed-off-by: Ingo Molnar -
Signed-off-by: Ingo Molnar
-
When doing sampling, for example:
perf record -e cycles:u ...
On workloads that do a lot of kernel entry/exits we see kernel
samples, even though :u is specified. This is due to skid existing.This might be a security issue because it can leak kernel addresses even
though kernel sampling support is disabled.The patch drops the kernel samples if exclude_kernel is specified.
For example, test on Haswell desktop:
perf record -e cycles:u
perf report --stdioBefore patch applied:
99.77% mgen mgen [.] buf_read
0.20% mgen mgen [.] rand_buf_init
0.01% mgen [kernel.vmlinux] [k] apic_timer_interrupt
0.00% mgen mgen [.] last_free_elem
0.00% mgen libc-2.23.so [.] __random_r
0.00% mgen libc-2.23.so [.] _int_malloc
0.00% mgen mgen [.] rand_array_init
0.00% mgen [kernel.vmlinux] [k] page_fault
0.00% mgen libc-2.23.so [.] __random
0.00% mgen libc-2.23.so [.] __strcasestr
0.00% mgen ld-2.23.so [.] strcmp
0.00% mgen ld-2.23.so [.] _dl_start
0.00% mgen libc-2.23.so [.] sched_setaffinity@@GLIBC_2.3.4
0.00% mgen ld-2.23.so [.] _startWe can see kernel symbols apic_timer_interrupt and page_fault.
After patch applied:
99.79% mgen mgen [.] buf_read
0.19% mgen mgen [.] rand_buf_init
0.00% mgen libc-2.23.so [.] __random_r
0.00% mgen mgen [.] rand_array_init
0.00% mgen mgen [.] last_free_elem
0.00% mgen libc-2.23.so [.] vfprintf
0.00% mgen libc-2.23.so [.] rand
0.00% mgen libc-2.23.so [.] __random
0.00% mgen libc-2.23.so [.] _int_malloc
0.00% mgen libc-2.23.so [.] _IO_doallocbuf
0.00% mgen ld-2.23.so [.] do_lookup_x
0.00% mgen ld-2.23.so [.] open_verify.constprop.7
0.00% mgen ld-2.23.so [.] _dl_important_hwcaps
0.00% mgen libc-2.23.so [.] sched_setaffinity@@GLIBC_2.3.4
0.00% mgen ld-2.23.so [.] _startThere are only userspace symbols.
Signed-off-by: Jin Yao
Signed-off-by: Peter Zijlstra (Intel)
Cc:
Cc: Alexander Shishkin
Cc: Arnaldo Carvalho de Melo
Cc: Jiri Olsa
Cc: Linus Torvalds
Cc: Namhyung Kim
Cc: Peter Zijlstra
Cc: Stephane Eranian
Cc: Thomas Gleixner
Cc: Vince Weaver
Cc: acme@kernel.org
Cc: jolsa@kernel.org
Cc: kan.liang@intel.com
Cc: mark.rutland@arm.com
Cc: will.deacon@arm.com
Cc: yao.jin@intel.com
Link: http://lkml.kernel.org/r/1495706947-3744-1-git-send-email-yao.jin@linux.intel.com
Signed-off-by: Ingo Molnar
05 Jun, 2017
1 commit
-
Allow BPF_PROG_TYPE_PERF_EVENT program types to attach to all
perf_event types, including HW_CACHE, RAW, and dynamic pmu events.
Only tracepoint/kprobe events are treated differently which require
BPF_PROG_TYPE_TRACEPOINT/BPF_PROG_TYPE_KPROBE program types accordingly.Also add support for reading all event counters using
bpf_perf_event_read() helper.Signed-off-by: Alexei Starovoitov
Signed-off-by: David S. Miller
03 Jun, 2017
1 commit
-
If we failed to acquire task's cred_guard_mutex we shouldn't proceed
to release it in the error path.Fixes: a63fbed776c ("perf/tracing/cpuhotplug: Fix locking order")
Signed-off-by: Alexander Levin
Cc: peterz@infradead.org
Cc: rostedt@goodmis.org
Cc: mathieu.desnoyers@efficios.com
Cc: mhiramat@kernel.org
Cc: paulmck@linux.vnet.ibm.com
Cc: bigeasy@linutronix.de
Link: http://lkml.kernel.org/r/20170603033903.12056-1-alexander.levin@verizon.com
Signed-off-by: Thomas Gleixner
26 May, 2017
1 commit
-
perf, tracing, kprobes and jump_labels have a gazillion of ways to create
dependency lock chains. Some of those involve nested invocations of
get_online_cpus().The conversion of the hotplug locking to a percpu rwsem requires to avoid
such nested calls. sys_perf_event_open() protects most of the syscall logic
against cpu hotplug. This causes nested calls and lock inversions versus
ftrace and kprobes in various interesting ways.It's impossible to move the hotplug locking to the outer end of all call
chains in the involved facilities, so the hotplug protection in
sys_perf_event_open() needs to be solved differently.Introduce 'pmus_mutex' which protects a perf private online cpumask. This
mutex is taken when the mask is updated in the cpu hotplug callbacks and
can be taken in sys_perf_event_open() to protect the swhash setup/teardown
code and when the final judgement about a valid event has to be made.[ tglx: Produced changelog and fixed the swhash interaction ]
Signed-off-by: Peter Zijlstra (Intel)
Signed-off-by: Thomas Gleixner
Acked-by: Ingo Molnar
Cc: Paul E. McKenney
Cc: Sebastian Siewior
Cc: Steven Rostedt
Cc: Mathieu Desnoyers
Cc: Masami Hiramatsu
Link: http://lkml.kernel.org/r/20170524081548.930941109@linutronix.de
23 May, 2017
2 commits
-
We don't set an error code here which means that perf_event_alloc()
returns ERR_PTR(0) (in other words NULL). The callers are not expecting
that and would Oops.Signed-off-by: Dan Carpenter
Signed-off-by: Peter Zijlstra (Intel)
Cc: Alexander Shishkin
Cc: Arnaldo Carvalho de Melo
Cc: Arnaldo Carvalho de Melo
Cc: Jiri Olsa
Cc: Linus Torvalds
Cc: Peter Zijlstra
Cc: Stephane Eranian
Cc: Thomas Gleixner
Cc: Vince Weaver
Fixes: 375637bc5249 ("perf/core: Introduce address range filtering")
Link: http://lkml.kernel.org/r/20170522090418.hvs6icgpdo53wkn5@mwanda
Signed-off-by: Ingo Molnar -
perf_init_event() can't return NULL. If it did, the error handling is
incomplete and we would crash. I have removed this confusing dead code.Signed-off-by: Dan Carpenter
Signed-off-by: Peter Zijlstra (Intel)
Cc: Alexander Shishkin
Cc: Arnaldo Carvalho de Melo
Cc: Arnaldo Carvalho de Melo
Cc: Jiri Olsa
Cc: Linus Torvalds
Cc: Peter Zijlstra
Cc: Stephane Eranian
Cc: Thomas Gleixner
Cc: Vince Weaver
Link: http://lkml.kernel.org/r/20170522090348.5g7yyld5en3yeky4@mwanda
Signed-off-by: Ingo Molnar
10 May, 2017
1 commit
-
Perf can generate and record a user callchain in response to a synchronous
request, such as a tracepoint firing. If this happens under set_fs(KERNEL_DS),
then we can end up walking the user stack (and dereferencing/saving whatever we
find there) without the protections usually afforded by checks such as
access_ok.Rather than play whack-a-mole with each architecture's stack unwinding
implementation, fix the root of the problem by ensuring that we force USER_DS
when invoking perf_callchain_user from the perf core.Reported-by: Al Viro
Signed-off-by: Will Deacon
Acked-by: Peter Zijlstra
Cc: Alexander Shishkin
Cc: Arnaldo Carvalho de Melo
Cc: Jiri Olsa
Cc: Linus Torvalds
Cc: Thomas Gleixner
Signed-off-by: Ingo Molnar
28 Mar, 2017
1 commit
-
Signed-off-by: Ingo Molnar
16 Mar, 2017
7 commits
-
While going through the event inheritance code Oleg got confused.
Add some comments to better explain the silent dissapearance of
orphaned events.So what happens is that at perf_event_release_kernel() time; when an
event looses its connection to userspace (and ceases to exist from the
user's perspective) we can still have an arbitrary amount of inherited
copies of the event. We want to synchronously find and remove all
these child events.Since that requires a bit of lock juggling, there is the possibility
that concurrent clone()s will create new child events. Therefore we
first mark the parent event as DEAD, which marks all the extant child
events as orphaned.We then avoid copying orphaned events; in order to avoid getting more
of them.Signed-off-by: Peter Zijlstra (Intel)
Cc: Alexander Shishkin
Cc: Arnaldo Carvalho de Melo
Cc: Arnaldo Carvalho de Melo
Cc: Dmitry Vyukov
Cc: Jiri Olsa
Cc: Linus Torvalds
Cc: Mathieu Desnoyers
Cc: Oleg Nesterov
Cc: Peter Zijlstra
Cc: Stephane Eranian
Cc: Thomas Gleixner
Cc: Vince Weaver
Cc: fweisbec@gmail.com
Link: http://lkml.kernel.org/r/20170316125823.289567442@infradead.org
Signed-off-by: Ingo Molnar -
We have ctx->event_list that contains all events; no need to
repeatedly iterate the group lists to find them all.Signed-off-by: Peter Zijlstra (Intel)
Cc: Alexander Shishkin
Cc: Arnaldo Carvalho de Melo
Cc: Arnaldo Carvalho de Melo
Cc: Dmitry Vyukov
Cc: Jiri Olsa
Cc: Linus Torvalds
Cc: Mathieu Desnoyers
Cc: Oleg Nesterov
Cc: Peter Zijlstra
Cc: Stephane Eranian
Cc: Thomas Gleixner
Cc: Vince Weaver
Cc: fweisbec@gmail.com
Link: http://lkml.kernel.org/r/20170316125823.239678244@infradead.org
Signed-off-by: Ingo Molnar -
While hunting for clues to a use-after-free, Oleg spotted that
perf_event_init_context() can loose an error value with the result
that fork() can succeed even though we did not fully inherit the perf
event context.Spotted-by: Oleg Nesterov
Signed-off-by: Peter Zijlstra (Intel)
Cc: Alexander Shishkin
Cc: Arnaldo Carvalho de Melo
Cc: Arnaldo Carvalho de Melo
Cc: Dmitry Vyukov
Cc: Frederic Weisbecker
Cc: Jiri Olsa
Cc: Linus Torvalds
Cc: Mathieu Desnoyers
Cc: Peter Zijlstra
Cc: Stephane Eranian
Cc: Thomas Gleixner
Cc: Vince Weaver
Cc: oleg@redhat.com
Cc: stable@vger.kernel.org
Fixes: 889ff0150661 ("perf/core: Split context's event group list into pinned and non-pinned lists")
Link: http://lkml.kernel.org/r/20170316125823.190342547@infradead.org
Signed-off-by: Ingo Molnar -
Dmitry reported syzcaller tripped a use-after-free in perf_release().
After much puzzlement Oleg spotted the below scenario:
Task1 Task2
fork()
perf_event_init_task()
/* ... */
goto bad_fork_$foo;
/* ... */
perf_event_free_task()
mutex_lock(ctx->lock)
perf_free_event(B)perf_event_release_kernel(A)
mutex_lock(A->child_mutex)
list_for_each_entry(child, ...) {
/* child == B */
ctx = B->ctx;
get_ctx(ctx);
mutex_unlock(A->child_mutex);mutex_lock(A->child_mutex)
list_del_init(B->child_list)
mutex_unlock(A->child_mutex)/* ... */
mutex_unlock(ctx->lock);
put_ctx() /* >0 */
free_task();
mutex_lock(ctx->lock);
mutex_lock(A->child_mutex);
/* ... */
mutex_unlock(A->child_mutex);
mutex_unlock(ctx->lock)
put_ctx() /* 0 */
ctx->task && !TOMBSTONE
put_task_struct() /* UAF */This patch closes the hole by making perf_event_free_task() destroy the
task ctx relation such that perf_event_release_kernel() will no longer
observe the now dead task.Spotted-by: Oleg Nesterov
Reported-by: Dmitry Vyukov
Signed-off-by: Peter Zijlstra (Intel)
Cc: Alexander Shishkin
Cc: Arnaldo Carvalho de Melo
Cc: Arnaldo Carvalho de Melo
Cc: Jiri Olsa
Cc: Linus Torvalds
Cc: Mathieu Desnoyers
Cc: Peter Zijlstra
Cc: Stephane Eranian
Cc: Thomas Gleixner
Cc: Vince Weaver
Cc: fweisbec@gmail.com
Cc: oleg@redhat.com
Cc: stable@vger.kernel.org
Fixes: c6e5b73242d2 ("perf: Synchronously clean up child events")
Link: http://lkml.kernel.org/r/20170314155949.GE32474@worktop
Link: http://lkml.kernel.org/r/20170316125823.140295131@infradead.org
Signed-off-by: Ingo Molnar -
The Intel PT driver needs to be able to communicate partial AUX transactions,
that is, transactions with gaps in data for reasons other than no room
left in the buffer (i.e. truncated transactions). Therefore, this condition
does not imply a wakeup for the consumer.To this end, add a new "partial" AUX flag.
Signed-off-by: Alexander Shishkin
Signed-off-by: Peter Zijlstra (Intel)
Cc: Arnaldo Carvalho de Melo
Cc: Arnaldo Carvalho de Melo
Cc: Jiri Olsa
Cc: Linus Torvalds
Cc: Mathieu Poirier
Cc: Peter Zijlstra
Cc: Stephane Eranian
Cc: Thomas Gleixner
Cc: Vince Weaver
Cc: vince@deater.net
Link: http://lkml.kernel.org/r/20170220133352.17995-4-alexander.shishkin@linux.intel.com
Signed-off-by: Ingo Molnar -
In preparation for adding more flags to perf AUX records, introduce a
separate API for setting the flags for a session, rather than appending
more bool arguments to perf_aux_output_end. This allows to set each
flag at the time a corresponding condition is detected, instead of
tracking it in each driver's private state.Signed-off-by: Will Deacon
Signed-off-by: Alexander Shishkin
Signed-off-by: Peter Zijlstra (Intel)
Cc: Arnaldo Carvalho de Melo
Cc: Arnaldo Carvalho de Melo
Cc: Jiri Olsa
Cc: Linus Torvalds
Cc: Mathieu Poirier
Cc: Peter Zijlstra
Cc: Stephane Eranian
Cc: Thomas Gleixner
Cc: Vince Weaver
Cc: vince@deater.net
Link: http://lkml.kernel.org/r/20170220133352.17995-3-alexander.shishkin@linux.intel.com
Signed-off-by: Ingo Molnar -
Signed-off-by: Ingo Molnar
14 Mar, 2017
1 commit
-
With the advert of container technologies like docker, that depend on
namespaces for isolation, there is a need for tracing support for
namespaces. This patch introduces new PERF_RECORD_NAMESPACES event for
recording namespaces related info. By recording info for every
namespace, it is left to userspace to take a call on the definition of a
container and trace containers by updating perf tool accordingly.Each namespace has a combination of device and inode numbers. Though
every namespace has the same device number currently, that may change in
future to avoid the need for a namespace of namespaces. Considering such
possibility, record both device and inode numbers separately for each
namespace.Signed-off-by: Hari Bathini
Acked-by: Jiri Olsa
Acked-by: Peter Zijlstra
Cc: Alexander Shishkin
Cc: Alexei Starovoitov
Cc: Ananth N Mavinakayanahalli
Cc: Aravinda Prasad
Cc: Brendan Gregg
Cc: Daniel Borkmann
Cc: Eric Biederman
Cc: Sargun Dhillon
Cc: Steven Rostedt
Link: http://lkml.kernel.org/r/148891929686.25309.2827618988917007768.stgit@hbathini.in.ibm.com
Signed-off-by: Arnaldo Carvalho de Melo
10 Mar, 2017
1 commit
-
Fix typos and add the following to the scripts/spelling.txt:
disble||disable
disbled||disabledI kept the TSL2563_INT_DISBLED in /drivers/iio/light/tsl2563.c
untouched. The macro is not referenced at all, but this commit is
touching only comment blocks just in case.Link: http://lkml.kernel.org/r/1481573103-11329-20-git-send-email-yamada.masahiro@socionext.com
Signed-off-by: Masahiro Yamada
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
02 Mar, 2017
4 commits
-
We are going to split out of , which
will have to be picked up from other headers and a couple of .c files.Create a trivial placeholder file that just
maps to to make this patch obviously correct and
bisectable.Include the new header in the files that are going to need it.
Acked-by: Linus Torvalds
Cc: Mike Galbraith
Cc: Peter Zijlstra
Cc: Thomas Gleixner
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Ingo Molnar -
We are going to split out of , which
will have to be picked up from other headers and a couple of .c files.Create a trivial placeholder file that just
maps to to make this patch obviously correct and
bisectable.Include the new header in the files that are going to need it.
Acked-by: Linus Torvalds
Cc: Mike Galbraith
Cc: Peter Zijlstra
Cc: Thomas Gleixner
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Ingo Molnar -
We are going to split out of , which
will have to be picked up from other headers and a couple of .c files.Create a trivial placeholder file that just
maps to to make this patch obviously correct and
bisectable.The APIs that are going to be moved first are:
mm_alloc()
__mmdrop()
mmdrop()
mmdrop_async_fn()
mmdrop_async()
mmget_not_zero()
mmput()
mmput_async()
get_task_mm()
mm_access()
mm_release()Include the new header in the files that are going to need it.
Acked-by: Linus Torvalds
Cc: Mike Galbraith
Cc: Peter Zijlstra
Cc: Thomas Gleixner
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Ingo Molnar -
We are going to split out of , which
will have to be picked up from other headers and .c files.Create a trivial placeholder file that just
maps to to make this patch obviously correct and
bisectable.Include the new header in the files that are going to need it.
Acked-by: Linus Torvalds
Cc: Mike Galbraith
Cc: Peter Zijlstra
Cc: Thomas Gleixner
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Ingo Molnar
01 Mar, 2017
1 commit
-
Pull perf fixes from Ingo Molnar:
"Misc fixes on the kernel and tooling side - nothing in particular
stands out"* 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (21 commits)
perf/core: Fix the perf_cpu_time_max_percent check
perf/core: Fix perf_event_enable_on_exec() timekeeping (again)
perf/core: Remove confusing comment and move put_ctx()
perf record: Honor --quiet option properly
perf annotate: Add -q/--quiet option
perf diff: Add -q/--quiet option
perf report: Add -q/--quiet option
perf utils: Check verbose flag properly
perf utils: Add perf_quiet_option()
perf record: Add -a as default target
perf stat: Add -a as default target
perf tools: Fail on using multiple bits long terms without value
perf tools: Move new_term arguments into struct parse_events_term template
perf build: Add special fixdep cleaning rule
perf tools: Replace _SC_NPROCESSORS_CONF with max_present_cpu in cpu_topology_map
perf header: Make build_cpu_topology skip offline/absent CPUs
perf cpumap: Add cpu__max_present_cpu()
perf session: Fix DEBUG=1 build with clang
tools lib traceevent: It's preempt not prempt
perf python: Filter out -specs=/a/b/c from the python binding cc options
...
28 Feb, 2017
3 commits
-
Merge yet more updates from Andrew Morton:
- a few MM remainders
- misc things
- autofs updates
- signals
- affs updates
- ipc
- nilfs2
- spelling.txt updates
* emailed patches from Andrew Morton : (78 commits)
mm, x86: fix HIGHMEM64 && PARAVIRT build config for native_pud_clear()
mm: add arch-independent testcases for RODATA
hfs: atomically read inode size
mm: clarify mm_struct.mm_{users,count} documentation
mm: use mmget_not_zero() helper
mm: add new mmget() helper
mm: add new mmgrab() helper
checkpatch: warn when formats use %Z and suggest %z
lib/vsprintf.c: remove %Z support
scripts/spelling.txt: add some typo-words
scripts/spelling.txt: add "followings" pattern and fix typo instances
scripts/spelling.txt: add "therfore" pattern and fix typo instances
scripts/spelling.txt: add "overwriten" pattern and fix typo instances
scripts/spelling.txt: add "overwritting" pattern and fix typo instances
scripts/spelling.txt: add "deintialize(d)" pattern and fix typo instances
scripts/spelling.txt: add "disassocation" pattern and fix typo instances
scripts/spelling.txt: add "omited" pattern and fix typo instances
scripts/spelling.txt: add "explictely" pattern and fix typo instances
scripts/spelling.txt: add "applys" pattern and fix typo instances
scripts/spelling.txt: add "configuartion" pattern and fix typo instances
... -
Pull cgroup updates from Tejun Heo:
"Several noteworthy changes.- Parav's rdma controller is finally merged. It is very straight
forward and can limit the abosolute numbers of common rdma
constructs used by different cgroups.- kernel/cgroup.c got too chubby and disorganized. Created
kernel/cgroup/ subdirectory and moved all cgroup related files
under kernel/ there and reorganized the core code. This hurts for
backporting patches but was long overdue.- cgroup v2 process listing reimplemented so that it no longer
depends on allocating a buffer large enough to cache the entire
result to sort and uniq the output. v2 has always mangled the sort
order to ensure that users don't depend on the sorted output, so
this shouldn't surprise anybody. This makes the pid listing
functions use the same iterators that are used internally, which
have to have the same iterating capabilities anyway.- perf cgroup filtering now works automatically on cgroup v2. This
patch was posted a long time ago but somehow fell through the
cracks.- misc fixes asnd documentation updates"
* 'for-4.11' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup: (27 commits)
kernfs: fix locking around kernfs_ops->release() callback
cgroup: drop the matching uid requirement on migration for cgroup v2
cgroup, perf_event: make perf_event controller work on cgroup2 hierarchy
cgroup: misc cleanups
cgroup: call subsys->*attach() only for subsystems which are actually affected by migration
cgroup: track migration context in cgroup_mgctx
cgroup: cosmetic update to cgroup_taskset_add()
rdmacg: Fixed uninitialized current resource usage
cgroup: Add missing cgroup-v2 PID controller documentation.
rdmacg: Added documentation for rdmacg
IB/core: added support to use rdma cgroup controller
rdmacg: Added rdma cgroup controller
cgroup: fix a comment typo
cgroup: fix RCU related sparse warnings
cgroup: move namespace code to kernel/cgroup/namespace.c
cgroup: rename functions for consistency
cgroup: move v1 mount functions to kernel/cgroup/cgroup-v1.c
cgroup: separate out cgroup1_kf_syscall_ops
cgroup: refactor mount path and clearly distinguish v1 and v2 paths
cgroup: move cgroup v1 specific code to kernel/cgroup/cgroup-v1.c
... -
We already have the helper, we can convert the rest of the kernel
mechanically using:git grep -l 'atomic_inc_not_zero.*mm_users' | xargs sed -i 's/atomic_inc_not_zero(&\(.*\)->mm_users)/mmget_not_zero\(\1\)/'
This is needed for a later patch that hooks into the helper, but might
be a worthwhile cleanup on its own.Link: http://lkml.kernel.org/r/20161218123229.22952-3-vegard.nossum@oracle.com
Signed-off-by: Vegard Nossum
Acked-by: Michal Hocko
Acked-by: Peter Zijlstra (Intel)
Acked-by: David Rientjes
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
25 Feb, 2017
3 commits
-
For consistency, it worth converting all page_check_address() to
page_vma_mapped_walk(), so we could drop the former.Link: http://lkml.kernel.org/r/20170129173858.45174-10-kirill.shutemov@linux.intel.com
Signed-off-by: Kirill A. Shutemov
Reviewed-by: Srikar Dronamraju
Cc: Andrea Arcangeli
Cc: Hillf Danton
Cc: Hugh Dickins
Cc: Johannes Weiner
Cc: Oleg Nesterov
Cc: Peter Zijlstra
Cc: Rik van Riel
Cc: Vladimir Davydov
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
Patch series "Fix few rmap-related THP bugs", v3.
The patchset fixes handing PTE-mapped THPs in page_referenced() and
page_idle_clear_pte_refs().To achieve that I've intrdocued new helper -- page_vma_mapped_walk() --
which replaces all page_check_address{,_transhuge}() and covers all THP
cases.Patchset overview:
- First patch fixes one uprobe bug (unrelated to the rest of the
patchset, just spotted it at the same time);- Patches 2-5 fix handling PTE-mapped THPs in page_referenced(),
page_idle_clear_pte_refs() and rmap core;- Patches 6-12 convert all page_check_address{,_transhuge}() users
(plus remove_migration_pte()) to page_vma_mapped_walk() and drop
unused helpers.I think the fixes are not critical enough for stable@ as they don't lead
to crashes or hangs, only suboptimal behaviour.This patch (of 12):
For THPs page_check_address() always fails. It leads to endless loop in
uprobe_write_opcode().Testcase with huge-tmpfs (uprobes cannot probe anonymous memory).
mount -t debugfs none /sys/kernel/debug
mount -t tmpfs -o huge=always none /mnt
gcc -Wall -O2 -o /mnt/test -x c - < /sys/kernel/debug/tracing/uprobe_events
echo 1 > /sys/kernel/debug/tracing/events/uprobes/enable
/mnt/testLet's split THPs before trying to replace.
Link: http://lkml.kernel.org/r/20170129173858.45174-2-kirill.shutemov@linux.intel.com
Signed-off-by: Kirill A. Shutemov
Acked-by: Rik van Riel
Acked-by: Johannes Weiner
Cc: Oleg Nesterov
Cc: Peter Zijlstra
Cc: Andrea Arcangeli
Cc: Hugh Dickins
Cc: Hillf Danton
Cc: Srikar Dronamraju
Cc: Vladimir Davydov
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
->fault(), ->page_mkwrite(), and ->pfn_mkwrite() calls do not need to
take a vma and vmf parameter when the vma already resides in vmf.Remove the vma parameter to simplify things.
[arnd@arndb.de: fix ARM build]
Link: http://lkml.kernel.org/r/20170125223558.1451224-1-arnd@arndb.de
Link: http://lkml.kernel.org/r/148521301778.19116.10840599906674778980.stgit@djiang5-desk3.ch.intel.com
Signed-off-by: Dave Jiang
Signed-off-by: Arnd Bergmann
Reviewed-by: Ross Zwisler
Cc: Theodore Ts'o
Cc: Darrick J. Wong
Cc: Matthew Wilcox
Cc: Dave Hansen
Cc: Christoph Hellwig
Cc: Jan Kara
Cc: Dan Williams
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
24 Feb, 2017
3 commits
-
Use "proc_dointvec_minmax" instead of "proc_dointvec" to check the input
value from user-space.If not, we can set a big value and some vars will overflow like
"sysctl_perf_event_sample_rate" which will cause a lot of unexpected
problems.Signed-off-by: Tan Xiaojun
Signed-off-by: Peter Zijlstra (Intel)
Cc:
Cc:
Cc: Alexander Shishkin
Cc: Arnaldo Carvalho de Melo
Cc: Jiri Olsa
Cc: Linus Torvalds
Cc: Peter Zijlstra
Cc: Stephane Eranian
Cc: Thomas Gleixner
Cc: Vince Weaver
Link: http://lkml.kernel.org/r/1487829879-56237-1-git-send-email-tanxiaojun@huawei.com
Signed-off-by: Ingo Molnar -
Where commit:
7fce250915ef ("perf: Fix scaling vs. perf_event_enable_on_exec()")
disabled the ctx-time a-priory, such that all events get enabled and
scheduled at the time point in time, there is one hole in that patch,
when no events do get enabled nothing re-enables the ctx-time.Reported-by: Ravi Bangoria
Reported-by: Anton Blanchard
Signed-off-by: Peter Zijlstra (Intel)
Cc: Alexander Shishkin
Cc: Arnaldo Carvalho de Melo
Cc: Jiri Olsa
Cc: Linus Torvalds
Cc: Oleg Nesterov
Cc: Peter Zijlstra
Cc: Stephane Eranian
Cc: Thomas Gleixner
Cc: Vince Weaver
Fixes: 7fce250915ef ("perf: Fix scaling vs. perf_event_enable_on_exec()")
Signed-off-by: Ingo Molnar -
Since commit:
321027c1fe77 ("perf/core: Fix concurrent sys_perf_event_open() vs. 'move_group' race")
... the code looks like (assuming move_group==1):
gctx = __perf_event_ctx_lock_double(group_leader, ctx);
perf_remove_from_context(group_leader, 0);
list_for_each_entry(sibling, &group_leader->sibling_list, group_entry) {
perf_remove_from_context(sibling, 0);
put_ctx(gctx);
}/* ... */
/* misleading comment about how this is the last reference */
put_ctx(gctx);perf_event_ctx_unlock(group_leader, gctx);
What that 'last' put_ctx() does is drop @group_leader's reference on
gctx after having dropped all its potential sibling references.But the thing is that __perf_event_ctx_lock_double() returns with a
reference _and_ a held lock, and perf_event_ctx_unlock() unlocks that
lock and drops that reference. Therefore that put_ctx() cannot be the
'last' of anything, nor is there an unbalance in puts.To reduce confusion, remove the comment and place the put_ctx() next
to the remove_from_context() call.Reported-by: Ben Hutchings
Signed-off-by: Peter Zijlstra (Intel)
Cc: Alexander Shishkin
Cc: Arnaldo Carvalho de Melo
Cc: Jiri Olsa
Cc: Linus Torvalds
Cc: Peter Zijlstra
Cc: Stephane Eranian
Cc: Thomas Gleixner
Cc: Vince Weaver
Signed-off-by: Ingo Molnar