19 Jul, 2013

1 commit

  • Pull driver core patches from Greg KH:
    "Here are some driver core patches for 3.11-rc2. They aren't really
    bugfixes, but a bunch of new helper macros for drivers to properly
    create attribute groups, which drivers and subsystems need to fix up a
    ton of race issues with incorrectly creating sysfs files (binary and
    normal) after userspace has been told that the device is present.

    Also here is the ability to create binary files as attribute groups,
    to solve that race condition, which was impossible to do before this,
    so that's my fault the drivers were broken.

    The majority of the .c changes are indenting and moving code around a
    bit. It affects no existing code, but allows the large backlog of 70+
    patches that I already have created to start flowing into the
    different subtrees, instead of having to live in my driver-core tree,
    causing merge nightmares in linux-next for the next few months.

    These were finalized too late for the -rc1 merge window, which is why
    they didn't make that pull request; testing and review from others
    didn't happen until a few weeks ago, and then there's the whole
    distraction of the past few days, which prevented these from getting
    to you sooner. Sorry about that.

    Oh, and there's a bugfix for the documentation build warning in here
    as well. All of these have been in linux-next this week, with no
    reported problems"

    * tag 'driver-core-3.11-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core:
    driver-core: fix new kernel-doc warning in base/platform.c
    sysfs: use file mode defines from stat.h
    sysfs: add more helper macro's for (bin_)attribute(_groups)
    driver core: add default groups to struct class
    driver core: Introduce device_create_groups
    sysfs: prevent warning when only using binary attributes
    sysfs: add support for binary attributes in groups
    driver core: device.h: add RW and RO attribute macros
    sysfs.h: add BIN_ATTR macro
    sysfs.h: add ATTRIBUTE_GROUPS() macro
    sysfs.h: add __ATTR_RW() macro

    Linus Torvalds
     

17 Jul, 2013

1 commit


15 Jul, 2013

1 commit

  • The __cpuinit type of throwaway sections might have made sense
    some time ago when RAM was more constrained, but now the savings
    do not offset the cost and complications. The fix in commit
    5e427ec2d0 ("x86: Fix bit corruption at CPU resume time") is a
    good example of the nasty type of bugs that can be created
    with improper use of the various __init prefixes.

    After a discussion on LKML[1] it was decided that cpuinit should go
    the way of devinit and be phased out. Once all the users are gone,
    we can then finally remove the macros themselves from linux/init.h.

    This removes all the uses of the __cpuinit macros from C files in
    the core kernel directories (kernel, init, lib, mm, and include)
    that don't really have a specific maintainer.

    [1] https://lkml.org/lkml/2013/5/20/589

    Signed-off-by: Paul Gortmaker

    Paul Gortmaker
     

12 Jul, 2013

3 commits

  • Jiri managed to trigger this warning:

    [] ======================================================
    [] [ INFO: possible circular locking dependency detected ]
    [] 3.10.0+ #228 Tainted: G W
    [] -------------------------------------------------------
    [] p/6613 is trying to acquire lock:
    [] (rcu_node_0){..-...}, at: [] rcu_read_unlock_special+0xa7/0x250
    []
    [] but task is already holding lock:
    [] (&ctx->lock){-.-...}, at: [] perf_lock_task_context+0xd9/0x2c0
    []
    [] which lock already depends on the new lock.
    []
    [] the existing dependency chain (in reverse order) is:
    []
    [] -> #4 (&ctx->lock){-.-...}:
    [] -> #3 (&rq->lock){-.-.-.}:
    [] -> #2 (&p->pi_lock){-.-.-.}:
    [] -> #1 (&rnp->nocb_gp_wq[1]){......}:
    [] -> #0 (rcu_node_0){..-...}:

    Paul was quick to explain that due to preemptible RCU we cannot call
    rcu_read_unlock() while holding scheduler (or nested) locks when part
    of the read side critical section was preemptible.

    Therefore solve it by making the entire RCU read side non-preemptible.

    Also pull out the retry from under the non-preempt to play nice with RT.

    Reported-by: Jiri Olsa
    Helped-out-by: Paul E. McKenney
    Cc:
    Signed-off-by: Peter Zijlstra
    Signed-off-by: Ingo Molnar

    Peter Zijlstra
     
  • The '!ctx->is_active' check has a valid scenario, so
    there's no need for the warning.

    The reason is that there's a time window between the
    'ctx->is_active' check in the perf_event_enable() function
    and the __perf_event_enable() function having:

    - IRQs on
    - ctx->lock unlocked

    where the task could be killed and 'ctx' deactivated by
    perf_event_exit_task(), ending up with the warning below.

    So remove the WARN_ON_ONCE() check and add comments to
    explain it all.

    This addresses the following warning reported by Vince Weaver:

    [ 324.983534] ------------[ cut here ]------------
    [ 324.984420] WARNING: at kernel/events/core.c:1953 __perf_event_enable+0x187/0x190()
    [ 324.984420] Modules linked in:
    [ 324.984420] CPU: 19 PID: 2715 Comm: nmi_bug_snb Not tainted 3.10.0+ #246
    [ 324.984420] Hardware name: Supermicro X8DTN/X8DTN, BIOS 4.6.3 01/08/2010
    [ 324.984420] 0000000000000009 ffff88043fce3ec8 ffffffff8160ea0b ffff88043fce3f00
    [ 324.984420] ffffffff81080ff0 ffff8802314fdc00 ffff880231a8f800 ffff88043fcf7860
    [ 324.984420] 0000000000000286 ffff880231a8f800 ffff88043fce3f10 ffffffff8108103a
    [ 324.984420] Call Trace:
    [ 324.984420] [] dump_stack+0x19/0x1b
    [ 324.984420] [] warn_slowpath_common+0x70/0xa0
    [ 324.984420] [] warn_slowpath_null+0x1a/0x20
    [ 324.984420] [] __perf_event_enable+0x187/0x190
    [ 324.984420] [] remote_function+0x40/0x50
    [ 324.984420] [] generic_smp_call_function_single_interrupt+0xbe/0x130
    [ 324.984420] [] smp_call_function_single_interrupt+0x27/0x40
    [ 324.984420] [] call_function_single_interrupt+0x6f/0x80
    [ 324.984420] [] ? _raw_spin_unlock_irqrestore+0x41/0x70
    [ 324.984420] [] perf_event_exit_task+0x14d/0x210
    [ 324.984420] [] ? switch_task_namespaces+0x24/0x60
    [ 324.984420] [] do_exit+0x2b6/0xa40
    [ 324.984420] [] ? _raw_spin_unlock_irq+0x2c/0x30
    [ 324.984420] [] do_group_exit+0x49/0xc0
    [ 324.984420] [] get_signal_to_deliver+0x254/0x620
    [ 324.984420] [] do_signal+0x57/0x5a0
    [ 324.984420] [] ? __do_page_fault+0x2a4/0x4e0
    [ 324.984420] [] ? retint_restore_args+0xe/0xe
    [ 324.984420] [] ? retint_signal+0x11/0x84
    [ 324.984420] [] do_notify_resume+0x65/0x80
    [ 324.984420] [] retint_signal+0x46/0x84
    [ 324.984420] ---[ end trace 442ec2f04db3771a ]---

    Reported-by: Vince Weaver
    Signed-off-by: Jiri Olsa
    Suggested-by: Peter Zijlstra
    Cc: Corey Ashford
    Cc: Frederic Weisbecker
    Cc: Ingo Molnar
    Cc: Namhyung Kim
    Cc: Paul Mackerras
    Cc: Arnaldo Carvalho de Melo
    Cc:
    Signed-off-by: Peter Zijlstra
    Link: http://lkml.kernel.org/r/1373384651-6109-2-git-send-email-jolsa@redhat.com
    Signed-off-by: Ingo Molnar

    Jiri Olsa
     
  • Currently when the child context for inherited events is
    created, it's based on the pmu object of the first event
    of the parent context.

    This is wrong for the following scenario:

    - HW context having HW and SW event
    - HW event got removed (closed)
    - SW event stays in HW context as the only event
    and its pmu is used to clone the child context

    The issue starts when the cpu context object is touched
    based on the pmu context object (__get_cpu_context). In
    this case the HW context ends up working with the SW cpu
    context, producing the WARN below.

    Fix this by using the parent context's pmu object when
    cloning the child context.

    Addresses the following warning reported by Vince Weaver:

    [ 2716.472065] ------------[ cut here ]------------
    [ 2716.476035] WARNING: at kernel/events/core.c:2122 task_ctx_sched_out+0x3c/0x)
    [ 2716.476035] Modules linked in: nfsd auth_rpcgss oid_registry nfs_acl nfs locn
    [ 2716.476035] CPU: 0 PID: 3164 Comm: perf_fuzzer Not tainted 3.10.0-rc4 #2
    [ 2716.476035] Hardware name: AOpen DE7000/nMCP7ALPx-DE R1.06 Oct.19.2012, BI2
    [ 2716.476035] 0000000000000000 ffffffff8102e215 0000000000000000 ffff88011fc18
    [ 2716.476035] ffff8801175557f0 0000000000000000 ffff880119fda88c ffffffff810ad
    [ 2716.476035] ffff880119fda880 ffffffff810af02a 0000000000000009 ffff880117550
    [ 2716.476035] Call Trace:
    [ 2716.476035] [] ? warn_slowpath_common+0x5b/0x70
    [ 2716.476035] [] ? task_ctx_sched_out+0x3c/0x5f
    [ 2716.476035] [] ? perf_event_exit_task+0xbf/0x194
    [ 2716.476035] [] ? do_exit+0x3e7/0x90c
    [ 2716.476035] [] ? __do_fault+0x359/0x394
    [ 2716.476035] [] ? do_group_exit+0x66/0x98
    [ 2716.476035] [] ? get_signal_to_deliver+0x479/0x4ad
    [ 2716.476035] [] ? __perf_event_task_sched_out+0x230/0x2d1
    [ 2716.476035] [] ? do_signal+0x3c/0x432
    [ 2716.476035] [] ? ctx_sched_in+0x43/0x141
    [ 2716.476035] [] ? perf_event_context_sched_in+0x7a/0x90
    [ 2716.476035] [] ? __perf_event_task_sched_in+0x31/0x118
    [ 2716.476035] [] ? mmdrop+0xd/0x1c
    [ 2716.476035] [] ? finish_task_switch+0x7d/0xa6
    [ 2716.476035] [] ? do_notify_resume+0x20/0x5d
    [ 2716.476035] [] ? retint_signal+0x3d/0x78
    [ 2716.476035] ---[ end trace 827178d8a5966c3d ]---

    Reported-by: Vince Weaver
    Signed-off-by: Jiri Olsa
    Cc: Corey Ashford
    Cc: Frederic Weisbecker
    Cc: Ingo Molnar
    Cc: Namhyung Kim
    Cc: Paul Mackerras
    Cc: Arnaldo Carvalho de Melo
    Cc:
    Signed-off-by: Peter Zijlstra
    Link: http://lkml.kernel.org/r/1373384651-6109-1-git-send-email-jolsa@redhat.com
    Signed-off-by: Ingo Molnar

    Jiri Olsa
     

05 Jul, 2013

1 commit

  • This patch fixes a serious bug in:

    14c63f17b1fd ("perf: Drop sample rate when sampling is too slow")

    There was a misunderstanding of the do_div() macro's API: it
    returns the remainder of the division, which is not what the
    function expected, leading to the interrupt latency watchdog
    being disabled.

    This patch also removes a duplicate assignment in
    perf_sample_event_took().
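
    For reference, do_div(n, base) divides the 64-bit lvalue n in place,
    leaving the quotient in n, while the macro itself evaluates to the
    32-bit remainder. An illustrative snippet (not part of the patch):

        u64 n = 1000003;
        u32 rem;

        rem = do_div(n, 1000);   /* n == 1000 (quotient), rem == 3 */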

    Signed-off-by: Stephane Eranian
    Cc: peterz@infradead.org
    Cc: dave.hansen@linux.intel.com
    Cc: ak@linux.intel.com
    Cc: jolsa@redhat.com
    Link: http://lkml.kernel.org/r/20130704223010.GA30625@quad
    Signed-off-by: Ingo Molnar

    Stephane Eranian
     

23 Jun, 2013

1 commit

  • This patch keeps track of how long perf's NMI handler is taking,
    and also calculates how many samples perf can take a second. If
    the sample length times the expected max number of samples
    exceeds a configurable threshold, it drops the sample rate.

    This way, we don't have a runaway sampling process eating up the
    CPU.
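
    A rough sketch of the throttling idea (all names, constants and the
    averaging below are illustrative, not the actual kernel code):

        /* illustrative names and constants, not the kernel's actual ones */
        static u64 avg_sample_len_ns;           /* running average */
        static int sample_rate = 100000;        /* max samples per second */
        static const int max_cpu_percent = 25;  /* CPU budget for NMIs */

        /* called from the NMI path with the duration of one sample */
        static void sample_event_took(u64 sample_len_ns)
        {
                /* crude running average: 7/8 old + 1/8 new */
                avg_sample_len_ns =
                        (avg_sample_len_ns * 7 + sample_len_ns) / 8;

                /* expected NMI time per second exceeds the budget: back off */
                if (avg_sample_len_ns * sample_rate >
                    (u64)NSEC_PER_SEC * max_cpu_percent / 100)
                        sample_rate /= 2;
        }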

    This patch can tend to drop the sample rate down to a level where
    perf doesn't work very well. *BUT* the alternative is that my
    system hangs because it spends all of its time handling NMIs.

    I'll take a busted performance tool over an entire system that's
    busted and undebuggable any day.

    BTW, my suspicion is that there's still an underlying bug here.
    Using the HPET instead of the TSC is definitely a contributing
    factor, but I suspect there are some other things going on.
    But, I can't go dig down on a bug like that with my machine
    hanging all the time.

    Signed-off-by: Dave Hansen
    Acked-by: Peter Zijlstra
    Cc: paulus@samba.org
    Cc: acme@ghostprotocols.net
    Cc: Dave Hansen
    [ Prettified it a bit. ]
    Signed-off-by: Ingo Molnar

    Dave Hansen
     

20 Jun, 2013

8 commits

  • This patch simply moves all per-cpu variables into the new
    single per-cpu "struct bp_cpuinfo".

    To me this looks more logical and clean, but it can also
    simplify further potential changes. In particular, I do not
    think this memory should be per-cpu; it is never used "locally".
    After this change it is trivial to turn it into, say,
    bootmem[nr_cpu_ids].
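
    A sketch of the consolidated per-cpu structure (the exact fields are
    an approximation, not necessarily the final layout):

        struct bp_cpuinfo {
                /* number of CPU-pinned breakpoints on this CPU */
                unsigned int    cpu_pinned;
                /* tsk_pinned[n]: number of tasks having n+1 breakpoints */
                unsigned int    *tsk_pinned;
                /* number of non-pinned (flexible) cpu/task breakpoints */
                unsigned int    flexible;
        };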

    Reported-by: Vince Weaver
    Signed-off-by: Oleg Nesterov
    Acked-by: Frederic Weisbecker
    Link: http://lkml.kernel.org/r/20130620155020.GA6350@redhat.com
    Signed-off-by: Ingo Molnar

    Oleg Nesterov
     
  • 1. register_wide_hw_breakpoint() can use unregister_ on failure,
    no need to duplicate the code.

    2. "struct perf_event **pevent" adds an unnecessary level of
    indirection and complication; use per_cpu(*cpu_events, cpu).

    Reported-by: Vince Weaver
    Signed-off-by: Oleg Nesterov
    Acked-by: Frederic Weisbecker
    Link: http://lkml.kernel.org/r/20130620155018.GA6347@redhat.com
    Signed-off-by: Ingo Molnar

    Oleg Nesterov
     
  • Add the trivial helper which simply returns cpumask_of() or
    cpu_possible_mask depending on bp->cpu.

    Change fetch_bp_busy_slots() and toggle_bp_slot() to always do
    for_each_cpu(cpumask_of_bp) to simplify the code and avoid the
    code duplication.
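
    A sketch of such a helper (an approximation of what the patch adds):

        static const struct cpumask *cpumask_of_bp(struct perf_event *bp)
        {
                if (bp->cpu >= 0)
                        return cpumask_of(bp->cpu);
                return cpu_possible_mask;
        }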

    Reported-by: Vince Weaver
    Signed-off-by: Oleg Nesterov
    Acked-by: Frederic Weisbecker
    Link: http://lkml.kernel.org/r/20130620155015.GA6340@redhat.com
    Signed-off-by: Ingo Molnar

    Oleg Nesterov
     
  • Change toggle_bp_slot() to make "weight" negative if !enable.
    This way we can always use "+ weight" without an additional
    "if (enable)" check, and toggle_bp_task_slot() no longer needs
    this argument.
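
    The resulting pattern, roughly (the counter name below is a
    placeholder, not the actual per-cpu variable):

        if (!enable)
                weight = -weight;

        /* a single accounting path now serves both install and removal */
        pinned_slots[type] += weight;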

    Reported-by: Vince Weaver
    Signed-off-by: Oleg Nesterov
    Acked-by: Frederic Weisbecker
    Link: http://lkml.kernel.org/r/20130620155013.GA6337@redhat.com
    Signed-off-by: Ingo Molnar

    Oleg Nesterov
     
  • The enable/disable logic in toggle_bp_slot() is not symmetrical
    and imho very confusing. "old_count" in toggle_bp_task_slot() is
    actually new_count because this bp was already removed from the
    list.

    Change toggle_bp_slot() to always call list_add/list_del after
    toggle_bp_task_slot(). This way old_idx is task_bp_pinned() and
    this entry should be decremented, new_idx is +/-weight and we
    need to increment this element. The code/logic looks obvious.

    Reported-by: Vince Weaver
    Signed-off-by: Oleg Nesterov
    Acked-by: Frederic Weisbecker
    Link: http://lkml.kernel.org/r/20130620155011.GA6330@redhat.com
    Signed-off-by: Ingo Molnar

    Oleg Nesterov
     
  • Merge in two hw_breakpoint fixes, before applying another 5.

    Signed-off-by: Ingo Molnar

    Ingo Molnar
     
  • fetch_bp_busy_slots() and toggle_bp_slot() use
    for_each_online_cpu(), which is obviously wrong wrt cpu_up() or
    cpu_down(): we can over/under account the per-cpu numbers.

    For example:

    # echo 0 >> /sys/devices/system/cpu/cpu1/online
    # perf record -e mem:0x10 -p 1 &
    # echo 1 >> /sys/devices/system/cpu/cpu1/online
    # perf record -e mem:0x10,mem:0x10,mem:0x10,mem:0x10 -C1 -a &
    # taskset -p 0x2 1

    triggers the same WARN_ONCE("Can't find any breakpoint slot") in
    arch_install_hw_breakpoint().

    Reported-by: Vince Weaver
    Signed-off-by: Oleg Nesterov
    Acked-by: Frederic Weisbecker
    Cc:
    Link: http://lkml.kernel.org/r/20130620155009.GA6327@redhat.com
    Signed-off-by: Ingo Molnar

    Oleg Nesterov
     
  • trinity fuzzer triggered WARN_ONCE("Can't find any breakpoint
    slot") in arch_install_hw_breakpoint() but the problem is not
    arch-specific.

    The problem is that task_bp_pinned(cpu) checks "cpu == iter->cpu"
    but doesn't account for the "all cpus" events with iter->cpu < 0.

    This means that, say, register_user_hw_breakpoint(tsk) can
    happily create an arbitrary number (> HBP_NUM) of breakpoints
    which cannot be activated. toggle_bp_task_slot() is equally
    wrong for the same reason, and nr_task_bp_pinned[] can have
    negative entries.
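
    The fix then amounts to also counting the cpu-wide events, along
    these lines (a sketch, not the exact hunk):

        /* count this bp if it targets 'cpu' or all CPUs (iter->cpu < 0) */
        if (iter->cpu < 0 || cpu == iter->cpu)
                count += hw_breakpoint_weight(iter);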

    Simple test:

    # perl -e 'sleep 1 while 1' &
    # perf record -e mem:0x10,mem:0x10,mem:0x10,mem:0x10,mem:0x10 -p `pidof perl`

    Before this patch this triggers the same problem/WARN_ON(),
    after the patch it correctly fails with -ENOSPC.

    Reported-by: Vince Weaver
    Signed-off-by: Oleg Nesterov
    Acked-by: Frederic Weisbecker
    Cc:
    Link: http://lkml.kernel.org/r/20130620155006.GA6324@redhat.com
    Signed-off-by: Ingo Molnar

    Oleg Nesterov
     

19 Jun, 2013

4 commits

  • This allows us to use pdev->name for registering a PMU device.
    IMO the name is not supposed to be changed anyway.

    Signed-off-by: Mischa Jonker
    Signed-off-by: Peter Zijlstra
    Link: http://lkml.kernel.org/r/1370339148-5566-1-git-send-email-mjonker@synopsys.com
    Signed-off-by: Ingo Molnar

    Mischa Jonker
     
  • Commit 2b923c8 ("perf/x86: Check branch sampling priv level in generic code")
    was missing the check for the hypervisor (HV) priv level, so add it back.

    With this patch, we get the following correct behavior:

    # echo 2 >/proc/sys/kernel/perf_event_paranoid

    $ perf record -j any,k noploop 1
    Error:
    You may not have permission to collect stats.
    Consider tweaking /proc/sys/kernel/perf_event_paranoid:
    -1 - Not paranoid at all
    0 - Disallow raw tracepoint access for unpriv
    1 - Disallow cpu events for unpriv
    2 - Disallow kernel profiling for unpriv

    $ perf record -j any,hv noploop 1
    Error:
    You may not have permission to collect stats.
    Consider tweaking /proc/sys/kernel/perf_event_paranoid:
    -1 - Not paranoid at all
    0 - Disallow raw tracepoint access for unpriv
    1 - Disallow cpu events for unpriv
    2 - Disallow kernel profiling for unpriv
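
    Conceptually, the generic permission check now rejects both kernel
    and HV branch sampling for unprivileged users; an illustrative
    sketch, not the exact code:

        if ((attr->branch_sample_type &
             (PERF_SAMPLE_BRANCH_KERNEL | PERF_SAMPLE_BRANCH_HV)) &&
            perf_paranoid_kernel() && !capable(CAP_SYS_ADMIN))
                return -EACCES;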

    Signed-off-by: Stephane Eranian
    Acked-by: Petr Matousek
    Signed-off-by: Peter Zijlstra
    Link: http://lkml.kernel.org/r/20130606090204.GA3725@quad
    Signed-off-by: Ingo Molnar

    Stephane Eranian
     
  • Merge in the latest fixes, to avoid conflicts with ongoing work.

    Signed-off-by: Ingo Molnar

    Ingo Molnar
     
  • Vince's fuzzer once again found holes. This time it spotted a leak in
    the locked page accounting.

    When an event had redirected output and its close() was the last
    reference to the buffer, we didn't have a vm context to undo the accounting.

    Change the code to destroy the buffer on the last munmap() and detach
    all redirected events at that time. This provides us the right context
    to undo the vm accounting.

    Reported-and-tested-by: Vince Weaver
    Signed-off-by: Peter Zijlstra
    Link: http://lkml.kernel.org/r/20130604084421.GI8923@twins.programming.kicks-ass.net
    Cc:
    Signed-off-by: Ingo Molnar

    Peter Zijlstra
     

28 May, 2013

5 commits

  • Vince reported a problem found by his perf-specific trinity
    fuzzer.

    Al noticed 2 problems with perf's mmap():

    - it has issues against fork() since we use vma->vm_mm for accounting.
    - it has an rb refcount leak on double mmap().

    We fix the issues against fork() by using VM_DONTCOPY; I don't
    think there's code out there that uses this; we didn't hear
    about weird accounting problems/crashes. If we do need this to
    work, the previously proposed VM_PINNED could make this work.

    Aside from the rb reference leak spotted by Al, Vince's example
    prog was indeed doing a double mmap() through the use of
    perf_event_set_output().

    This exposes another problem: since we now have 2 events with
    one buffer, the accounting gets screwy because we account per
    event. Fix this by making the buffer responsible for its own
    accounting.

    Reported-by: Vince Weaver
    Signed-off-by: Peter Zijlstra
    Cc: Al Viro
    Cc: Paul Mackerras
    Cc: Arnaldo Carvalho de Melo
    Link: http://lkml.kernel.org/r/20130528085548.GA12193@twins.programming.kicks-ass.net
    Signed-off-by: Ingo Molnar

    Peter Zijlstra
     
  • This patch moves the check added by commit 7cc23cd to the generic code:

    perf/x86/intel/lbr: Demand proper privileges for PERF_SAMPLE_BRANCH_KERNEL

    The check is now implemented in generic code instead of x86 specific
    code. That way we do not have to repeat the test in each arch
    supporting branch sampling.

    Signed-off-by: Stephane Eranian
    Signed-off-by: Peter Zijlstra
    Cc: Frederic Weisbecker
    Cc: Jiri Olsa
    Cc: Arnaldo Carvalho de Melo
    Link: http://lkml.kernel.org/r/20130521105337.GA2879@quad
    Signed-off-by: Ingo Molnar

    Stephane Eranian
     
  • This patch adds /sys/device/xxx/perf_event_mux_interval_ms to adjust
    the multiplexing interval per PMU. The unit is milliseconds. The value
    has to be >= 1.

    In the 4th version, we renamed the sysfs file to be more consistent
    with the other /proc/sys/kernel entries for perf_events.

    In the 5th version, we handle the reprogramming of the hrtimer using
    hrtimer_forward_now(). That way, we sync up to new timer value quickly
    (suggested by Jiri Olsa).

    Signed-off-by: Stephane Eranian
    Signed-off-by: Peter Zijlstra
    Cc: Frederic Weisbecker
    Cc: Arnaldo Carvalho de Melo
    Link: http://lkml.kernel.org/r/1364991694-5876-3-git-send-email-eranian@google.com
    Signed-off-by: Ingo Molnar

    Stephane Eranian
     
  • The current scheme of using the timer tick was fine for per-thread
    events. However, it was causing bias issues in system-wide mode
    (including for uncore PMUs). Event groups would not get their fair
    share of runtime on the PMU. With tickless kernels, if a core is idle
    there is no timer tick, and thus no event rotation (multiplexing).
    However, there are events (especially uncore events) which do count
    even though cores are asleep.

    This patch changes the timer source for multiplexing. It introduces a
    per-PMU per-cpu hrtimer. The advantage is that even when a core goes
    idle, it will come back to service the hrtimer, thus multiplexing on
    system-wide events works much better.

    The per-PMU implementation (suggested by PeterZ) enables adjusting the
    multiplexing interval per PMU. The preferred interval is stashed into
    the struct pmu. If not set, it will be forced to the default interval
    value.

    In order to minimize the impact of the hrtimer, it is turned on and
    off on demand. When the PMU on a CPU is overcommitted, the hrtimer is
    activated. It is stopped when the PMU is not overcommitted.

    In order for this to work properly, we had to change the order of
    initialization in start_kernel() such that hrtimer_init() is run
    before perf_event_init().

    The default interval in milliseconds is set to a timer tick just like
    with the old code. We will provide a sysctl to tune this in another
    patch.

    Signed-off-by: Stephane Eranian
    Signed-off-by: Peter Zijlstra
    Cc: Frederic Weisbecker
    Cc: Arnaldo Carvalho de Melo
    Link: http://lkml.kernel.org/r/1364991694-5876-2-git-send-email-eranian@google.com
    Signed-off-by: Ingo Molnar

    Stephane Eranian
     
  • The hw breakpoint pmu 'add' function is missing the
    period_left update needed for SW events.

    The perf HW breakpoint events use the SW events framework
    to process the overflow, so period_left needs to be properly
    initialized in the PMU 'add' method.
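
    The shape of the fix, roughly (a sketch assuming the SW-event helper
    perf_swevent_set_period() is what seeds period_left; simplified, not
    the exact hunk):

        static int hw_breakpoint_add(struct perf_event *bp, int flags)
        {
                if (!(flags & PERF_EF_START))
                        bp->hw.state = PERF_HES_STOPPED;

                if (is_sampling_event(bp)) {
                        bp->hw.last_period = bp->hw.sample_period;
                        perf_swevent_set_period(bp);
                }

                return arch_install_hw_breakpoint(bp);
        }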

    Signed-off-by: Jiri Olsa
    Reviewed-by: Peter Zijlstra
    Cc: H. Peter Anvin
    Cc: Oleg Nesterov
    Cc: Arnaldo Carvalho de Melo
    Cc: Ingo Molnar
    Cc: Paul Mackerras
    Cc: Corey Ashford
    Cc: Frederic Weisbecker
    Cc: Vince Weaver
    Cc: Stephane Eranian
    Signed-off-by: Peter Zijlstra
    Link: http://lkml.kernel.org/r/1367421944-19082-5-git-send-email-jolsa@redhat.com
    Signed-off-by: Ingo Molnar

    Jiri Olsa
     

07 May, 2013

2 commits

  • Add a perf_event_aux() function to send out all types of
    auxiliary events - mmap, task and comm events. For each type
    there are match and output functions defined and used as
    callbacks during perf_event_aux processing.

    This way we can centralize the pmu/context iteration and
    event matching logic. Also, since a lot of the code was
    duplicated, this patch reduces the .text size by about 2kB
    on my setup:

    snipped output from 'objdump -x kernel/events/core.o'

    before:
    Idx Name Size
    0 .text 0000d313

    after:
    Idx Name Size
    0 .text 0000cad3

    Signed-off-by: Jiri Olsa
    Acked-by: Peter Zijlstra
    Cc: Arnaldo Carvalho de Melo
    Cc: Namhyung Kim
    Cc: Corey Ashford
    Cc: Frederic Weisbecker
    Cc: Paul Mackerras
    Cc: Peter Zijlstra
    Cc: Stephane Eranian
    Cc: Borislav Petkov
    Link: http://lkml.kernel.org/r/1367857638-27631-3-git-send-email-jolsa@redhat.com
    Signed-off-by: Ingo Molnar

    Jiri Olsa
     
  • The perf_event_task_ctx() function needs to be called with
    preemption disabled, since it checks the currently scheduled
    cpu against the event's cpu.

    We disable preemption for the task-related perf event context
    if there's one defined, leaving it up to chance which cpu
    it gets scheduled on.

    Signed-off-by: Jiri Olsa
    Acked-by: Peter Zijlstra
    Cc: Arnaldo Carvalho de Melo
    Cc: Namhyung Kim
    Cc: Corey Ashford
    Cc: Frederic Weisbecker
    Cc: Paul Mackerras
    Cc: Stephane Eranian
    Cc: Borislav Petkov
    Link: http://lkml.kernel.org/r/1367857638-27631-2-git-send-email-jolsa@redhat.com
    Signed-off-by: Ingo Molnar

    Jiri Olsa
     

06 May, 2013

2 commits

  • Pull 'full dynticks' support from Ingo Molnar:
    "This tree from Frederic Weisbecker adds a new, (exciting! :-) core
    kernel feature to the timer and scheduler subsystems: 'full dynticks',
    or CONFIG_NO_HZ_FULL=y.

    This feature extends the nohz variable-size timer tick feature from
    idle to busy CPUs (running at most one task) as well, potentially
    reducing the number of timer interrupts significantly.

    This feature got motivated by real-time folks and the -rt tree, but
    the general utility and motivation of full-dynticks runs wider than
    that:

    - HPC workloads get faster: CPUs running a single task should be able
    to utilize a maximum amount of CPU power. A periodic timer tick at
    HZ=1000 can cause a constant overhead of up to 1.0%. This feature
    removes that overhead - and speeds up the system by 0.5%-1.0% on
    typical distro configs even on modern systems.

    - Real-time workload latency reduction: CPUs running critical tasks
    should experience as little jitter as possible. The last remaining
    source of kernel-related jitter was the periodic timer tick.

    - A single task executing on a CPU is a pretty common situation,
    especially with an increasing number of cores/CPUs, so this feature
    helps desktop and mobile workloads as well.

    The cost of the feature is mainly related to increased timer
    reprogramming overhead when a CPU switches its tick period, and thus
    slightly longer to-idle and from-idle latency.

    Configuration-wise a third mode of operation is added to the existing
    two NOHZ kconfig modes:

    - CONFIG_HZ_PERIODIC: [formerly !CONFIG_NO_HZ], now explicitly named
    as a config option. This is the traditional Linux periodic tick
    design: there's a HZ tick going on all the time, regardless of
    whether a CPU is idle or not.

    - CONFIG_NO_HZ_IDLE: [formerly CONFIG_NO_HZ=y], this turns off the
    periodic tick when a CPU enters idle mode.

    - CONFIG_NO_HZ_FULL: this new mode, in addition to turning off the
    tick when a CPU is idle, also slows the tick down to 1 Hz (one
    timer interrupt per second) when only a single task is running on a
    CPU.

    The .config behavior is compatible: existing !CONFIG_NO_HZ and
    CONFIG_NO_HZ=y settings get translated to the new values, without the
    user having to configure anything. CONFIG_NO_HZ_FULL is turned off by
    default.

    This feature is based on a lot of infrastructure work that has been
    steadily going upstream in the last 2-3 cycles: related RCU support
    and non-periodic cputime support in particular is upstream already.

    This tree adds the final pieces and activates the feature. The pull
    request is marked RFC because:

    - it's marked 64-bit only at the moment - the 32-bit support patch is
    small but did not get ready in time.

    - it has a number of fresh commits that came in after the merge
    window. The overwhelming majority of commits are from before the
    merge window, but still some aspects of the tree are fresh and so I
    marked it RFC.

    - it's a pretty wide-reaching feature with lots of effects - and
    while the components have been in testing for some time, the full
    combination is still not very widely used. That it's default-off
    should reduce its regression abilities and obviously there are no
    known regressions with CONFIG_NO_HZ_FULL=y enabled either.

    - the feature is not completely idempotent: there is no 100%
    equivalent replacement for a periodic scheduler/timer tick. In
    particular there's ongoing work to map out and reduce its effects
    on scheduler load-balancing and statistics. This should not impact
    correctness though, there are no known regressions related to this
    feature at this point.

    - it's a pretty ambitious feature that with time will likely be
    enabled by most Linux distros, and we'd like your input on
    its design/implementation if you dislike some aspect we missed.
    Without flaming us to a crisp! :-)

    Future plans:

    - there's ongoing work to reduce 1Hz to 0Hz, to essentially shut off
    the periodic tick altogether when there's a single busy task on a
    CPU. We'd first like 1 Hz to be exposed more widely before we go
    for the 0 Hz target though.

    - once we reach 0 Hz we can remove the periodic tick assumption from
    nr_running>=2 as well, by essentially interrupting busy tasks only
    as frequently as the sched_latency constraints require us to do -
    once every 4-40 msecs, depending on nr_running.

    I am personally leaning towards biting the bullet and doing this in
    v3.10, like the -rt tree this effort has been going on for too long -
    but the final word is up to you as usual.

    More technical details can be found in Documentation/timers/NO_HZ.txt"

    * 'timers-nohz-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (39 commits)
    sched: Keep at least 1 tick per second for active dynticks tasks
    rcu: Fix full dynticks' dependency on wide RCU nocb mode
    nohz: Protect smp_processor_id() in tick_nohz_task_switch()
    nohz_full: Add documentation.
    cputime_nsecs: use math64.h for nsec resolution conversion helpers
    nohz: Select VIRT_CPU_ACCOUNTING_GEN from full dynticks config
    nohz: Reduce overhead under high-freq idling patterns
    nohz: Remove full dynticks' superfluous dependency on RCU tree
    nohz: Fix unavailable tick_stop tracepoint in dynticks idle
    nohz: Add basic tracing
    nohz: Select wide RCU nocb for full dynticks
    nohz: Disable the tick when irq resume in full dynticks CPU
    nohz: Re-evaluate the tick for the new task after a context switch
    nohz: Prepare to stop the tick on irq exit
    nohz: Implement full dynticks kick
    nohz: Re-evaluate the tick from the scheduler IPI
    sched: New helper to prevent from stopping the tick in full dynticks
    sched: Kick full dynticks CPU that have more than one task enqueued.
    perf: New helper to prevent full dynticks CPUs from stopping tick
    perf: Kick full dynticks CPU if events rotation is needed
    ...

    Linus Torvalds
     
  • Pull perf fixes from Ingo Molnar:
    "Misc fixes plus a small hw-enablement patch for Intel IB model 58
    uncore events"

    * 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
    perf/x86/intel/lbr: Demand proper privileges for PERF_SAMPLE_BRANCH_KERNEL
    perf/x86/intel/lbr: Fix LBR filter
    perf/x86: Blacklist all MEM_*_RETIRED events for Ivy Bridge
    perf: Fix vmalloc ring buffer pages handling
    perf/x86/intel: Fix unintended variable name reuse
    perf/x86/intel: Add support for IvyBridge model 58 Uncore
    perf/x86/intel: Fix typo in perf_event_intel_uncore.c
    x86: Eliminate irq_mis_count counted in arch_irq_stat

    Linus Torvalds
     

02 May, 2013

1 commit


01 May, 2013

1 commit

  • If we allocate the perf ring buffer with the size of a single (user)
    page, we get memory corruption when releasing it in the
    rb_free_work function (with the CONFIG_PERF_USE_VMALLOC option).

    For a single page sized ring buffer the page_order is -1 (because
    nr_pages is 0). This needs to be recognized in the rb_free_work
    function to release the proper number of pages.

    Add a data_page_nr() function that returns the number of allocated
    data pages, and adjust the rest of the code to use it.

    Reported-by: Jan Stancek
    Original-patch-by: Peter Zijlstra
    Acked-by: Peter Zijlstra
    Cc: Corey Ashford
    Cc: Frederic Weisbecker
    Cc: Ingo Molnar
    Cc: Namhyung Kim
    Cc: Paul Mackerras
    Cc: Arnaldo Carvalho de Melo
    Signed-off-by: Jiri Olsa
    Link: http://lkml.kernel.org/r/20130319143509.GA1128@krava.brq.redhat.com
    Signed-off-by: Ingo Molnar

    Jiri Olsa
     

30 Apr, 2013

2 commits

  • Pull perf updates from Ingo Molnar:
    "Features:

    - Add "uretprobes" - an optimization to uprobes, like kretprobes are
    an optimization to kprobes. "perf probe -x file sym%return" now
    works like kretprobes. By Oleg Nesterov.

    - Introduce per core aggregation in 'perf stat', from Stephane
    Eranian.

    - Add memory profiling via PEBS, from Stephane Eranian.

    - Event group view for 'annotate' in --stdio, --tui and --gtk, from
    Namhyung Kim.

    - Add support for AMD NB and L2I "uncore" counters, by Jacob Shin.

    - Add Ivy Bridge-EP uncore support, by Zheng Yan

    - IBM zEnterprise EC12 oprofile support patchlet from Robert Richter.

    - Add perf test entries for checking breakpoint overflow signal
    handler issues, from Jiri Olsa.

    - Add perf test entry for checking number of EXIT events, from
    Namhyung Kim.

    - Add perf test entries for checking --cpu in record and stat, from
    Jiri Olsa.

    - Introduce perf stat --repeat forever, from Frederik Deweerdt.

    - Add --no-demangle to report/top, from Namhyung Kim.

    - PowerPC fixes plus a couple of cleanups/optimizations in uprobes
    and trace_uprobes, by Oleg Nesterov.

    Various fixes and refactorings:

    - Fix dependency of the python binding wrt libtraceevent, from
    Naohiro Aota.

    - Simplify some perf_evlist methods and to allow 'stat' to share code
    with 'record' and 'trace', by Arnaldo Carvalho de Melo.

    - Remove dead code related to libtraceevent integration, from
    Namhyung Kim.

    - Revert "perf sched: Handle PERF_RECORD_EXIT events" to get 'perf
    sched lat' back working, by Arnaldo Carvalho de Melo

    - We don't use Newt anymore, just plain libslang, by Arnaldo Carvalho
    de Melo.

    - Kill a bunch of die() calls, from Namhyung Kim.

    - Fix build on non-glibc systems due to libio.h absence, from Cody P
    Schafer.

    - Remove some perf_session and tracing dead code, from David Ahern.

    - Honor parallel jobs, fix from Borislav Petkov

    - Introduce tools/lib/lk library, initially just removing duplication
    among tools/perf and tools/vm, from Borislav Petkov.

    ... and many more I missed to list, see the shortlog and git log for
    more details."

    * 'perf-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (136 commits)
    perf/x86/intel/P4: Robistify P4 PMU types
    perf/x86/amd: Fix AMD NB and L2I "uncore" support
    perf/x86/amd: Remove old-style NB counter support from perf_event_amd.c
    perf/x86: Check all MSRs before passing hw check
    perf/x86/amd: Add support for AMD NB and L2I "uncore" counters
    perf/x86/intel: Add Ivy Bridge-EP uncore support
    perf/x86/intel: Fix SNB-EP CBO and PCU uncore PMU filter management
    perf/x86: Avoid kfree() in CPU_{STARTING,DYING}
    uprobes/perf: Avoid perf_trace_buf_prepare/submit if ->perf_events is empty
    uprobes/tracing: Don't pass addr=ip to perf_trace_buf_submit()
    uprobes/tracing: Change create_trace_uprobe() to support uretprobes
    uprobes/tracing: Make seq_printf() code uretprobe-friendly
    uprobes/tracing: Make register_uprobe_event() paths uretprobe-friendly
    uprobes/tracing: Make uprobe_{trace,perf}_print() uretprobe-friendly
    uprobes/tracing: Introduce is_ret_probe() and uretprobe_dispatcher()
    uprobes/tracing: Introduce uprobe_{trace,perf}_print() helpers
    uprobes/tracing: Generalize struct uprobe_trace_entry_head
    uprobes/tracing: Kill the pointless local_save_flags/preempt_count calls
    uprobes/tracing: Kill the pointless seq_print_ip_sym() call
    uprobes/tracing: Kill the pointless task_pt_regs() calls
    ...

    Linus Torvalds
     
  • Pull cgroup updates from Tejun Heo:

    - Fixes and a lot of cleanups. Locking cleanup is finally complete.
    cgroup_mutex is no longer exposed to individual controllers, which
    used to cause nasty deadlock issues. Li fixed and cleaned up quite a
    bit, including long-standing issues like the racy cgroup_path().

    - device cgroup now supports proper hierarchy thanks to Aristeu.

    - perf_event cgroup now supports proper hierarchy.

    - A new mount option "__DEVEL__sane_behavior" is added. As indicated
    by the name, this option is to be used for development only at this
    point and generates a warning message when used. Unfortunately,
    the cgroup interface currently has too many breakages and inconsistencies
    to implement a consistent and unified hierarchy on top. The new flag
    is used to collect the behavior changes which are necessary to
    implement consistent unified hierarchy. It's likely that this flag
    won't be used verbatim when it becomes ready but will be enabled
    implicitly along with unified hierarchy.

    The option currently disables some of the broken behaviors in cgroup core
    and also .use_hierarchy switch in memcg (will be routed through -mm),
    which can be used to make very unusual hierarchy where nesting is
    partially honored. It will also be used to implement hierarchy
    support for blk-throttle which would be impossible otherwise without
    introducing a full separate set of control knobs.

    This is essentially versioning of interface which isn't very nice but
    at this point I can't see any other options which would allow keeping
    the interface the same while moving towards hierarchy behavior which
    is at least somewhat sane. The planned unified hierarchy is likely
    to require some level of adaptation from userland anyway, so I think
    it'd be best to take the chance and update the interface such that
    it's supportable in the long term.

    Maintaining the existing interface does complicate cgroup core but
    shouldn't put too much strain on individual controllers and I think
    it'd be manageable for the foreseeable future. Maybe we'll be able
    to drop it in a decade.

    Fix up conflicts (including a semantic one adding a new #include to ppc
    that was uncovered by the header file changes) as per Tejun.

    * 'for-3.10' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup: (45 commits)
    cpuset: fix compile warning when CONFIG_SMP=n
    cpuset: fix cpu hotplug vs rebuild_sched_domains() race
    cpuset: use rebuild_sched_domains() in cpuset_hotplug_workfn()
    cgroup: restore the call to eventfd->poll()
    cgroup: fix use-after-free when umounting cgroupfs
    cgroup: fix broken file xattrs
    devcg: remove parent_cgroup.
    memcg: force use_hierarchy if sane_behavior
    cgroup: remove cgrp->top_cgroup
    cgroup: introduce sane_behavior mount option
    move cgroupfs_root to include/linux/cgroup.h
    cgroup: convert cgroupfs_root flag bits to masks and add CGRP_ prefix
    cgroup: make cgroup_path() not print double slashes
    Revert "cgroup: remove bind() method from cgroup_subsys."
    perf: make perf_event cgroup hierarchical
    cgroup: implement cgroup_is_descendant()
    cgroup: make sure parent won't be destroyed before its children
    cgroup: remove bind() method from cgroup_subsys.
    devcg: remove broken_hierarchy tag
    cgroup: remove cgroup_lock_is_held()
    ...

    Linus Torvalds
     

23 Apr, 2013

2 commits

  • Provide a new helper that lets full dynticks CPUs avoid
    stopping their tick in case there are events in the local
    rotation list.

    This way we make sure that perf_event_task_tick() is serviced
    on demand.

    Signed-off-by: Frederic Weisbecker
    Cc: Chris Metcalf
    Cc: Christoph Lameter
    Cc: Geoff Levand
    Cc: Gilad Ben Yossef
    Cc: Hakan Akkan
    Cc: Ingo Molnar
    Cc: Kevin Hilman
    Cc: Li Zhong
    Cc: Oleg Nesterov
    Cc: Paul E. McKenney
    Cc: Paul Gortmaker
    Cc: Peter Zijlstra
    Cc: Steven Rostedt
    Cc: Thomas Gleixner
    Cc: Stephane Eranian
    Cc: Jiri Olsa

    Frederic Weisbecker
     
  • Kick the current CPU's tick by sending it a self IPI when
    an event is queued on the rotation list and it is the first
    element inserted. This makes sure that perf_event_task_tick()
    works on full dynticks CPUs.

    Signed-off-by: Frederic Weisbecker
    Cc: Chris Metcalf
    Cc: Christoph Lameter
    Cc: Geoff Levand
    Cc: Gilad Ben Yossef
    Cc: Hakan Akkan
    Cc: Ingo Molnar
    Cc: Kevin Hilman
    Cc: Li Zhong
    Cc: Oleg Nesterov
    Cc: Paul E. McKenney
    Cc: Paul Gortmaker
    Cc: Peter Zijlstra
    Cc: Steven Rostedt
    Cc: Thomas Gleixner
    Cc: Stephane Eranian
    Cc: Jiri Olsa

    Frederic Weisbecker
     

21 Apr, 2013

2 commits

  • The following RCU splat indicates lack of RCU protection:

    [ 953.267649] ===============================
    [ 953.267652] [ INFO: suspicious RCU usage. ]
    [ 953.267657] 3.9.0-0.rc6.git2.4.fc19.ppc64p7 #1 Not tainted
    [ 953.267661] -------------------------------
    [ 953.267664] include/linux/cgroup.h:534 suspicious rcu_dereference_check() usage!
    [ 953.267669]
    [ 953.267669] other info that might help us debug this:
    [ 953.267669]
    [ 953.267675]
    [ 953.267675] rcu_scheduler_active = 1, debug_locks = 0
    [ 953.267680] 1 lock held by glxgears/1289:
    [ 953.267683] #0: (&sig->cred_guard_mutex){+.+.+.}, at: [] .prepare_bprm_creds+0x34/0xa0
    [ 953.267700]
    [ 953.267700] stack backtrace:
    [ 953.267704] Call Trace:
    [ 953.267709] [c0000001f0d1b6e0] [c000000000016e30] .show_stack+0x130/0x200 (unreliable)
    [ 953.267717] [c0000001f0d1b7b0] [c0000000001267f8] .lockdep_rcu_suspicious+0x138/0x180
    [ 953.267724] [c0000001f0d1b840] [c0000000001d43a4] .perf_event_comm+0x4c4/0x690
    [ 953.267731] [c0000001f0d1b950] [c00000000027f6e4] .set_task_comm+0x84/0x1f0
    [ 953.267737] [c0000001f0d1b9f0] [c000000000280414] .setup_new_exec+0x94/0x220
    [ 953.267744] [c0000001f0d1ba70] [c0000000002f665c] .load_elf_binary+0x58c/0x19b0
    ...

    This commit therefore adds the required RCU read-side critical
    section to perf_event_comm().
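
    The pattern is the usual RCU read-side bracket around the offending
    dereference (illustrative only, not the exact hunk):

        rcu_read_lock();
        /* ... the dereferences flagged by the splat above ... */
        rcu_read_unlock();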

    Reported-by: Adam Jackson
    Signed-off-by: Paul E. McKenney
    Cc: a.p.zijlstra@chello.nl
    Cc: paulus@samba.org
    Cc: acme@ghostprotocols.net
    Link: http://lkml.kernel.org/r/20130419190124.GA8638@linux.vnet.ibm.com
    Signed-off-by: Ingo Molnar
    Tested-by: Gustavo Luiz Duarte

    Paul E. McKenney
     
  • Conflicts:
    arch/x86/kernel/cpu/perf_event_intel.c

    Merge in the latest fixes before applying new patches, resolve the conflict.

    Signed-off-by: Ingo Molnar

    Ingo Molnar
     

15 Apr, 2013

1 commit

  • Trinity discovered that we fail to check all 64 bits of
    attr.config passed by user space, resulting in out-of-bounds
    access of the perf_swevent_enabled array in
    sw_perf_event_destroy().

    Introduced in commit b0a873ebb ("perf: Register PMU
    implementations").

    Signed-off-by: Tommi Rantala
    Cc: Peter Zijlstra
    Cc: davej@redhat.com
    Cc: Paul Mackerras
    Cc: Arnaldo Carvalho de Melo
    Link: http://lkml.kernel.org/r/1365882554-30259-1-git-send-email-tt.rantala@gmail.com
    Signed-off-by: Ingo Molnar

    Tommi Rantala
     

13 Apr, 2013

2 commits

  • Enclose return probes implementation.

    Signed-off-by: Anton Arapov
    Acked-by: Srikar Dronamraju
    Signed-off-by: Oleg Nesterov

    Anton Arapov
     
  • Unlike kretprobes, we can't trust userspace, so we must have
    protection from user-space attacks. User space has an "unlimited"
    stack, and this patch limits return probe nesting as a
    simple remedy for it.

    Note that this implementation leaks return_instance on siglongjmp
    until exit()/exec().

    The intention is to have a KISS, bare-minimum solution for the
    initial implementation in order to not complicate the uretprobes
    code.

    In the future we may come up with a more sophisticated solution
    that removes this depth limitation. It is not an easy task and
    lies beyond this patchset.

    Signed-off-by: Anton Arapov
    Acked-by: Srikar Dronamraju
    Signed-off-by: Oleg Nesterov

    Anton Arapov