13 Sep, 2010

1 commit

  • Fix a bug introduced by commit de725de and the change in the
    meaning of the return value of intel_pmu_handle_irq(). With the
    current code, when you are using the BTS, you get 'dazed by NMI'
    each time the BTS buffer fills up.

    BTS interrupts on the PMU vector, and thus as an NMI, so this
    needs to be taken into account in the function's return value.

    This version fixes the initial patch, which was missing the
    changes to perf_event_intel_ds.c.
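
    A minimal sketch of the idea, assuming intel_pmu_drain_bts_buffer()
    is changed to report whether it drained anything (the real handler
    does much more than this):

        static int intel_pmu_handle_irq(struct pt_regs *regs)
        {
                /* BTS interrupts on the PMU vector: a drain is PMI work */
                int handled = intel_pmu_drain_bts_buffer();     /* 0 or 1 */

                /* ... service overflowed counters, bumping 'handled' ... */

                return handled; /* 0 means 'unknown NMI' to the caller */
        }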

    Signed-off-by: Stephane Eranian
    Acked-by: Don Zickus
    Cc: peterz@infradead.org
    Cc: paulus@samba.org
    Cc: davem@davemloft.net
    Cc: fweisbec@gmail.com
    Cc: perfmon2-devel@lists.sf.net
    Cc: eranian@gmail.com
    Cc: robert.richter@amd.com
    LKML-Reference:
    Signed-off-by: Ingo Molnar

    Stephane Eranian
     

10 Sep, 2010

25 commits

  • We ought to return -ENOENT when none of the registered PMUs
    recognise the requested event.

    This fixes a boot crash that occurs if no PMU is available
    but the NMI watchdog tries to register an event.
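
    A sketch of the intended lookup loop (list and callback names are
    simplified assumptions, not the exact kernel code):

        struct pmu *perf_init_event(struct perf_event *event)
        {
                struct pmu *pmu;

                list_for_each_entry_rcu(pmu, &pmus, entry) {
                        int ret = pmu->event_init(event);

                        if (!ret)
                                return pmu;          /* this pmu takes it */
                        if (ret != -ENOENT)
                                return ERR_PTR(ret); /* real error */
                }
                return ERR_PTR(-ENOENT); /* no PMU recognised the event */
        }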

    Reported-by: Ingo Molnar
    Signed-off-by: Peter Zijlstra
    LKML-Reference:
    Signed-off-by: Ingo Molnar

    Peter Zijlstra
     
  • Even though we call it from the inherit path, where the child is
    not yet accessible, we need to hold ctx->lock; add_event_to_ctx()
    assumes IRQs are disabled.
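
    Roughly, in the inherit path (a sketch; variable names abbreviated):

        unsigned long flags;

        raw_spin_lock_irqsave(&child_ctx->lock, flags);
        add_event_to_ctx(child_event, child_ctx);  /* assumes IRQs off */
        raw_spin_unlock_irqrestore(&child_ctx->lock, flags);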

    Signed-off-by: Peter Zijlstra
    LKML-Reference:
    Signed-off-by: Ingo Molnar

    Peter Zijlstra
     
  • asm-generic/hardirq.h needs asm/irq.h, which might include
    linux/interrupt.h, as in the sparc 32 case. At that point we
    need the generic irq_cpustat definitions, but those are only
    included later in asm-generic/hardirq.h.

    So delay the inclusion of irq.h in asm-generic/hardirq.h a bit;
    it doesn't need to be included early.

    This fixes:

    include/linux/interrupt.h: In function '__raise_softirq_irqoff':
    include/linux/interrupt.h:414: error: implicit declaration of function 'local_softirq_pending'
    include/linux/interrupt.h:414: error: lvalue required as left operand of assignment
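
    A sketch of the resulting ordering in asm-generic/hardirq.h
    (assuming the generic definitions look roughly like this):

        #include <linux/cache.h>
        #include <linux/threads.h>

        typedef struct {
                unsigned int __softirq_pending;
        } ____cacheline_aligned irq_cpustat_t;

        #include <linux/irq_cpustat.h>  /* generic irq_cpustat accessors */
        #include <asm/irq.h>            /* moved below the definitions it
                                           may indirectly need */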

    Reported-by: Ingo Molnar
    Signed-off-by: Frederic Weisbecker
    Cc: Lai Jiangshan
    Cc: Koki Sanagi
    Cc: mathieu.desnoyers@efficios.com
    Cc: rostedt@goodmis.org
    Cc: nhorman@tuxdriver.com
    Cc: scott.a.mcmillan@intel.com
    Cc: eric.dumazet@gmail.com
    Cc: kaneshige.kenji@jp.fujitsu.com
    Cc: davem@davemloft.net
    Cc: izumi.taku@jp.fujitsu.com
    Cc: kosaki.motohiro@jp.fujitsu.com
    LKML-Reference:
    Signed-off-by: Ingo Molnar

    Frederic Weisbecker
     
  • I missed a perf_event_ctxp user when converting it to an array. Pull this
    last user into perf_event.c as well and fix it up.

    Signed-off-by: Peter Zijlstra
    LKML-Reference:
    Signed-off-by: Ingo Molnar

    Peter Zijlstra
     
  • Assuming we don't mix events of different PMUs onto a single
    context (with the exception of software events inside a hardware
    group), we can now assume that all events on a particular context
    belong to the same PMU, hence we can disable the PMU for the
    entire set of context operations.

    This reduces the number of hardware writes.

    The exception for software events comes from the fact that the
    software PMU's disable is a nop.
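
    The pattern, sketched (assuming the perf_pmu_{disable,enable}
    helpers and the ctx_sched_out() signature of this series):

        perf_pmu_disable(ctx->pmu);     /* one disable for the whole op */
        ctx_sched_out(ctx, cpuctx, EVENT_ALL);
        perf_pmu_enable(ctx->pmu);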

    Signed-off-by: Peter Zijlstra
    Cc: paulus
    Cc: stephane eranian
    Cc: Robert Richter
    Cc: Frederic Weisbecker
    Cc: Lin Ming
    Cc: Yanmin
    LKML-Reference:
    Signed-off-by: Ingo Molnar

    Peter Zijlstra
     
  • Since software events are always schedulable, mixing them up with
    hardware events (which are not) can lead to funny scheduling
    oddities.

    Giving them their own context solves this.

    Signed-off-by: Peter Zijlstra
    Cc: paulus
    Cc: stephane eranian
    Cc: Robert Richter
    Cc: Frederic Weisbecker
    Cc: Lin Ming
    Cc: Yanmin
    LKML-Reference:
    Signed-off-by: Ingo Molnar

    Peter Zijlstra
     
  • Provide the infrastructure for multiple task contexts.

    A more flexible approach would have resulted in more pointer chases
    in the scheduling hot-paths. This approach has the limitation of a
    static number of task contexts.

    Since I expect most external PMUs to be system-wide, or at least
    node-wide (as per the Intel uncore unit), they won't actually
    need a task context.
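
    A sketch of the static approach, assuming two fixed context types:

        enum perf_event_task_context {
                perf_hw_context,
                perf_sw_context,
                perf_nr_task_contexts,
        };

        struct task_struct {
                /* ... */
                struct perf_event_context *perf_event_ctxp[perf_nr_task_contexts];
        };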

    Signed-off-by: Peter Zijlstra
    Cc: paulus
    Cc: stephane eranian
    Cc: Robert Richter
    Cc: Frederic Weisbecker
    Cc: Lin Ming
    Cc: Yanmin
    LKML-Reference:
    Signed-off-by: Ingo Molnar

    Peter Zijlstra
     
  • Unify the two perf_event_context allocation sites.

    Signed-off-by: Peter Zijlstra
    Cc: paulus
    Cc: stephane eranian
    Cc: Robert Richter
    Cc: Frederic Weisbecker
    Cc: Lin Ming
    Cc: Yanmin
    LKML-Reference:
    Signed-off-by: Ingo Molnar

    Peter Zijlstra
     
  • Move all inherit code near each other.

    Signed-off-by: Peter Zijlstra
    Cc: paulus
    Cc: stephane eranian
    Cc: Robert Richter
    Cc: Frederic Weisbecker
    Cc: Lin Ming
    Cc: Yanmin
    LKML-Reference:
    Signed-off-by: Ingo Molnar

    Peter Zijlstra
     
  • Allocate per-cpu contexts per pmu.
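
    Sketched (the field name is an assumption): each pmu now owns its
    per-cpu contexts, allocated at registration time.

        struct pmu {
                struct list_head                  entry;
                struct perf_cpu_context __percpu *pmu_cpu_context;
                /* ... method pointers ... */
        };

        /* in perf_pmu_register(): */
        pmu->pmu_cpu_context = alloc_percpu(struct perf_cpu_context);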

    Signed-off-by: Peter Zijlstra
    Cc: paulus
    Cc: stephane eranian
    Cc: Robert Richter
    Cc: Frederic Weisbecker
    Cc: Lin Ming
    Cc: Yanmin
    LKML-Reference:
    Signed-off-by: Ingo Molnar

    Peter Zijlstra
     
  • Give each cpu-context its own timer so that it is a self-contained
    entity; this eases the way for per-pmu-per-cpu contexts and
    provides the basic infrastructure to allow different rotation
    times per pmu.

    Things to look at:
    - folding the tick and these TICK_NSEC timers
    - separate task context rotation

    Signed-off-by: Peter Zijlstra
    Cc: paulus
    Cc: stephane eranian
    Cc: Robert Richter
    Cc: Frederic Weisbecker
    Cc: Lin Ming
    Cc: Yanmin
    LKML-Reference:
    Signed-off-by: Ingo Molnar

    Peter Zijlstra
     
  • Separate the swevent hash-table from the cpu_context bits in
    preparation for per pmu cpu contexts.

    This keeps the swevent hash a global entity.

    Signed-off-by: Peter Zijlstra
    Cc: paulus
    Cc: stephane eranian
    Cc: Robert Richter
    Cc: Frederic Weisbecker
    Cc: Lin Ming
    Cc: Yanmin
    LKML-Reference:
    Signed-off-by: Ingo Molnar

    Peter Zijlstra
     
  • Separate find_get_context() from the event allocation and
    initialization so that we may make find_get_context() depend
    on the event pmu in a later patch.

    Signed-off-by: Peter Zijlstra
    Cc: paulus
    Cc: stephane eranian
    Cc: Robert Richter
    Cc: Frederic Weisbecker
    Cc: Lin Ming
    Cc: Yanmin
    LKML-Reference:
    Signed-off-by: Ingo Molnar

    Peter Zijlstra
     
  • Neither the overcommit nor the reservation sysfs parameter was
    actually working; remove them, as they'll only get in the way.

    Signed-off-by: Peter Zijlstra
    Cc: paulus
    LKML-Reference:
    Signed-off-by: Ingo Molnar

    Peter Zijlstra
     
  • Replace pmu::{enable,disable,start,stop,unthrottle} with
    pmu::{add,del,start,stop}, all of which take a flags argument.

    The new interface extends the capability to stop a counter while
    keeping it scheduled on the PMU. We replace the throttled state with
    the generic stopped state.

    This also allows us to efficiently stop/start counters over certain
    code paths (like IRQ handlers).

    It also allows scheduling a counter without it starting, allowing for
    a generic frozen state (useful for rotating stopped counters).

    The stopped state is implemented in two different ways, depending on
    how the architecture implemented the throttled state:

    1) We disable the counter:
       a) if the pmu has per-counter enable bits, we flip that;
       b) otherwise we program a NOP event, preserving the counter
          state.

    2) We store the counter state and ignore all read/overflow events.
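
    The resulting interface, sketched (flag names follow the kernel's
    perf code of this era; treat the details as a sketch):

        #define PERF_EF_START   0x01  /* start the counter when adding */
        #define PERF_EF_RELOAD  0x02  /* reload the count when starting */
        #define PERF_EF_UPDATE  0x04  /* update the count when stopping */

        struct pmu {
                /* ... */
                int  (*add)(struct perf_event *event, int flags);
                void (*del)(struct perf_event *event, int flags);
                void (*start)(struct perf_event *event, int flags);
                void (*stop)(struct perf_event *event, int flags);
        };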

    Signed-off-by: Peter Zijlstra
    Cc: paulus
    Cc: stephane eranian
    Cc: Robert Richter
    Cc: Will Deacon
    Cc: Paul Mundt
    Cc: Frederic Weisbecker
    Cc: Cyrill Gorcunov
    Cc: Lin Ming
    Cc: Yanmin
    Cc: Deng-Cheng Zhu
    Cc: David Miller
    Cc: Michael Cree
    LKML-Reference:
    Signed-off-by: Ingo Molnar

    Peter Zijlstra
     
  • Use hw_perf_event::period_left instead of hw_perf_event::remaining
    and win back 8 bytes.

    Signed-off-by: Peter Zijlstra
    Cc: paulus
    Cc: Frederic Weisbecker
    LKML-Reference:
    Signed-off-by: Ingo Molnar

    Peter Zijlstra
     
  • Provide default implementations for the pmu txn methods, this
    allows us to remove some conditional code.
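
    For pmus that don't implement transactions, the defaults are
    simply nops, along these lines (a sketch of the registration
    path):

        static void perf_pmu_nop_void(struct pmu *pmu) { }
        static int  perf_pmu_nop_int(struct pmu *pmu)  { return 0; }

        if (!pmu->start_txn) {
                pmu->start_txn  = perf_pmu_nop_void;
                pmu->commit_txn = perf_pmu_nop_int;
                pmu->cancel_txn = perf_pmu_nop_void;
        }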

    Signed-off-by: Peter Zijlstra
    Cc: paulus
    Cc: stephane eranian
    Cc: Robert Richter
    Cc: Will Deacon
    Cc: Paul Mundt
    Cc: Frederic Weisbecker
    Cc: Cyrill Gorcunov
    Cc: Lin Ming
    Cc: Yanmin
    Cc: Deng-Cheng Zhu
    Cc: David Miller
    Cc: Michael Cree
    LKML-Reference:
    Signed-off-by: Ingo Molnar

    Peter Zijlstra
     
  • Changes perf_disable() into perf_pmu_disable().
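
    The new form takes the pmu to operate on explicitly:

        void perf_pmu_disable(struct pmu *pmu);
        void perf_pmu_enable(struct pmu *pmu);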

    Signed-off-by: Peter Zijlstra
    Cc: paulus
    Cc: stephane eranian
    Cc: Robert Richter
    Cc: Will Deacon
    Cc: Paul Mundt
    Cc: Frederic Weisbecker
    Cc: Cyrill Gorcunov
    Cc: Lin Ming
    Cc: Yanmin
    Cc: Deng-Cheng Zhu
    Cc: David Miller
    Cc: Michael Cree
    LKML-Reference:
    Signed-off-by: Ingo Molnar

    Peter Zijlstra
     
  • Since the current perf_disable() usage is only an optimization,
    remove it for now. This eases the removal of the __weak
    hw_perf_enable() interface.

    Signed-off-by: Peter Zijlstra
    Cc: paulus
    Cc: stephane eranian
    Cc: Robert Richter
    Cc: Will Deacon
    Cc: Paul Mundt
    Cc: Frederic Weisbecker
    Cc: Cyrill Gorcunov
    Cc: Lin Ming
    Cc: Yanmin
    Cc: Deng-Cheng Zhu
    Cc: David Miller
    Cc: Michael Cree
    LKML-Reference:
    Signed-off-by: Ingo Molnar

    Peter Zijlstra
     
  • Fixup random annoying style bits.

    Signed-off-by: Peter Zijlstra
    Cc: paulus
    LKML-Reference:
    Signed-off-by: Ingo Molnar

    Peter Zijlstra
     
  • Simple registration interface for struct pmu; this provides the
    infrastructure for removing all the weak functions.
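
    The interface, sketched:

        int perf_pmu_register(struct pmu *pmu);
        void perf_pmu_unregister(struct pmu *pmu);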

    Signed-off-by: Peter Zijlstra
    Cc: paulus
    Cc: stephane eranian
    Cc: Robert Richter
    Cc: Will Deacon
    Cc: Paul Mundt
    Cc: Frederic Weisbecker
    Cc: Cyrill Gorcunov
    Cc: Lin Ming
    Cc: Yanmin
    Cc: Deng-Cheng Zhu
    Cc: David Miller
    Cc: Michael Cree
    LKML-Reference:
    Signed-off-by: Ingo Molnar

    Peter Zijlstra
     
  • sed -ie 's/const struct pmu\>/struct pmu/g' `git grep -l "const struct pmu\>"`

    Signed-off-by: Peter Zijlstra
    Cc: paulus
    Cc: stephane eranian
    Cc: Robert Richter
    Cc: Will Deacon
    Cc: Paul Mundt
    Cc: Frederic Weisbecker
    Cc: Cyrill Gorcunov
    Cc: Lin Ming
    Cc: Yanmin
    Cc: Deng-Cheng Zhu
    Cc: David Miller
    Cc: Michael Cree
    LKML-Reference:
    Signed-off-by: Ingo Molnar

    Peter Zijlstra
     
  • Merge reason: Pick up pending fixes before applying dependent new changes.

    Signed-off-by: Ingo Molnar

    Ingo Molnar
     
  • Since we have UP_PREPARE, we should also have UP_CANCELED.
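
    In the hotplug notifier this pairs setup with teardown, roughly:

        switch (action & ~CPU_TASKS_FROZEN) {
        case CPU_UP_PREPARE:
                perf_event_init_cpu(cpu);
                break;
        case CPU_UP_CANCELED:   /* undo UP_PREPARE if bringup fails */
        case CPU_DOWN_PREPARE:
                perf_event_exit_cpu(cpu);
                break;
        }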

    Signed-off-by: Peter Zijlstra
    Cc: paulus
    LKML-Reference:
    Signed-off-by: Ingo Molnar

    Peter Zijlstra
     
  • Commit 1c024eca (perf, trace: Optimize tracepoints by using
    per-tracepoint-per-cpu hlist to track events) caused a module
    refcount leak.

    Reported-And-Tested-by: Avi Kivity
    Signed-off-by: Peter Zijlstra
    LKML-Reference:
    Signed-off-by: Ingo Molnar

    Li Zefan
     

09 Sep, 2010

1 commit

  • Reading the file set_ftrace_filter does three things.

    1) shows whether or not filters are set for the function tracer
    2) shows what functions are set for the function tracer
    3) shows what triggers are set on any functions

    Case 3 is independent of cases 1 and 2.

    The way this file currently works is that it is a state machine,
    and as you read it, it may change state. But this assumption
    breaks when you use lseek() on the file: the state machine gets
    out of sync, and t_show() may use the wrong pointer, causing a
    kernel oops.

    Luckily, this will only kill the app that does the lseek, but the app
    dies while holding a mutex. This prevents anyone else from using the
    set_ftrace_filter file (or any other function tracing file for that matter).

    A real fix for this is to rewrite the code, but that is too much
    for an -rc release or stable. This patch simply disables llseek
    on the set_ftrace_filter file for now; we can do the proper fix
    in the next major release.
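
    The stop-gap amounts to dropping seek support from the file
    operations (a sketch; the handler names are assumptions, but
    no_llseek is the standard way to refuse seeks):

        static const struct file_operations ftrace_filter_fops = {
                .open    = ftrace_filter_open,
                .read    = seq_read,
                .write   = ftrace_filter_write,
                .llseek  = no_llseek,   /* refuse lseek() with -ESPIPE */
                .release = ftrace_regex_release,
        };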

    Reported-by: Robert Swiecki
    Cc: Chris Wright
    Cc: Tavis Ormandy
    Cc: Eugene Teo
    Cc: vendor-sec@lst.de
    Cc:
    Signed-off-by: Steven Rostedt

    Steven Rostedt
     

08 Sep, 2010

6 commits

  • Check whether the argument name is invalid (i.e. not a C-like
    symbol name). This keeps the event format simple.
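
    The check is the usual C-identifier test, along these lines
    (sketched with the standard ctype.h; the kernel has its own
    linux/ctype.h equivalents):

        #include <ctype.h>

        static int is_good_name(const char *name)
        {
                if (!isalpha((unsigned char)*name) && *name != '_')
                        return 0;
                while (*++name != '\0') {
                        if (!isalnum((unsigned char)*name) && *name != '_')
                                return 0;
                }
                return 1;
        }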

    Reported-by: Srikar Dronamraju
    Cc: Steven Rostedt
    Cc: Frederic Weisbecker
    Cc: Ingo Molnar
    Cc: Mathieu Desnoyers
    LKML-Reference:
    Signed-off-by: Masami Hiramatsu
    Signed-off-by: Arnaldo Carvalho de Melo

    Masami Hiramatsu
     
  • Set "argN" name for each argument automatically if it has no specified name.
    Since dynamic trace event(kprobe_events) accepts special characters for its
    argument, its format can show those special characters (e.g. '$', '%', '+').
    However, perf can't parse those format because of the character (especially
    '%') mess up the format. This sets "argX" name for those arguments if user
    omitted the argument names.

    E.g.
    # echo 'p do_fork %ax IP=%ip $stack' > tracing/kprobe_events
    # cat tracing/kprobe_events
    p:kprobes/p_do_fork_0 do_fork arg1=%ax IP=%ip arg3=$stack

    Reported-by: Srikar Dronamraju
    Cc: Steven Rostedt
    Cc: Frederic Weisbecker
    Cc: Ingo Molnar
    Cc: Mathieu Desnoyers
    LKML-Reference:
    Signed-off-by: Masami Hiramatsu
    Signed-off-by: Arnaldo Carvalho de Melo

    Masami Hiramatsu
     
  • Don't make argument names from raw parameters (meaning parameters
    written in kprobe-tracer syntax), because the argument syntax may
    include special characters. Just leave the name empty;
    kprobe-tracer will then assign one.

    Reported-by: Srikar Dronamraju
    Cc: Peter Zijlstra
    Cc: Paul Mackerras
    Cc: Ingo Molnar
    Cc: Arnaldo Carvalho de Melo
    Cc: Frederic Weisbecker
    LKML-Reference:
    Signed-off-by: Masami Hiramatsu
    Signed-off-by: Arnaldo Carvalho de Melo

    Masami Hiramatsu
     
  • Fix a bug so that the %return probe syntax is supported again.
    Previous commit 4235b04 had a bug which disabled the %return
    syntax in perf probe.

    Cc: Peter Zijlstra
    Cc: Paul Mackerras
    Cc: Ingo Molnar
    Cc: Arnaldo Carvalho de Melo
    Cc: Frederic Weisbecker
    LKML-Reference:
    Signed-off-by: Masami Hiramatsu
    Signed-off-by: Arnaldo Carvalho de Melo

    Masami Hiramatsu
     
  • Fix a memory leak which happens when a field name conflicts with
    others. In the error case, free_trace_probe() will free all
    arguments up to nr_args, so increment nr_args at the beginning of
    the loop instead of at the end.

    Cc: Steven Rostedt
    Cc: Frederic Weisbecker
    Cc: Ingo Molnar
    Cc: Mathieu Desnoyers
    LKML-Reference:
    Signed-off-by: Masami Hiramatsu
    Signed-off-by: Arnaldo Carvalho de Melo

    Masami Hiramatsu
     
  • Add a perf script which shows packet processing and processing
    time. It helps us investigate networking and network devices.

    If you want to use it, install perf and record perf.data as
    follows.

    If you set a script, perf gathers records until it ends.
    If not, you must press Ctrl-C to stop recording.

    And if you want a report from the record,

    If you use some options, you can limit the output.
    The options are below.

    tx: show only tx packets processing
    rx: show only rx packets processing
    dev=: show processing on this device
    debug: work with debug mode. It shows buffer status.

    For example, if you want to show the processing of received
    packets associated with eth4:

    106133.171439sec cpu=0
    irq_entry(+0.000msec irq=24:eth4)
    |
    softirq_entry(+0.006msec)
    |
    |---netif_receive_skb(+0.010msec skb=f2d15900 len=100)
    | |
    | skb_copy_datagram_iovec(+0.039msec 10291::10291)
    |
    napi_poll_exit(+0.022msec eth4)

    This perf script helps us to analyze the processing time of a
    transmit/receive sequence.

    Signed-off-by: Koki Sanagi
    Acked-by: David S. Miller
    Cc: Neil Horman
    Cc: Mathieu Desnoyers
    Cc: Kaneshige Kenji
    Cc: Izumo Taku
    Cc: Kosaki Motohiro
    Cc: Lai Jiangshan
    Cc: Scott Mcmillan
    Cc: Steven Rostedt
    Cc: Eric Dumazet
    Cc: Tom Zanussi
    LKML-Reference:
    Signed-off-by: Frederic Weisbecker

    Koki Sanagi
     

07 Sep, 2010

4 commits

  • This patch adds a tracepoint to consume_skb and adds
    trace_kfree_skb before __kfree_skb in skb_free_datagram_locked
    and net_tx_action. Combining these with the tracepoint on
    dev_hard_start_xmit, we can check how long it takes to free
    transmitted packets, and from that calculate how many packets the
    driver was holding at that time. This is useful when dropped
    transmitted packets are a problem.

    sshd-6828 [000] 112689.258154: consume_skb: skbaddr=f2d99bb8
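
    The hook placement, sketched (abbreviated from the skb free path):

        void consume_skb(struct sk_buff *skb)
        {
                /* ... refcount handling elided ... */
                trace_consume_skb(skb);  /* the new tracepoint */
                __kfree_skb(skb);
        }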

    Signed-off-by: Koki Sanagi
    Acked-by: David S. Miller
    Acked-by: Neil Horman
    Cc: Mathieu Desnoyers
    Cc: Kaneshige Kenji
    Cc: Izumo Taku
    Cc: Kosaki Motohiro
    Cc: Lai Jiangshan
    Cc: Scott Mcmillan
    Cc: Steven Rostedt
    Cc: Eric Dumazet
    LKML-Reference:
    Signed-off-by: Frederic Weisbecker

    Koki Sanagi
     
  • This patch adds tracepoints to dev_queue_xmit, dev_hard_start_xmit,
    netif_rx and netif_receive_skb. These tracepoints help you monitor
    a network driver's input/output.

    -0 [001] 112447.902030: netif_rx: dev=eth1 skbaddr=f3ef0900 len=84
    -0 [001] 112447.902039: netif_receive_skb: dev=eth1 skbaddr=f3ef0900 len=84
    sshd-6828 [000] 112447.903257: net_dev_queue: dev=eth4 skbaddr=f3fca538 len=226
    sshd-6828 [000] 112447.903260: net_dev_xmit: dev=eth4 skbaddr=f3fca538 len=226 rc=0

    Signed-off-by: Koki Sanagi
    Acked-by: David S. Miller
    Acked-by: Neil Horman
    Cc: Mathieu Desnoyers
    Cc: Kaneshige Kenji
    Cc: Izumo Taku
    Cc: Kosaki Motohiro
    Cc: Lai Jiangshan
    Cc: Scott Mcmillan
    Cc: Steven Rostedt
    Cc: Eric Dumazet
    LKML-Reference:
    Signed-off-by: Frederic Weisbecker

    Koki Sanagi
     
  • This patch converts trace_napi_poll from DECLARE_EVENT to
    TRACE_EVENT to improve the usability of the napi_poll tracepoint.

    -0 [001] 241302.750777: napi_poll: napi poll on napi struct f6acc480 for device eth3
    -0 [000] 241302.852389: napi_poll: napi poll on napi struct f5d0d70c for device eth1

    The original patch is below:
    http://marc.info/?l=linux-kernel&m=126021713809450&w=2

    [ sanagi.koki@jp.fujitsu.com: And add a fix by Steven Rostedt:
    http://marc.info/?l=linux-kernel&m=126150506519173&w=2 ]

    Signed-off-by: Neil Horman
    Acked-by: David S. Miller
    Acked-by: Neil Horman
    Cc: Mathieu Desnoyers
    Cc: Kaneshige Kenji
    Cc: Izumo Taku
    Cc: Kosaki Motohiro
    Cc: Lai Jiangshan
    Cc: Scott Mcmillan
    Cc: Steven Rostedt
    Cc: Eric Dumazet
    LKML-Reference:
    Signed-off-by: Koki Sanagi
    Signed-off-by: Frederic Weisbecker

    Neil Horman
     
  • Add a tracepoint for tracing when a softirq action is raised.

    This and the existing tracepoints complete softirq's set of
    tracepoints: softirq_raise, softirq_entry and softirq_exit.

    When this tracepoint is used in combination with the
    softirq_entry tracepoint, we can determine the softirq raise
    latency.
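
    The raise side then looks roughly like this, which is what makes
    the raise-to-entry latency measurable:

        void __raise_softirq_irqoff(unsigned int nr)
        {
                trace_softirq_raise(nr);        /* new: mark the raise */
                or_softirq_pending(1UL << nr);
        }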

    Signed-off-by: Lai Jiangshan
    Acked-by: Mathieu Desnoyers
    Acked-by: Neil Horman
    Cc: David Miller
    Cc: Kaneshige Kenji
    Cc: Izumo Taku
    Cc: Kosaki Motohiro
    Cc: Lai Jiangshan
    Cc: Scott Mcmillan
    Cc: Steven Rostedt
    Cc: Eric Dumazet
    LKML-Reference:
    [ factorize softirq events with DECLARE_EVENT_CLASS ]
    Signed-off-by: Koki Sanagi
    Signed-off-by: Frederic Weisbecker

    Lai Jiangshan
     

03 Sep, 2010

3 commits

  • When the PMU is enabled it is valid to have unhandled NMIs: two
    events could trigger 'simultaneously', raising two back-to-back
    NMIs. If the first NMI handles both, the latter will be empty
    and daze the CPU.

    The solution to avoid an 'unknown nmi' message in this case was
    simply to stop the NMI handler chain when the PMU is enabled, by
    stating the NMI was handled. This has the drawback that a) we
    can not detect unknown NMIs anymore, and b) subsequent NMI
    handlers are not called.

    This patch addresses this. Now we check whether an unknown NMI
    could be a PMU back-to-back NMI; otherwise we pass it on and let
    the kernel handle it as an unknown NMI.

    This is a debug log:

    cpu #6, nmi #32333, skip_nmi #32330, handled = 1, time = 1934364430
    cpu #6, nmi #32334, skip_nmi #32330, handled = 1, time = 1934704616
    cpu #6, nmi #32335, skip_nmi #32336, handled = 2, time = 1936032320
    cpu #6, nmi #32336, skip_nmi #32336, handled = 0, time = 1936034139
    cpu #6, nmi #32337, skip_nmi #32336, handled = 1, time = 1936120100
    cpu #6, nmi #32338, skip_nmi #32336, handled = 1, time = 1936404607
    cpu #6, nmi #32339, skip_nmi #32336, handled = 1, time = 1937983416
    cpu #6, nmi #32340, skip_nmi #32341, handled = 2, time = 1938201032
    cpu #6, nmi #32341, skip_nmi #32341, handled = 0, time = 1938202830
    cpu #6, nmi #32342, skip_nmi #32341, handled = 1, time = 1938443743
    cpu #6, nmi #32343, skip_nmi #32341, handled = 1, time = 1939956552
    cpu #6, nmi #32344, skip_nmi #32341, handled = 1, time = 1940073224
    cpu #6, nmi #32345, skip_nmi #32341, handled = 1, time = 1940485677
    cpu #6, nmi #32346, skip_nmi #32347, handled = 2, time = 1941947772
    cpu #6, nmi #32347, skip_nmi #32347, handled = 1, time = 1941949818
    cpu #6, nmi #32348, skip_nmi #32347, handled = 0, time = 1941951591
    Uhhuh. NMI received for unknown reason 00 on CPU 6.
    Do you have a strange power saving mode enabled?
    Dazed and confused, but trying to continue

    Deltas:

    nmi #32334 340186
    nmi #32335 1327704
    nmi #32336 1819 <<<< back-to-back nmi [1]
    nmi #32337 85961
    nmi #32338 284507
    nmi #32339 1578809
    nmi #32340 217616
    nmi #32341 1798 <<<< back-to-back nmi [2]
    nmi #32342 240913
    nmi #32343 1512809
    nmi #32344 116672
    nmi #32345 412453
    nmi #32346 1462095 <<<< 1st nmi (standard) handling 2 counters
    nmi #32347 2046 <<<< 2nd nmi (back-to-back) handling one counter
    nmi #32348 1773 <<<< 3rd nmi (back-to-back) handling no counter! [3]

    For back-to-back NMI detection there are the following rules:

    The PMU NMI handler handled more than one counter, and no
    counter was handled in the subsequent NMI (see [1] and [2]
    above).

    There is another case with two subsequent back-to-back NMIs [3]:
    the 2nd is detected as back-to-back because the first handled
    more than one counter. If the second handles one counter and the
    3rd handles nothing, we drop the 3rd NMI too, because it could be
    a back-to-back NMI.
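
    The rules condense into a small amount of state. A standalone
    model of the decision (not the kernel code, which keeps this
    per-cpu and also tracks timestamps):

        /* Highest NMI number we may still swallow as a shadow. */
        static unsigned int skip_upto;

        /* Returns non-zero if this PMI should count as handled. */
        static int pmu_nmi_decide(unsigned int this_nmi, int handled)
        {
                if (handled > 1 ||
                    (handled == 1 && this_nmi == skip_upto))
                        /* we may have raised a shadow NMI behind us */
                        skip_upto = this_nmi + 1;

                if (!handled && this_nmi <= skip_upto)
                        return 1;  /* swallow suspected back-to-back NMI */

                return handled;    /* 0 falls through to 'unknown NMI' */
        }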

    Signed-off-by: Robert Richter
    Signed-off-by: Peter Zijlstra
    [ renamed nmi variable to pmu_nmi to avoid clash with .nmi in entry.S ]
    Signed-off-by: Don Zickus
    Cc: peterz@infradead.org
    Cc: gorcunov@gmail.com
    Cc: fweisbec@gmail.com
    Cc: ying.huang@intel.com
    Cc: ming.m.lin@intel.com
    Cc: eranian@google.com
    LKML-Reference:
    Signed-off-by: Ingo Molnar

    Robert Richter
     
  • Now that we rely on the number of handled overflows, ensure all
    handle_irq implementations actually return the right number.

    Signed-off-by: Peter Zijlstra
    Signed-off-by: Don Zickus
    Cc: peterz@infradead.org
    Cc: robert.richter@amd.com
    Cc: gorcunov@gmail.com
    Cc: fweisbec@gmail.com
    Cc: ying.huang@intel.com
    Cc: ming.m.lin@intel.com
    Cc: eranian@google.com
    LKML-Reference:
    Signed-off-by: Ingo Molnar

    Peter Zijlstra
     
  • During testing of a patch to stop having the perf subsystem
    swallow NMIs, it was uncovered that Nehalem boxes were randomly
    getting unknown NMIs when using the perf tool.

    Moving the ack'ing of the PMI closer to when we get the status
    allows the hardware to properly re-set the PMU bit signaling that
    another PMI was triggered during the processing of the first
    PMI. This allows the new logic for dealing with the
    shortcomings of multiple PMIs to handle the extra NMI by
    'eat'ing it later.

    Now one can wonder why we are getting a second PMI when we
    disable all the PMUs at the beginning of the NMI handler to
    prevent such a case; that I do not know. But I know the fix
    below helps deal with this quirk.

    Tested on multiple Nehalems where the problem was occurring.
    With the patch, the code now loops a second time to handle the
    second PMI (whereas before it did not).
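
    The reorder, sketched against the Intel handler's helpers:

        again:
                status = intel_pmu_get_status();
                if (!status)
                        goto done;

                intel_pmu_ack_status(status);   /* moved up: ack as soon as
                                                   we read the status, so a
                                                   second PMI can latch */

                /* ... service the overflowed counters ... */
                goto again;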

    Signed-off-by: Don Zickus
    Cc: peterz@infradead.org
    Cc: robert.richter@amd.com
    Cc: gorcunov@gmail.com
    Cc: fweisbec@gmail.com
    Cc: ying.huang@intel.com
    Cc: ming.m.lin@intel.com
    Cc: eranian@google.com
    LKML-Reference:
    Signed-off-by: Ingo Molnar

    Don Zickus