19 Oct, 2010
2 commits
-
Commit c3f00c70 ("perf: Separate find_get_context() from event
initialization") changed the generic perf_event code to call
perf_event_alloc, which calls the arch-specific event_init code,
before looking up the context for the new event. Unfortunately,
power_pmu_event_init uses event->ctx->task to see whether the
new event is a per-task event or a system-wide event, and thus
crashes since event->ctx is NULL at the point where
power_pmu_event_init gets called.(The reason it needs to know whether it is a per-task event is
because there are some hardware events on Power systems which
only count when the processor is not idle, and there are some
fixed-function counters which count such events. For example,
the "run cycles" event counts cycles when the processor is not
idle. If the user asks to count cycles, we can use "run cycles"
if this is a per-task event, since the processor is running when
the task is running, by definition. We can't use "run cycles"
if the user asks for "cycles" on a system-wide counter.)Fortunately the information we need is in the
event->attach_state field, so we just use that instead.Signed-off-by: Paul Mackerras
Cc: Benjamin Herrenschmidt
Cc: Peter Zijlstra
Cc: Frederic Weisbecker
LKML-Reference:
Signed-off-by: Ingo Molnar
Reported-by: Alexey Kardashevskiy
Signed-off-by: Ingo Molnar -
Provide a mechanism that allows running code in IRQ context. It is
most useful for NMI code that needs to interact with the rest of the
system -- like wakeup a task to drain buffers.Perf currently has such a mechanism, so extract that and provide it as
a generic feature, independent of perf so that others may also
benefit.The IRQ context callback is generated through self-IPIs where
possible, or on architectures like powerpc the decrementer (the
built-in timer facility) is set to generate an interrupt immediately.Architectures that don't have anything like this get to do with a
callback from the timer tick. These architectures can call
irq_work_run() at the tail of any IRQ handlers that might enqueue such
work (like the perf IRQ handler) to avoid undue latencies in
processing the work.Signed-off-by: Peter Zijlstra
Acked-by: Kyle McMartin
Acked-by: Martin Schwidefsky
[ various fixes ]
Signed-off-by: Huang Ying
LKML-Reference:
Signed-off-by: Ingo Molnar
08 Oct, 2010
1 commit
-
Conflicts:
arch/x86/kernel/module.cMerge reason: Resolve the conflict, pick up fixes.
Signed-off-by: Ingo Molnar
06 Oct, 2010
2 commits
-
Since powerpc uses -Werror on arch powerpc, the build was broken like
this:cc1: warnings being treated as errors
arch/powerpc/kernel/module.c: In function 'module_finalize':
arch/powerpc/kernel/module.c:66: error: unused variable 'err'Signed-off-by: Stephen Rothwell
Signed-off-by: Linus Torvalds -
With all the recent module loading cleanups, we've minimized the code
that sits under module_mutex, fixing various deadlocks and making it
possible to do most of the module loading in parallel.However, that whole conversion totally missed the rather obscure code
that adds a new module to the list for BUG() handling. That code was
doubly obscure because (a) the code itself lives in lib/bugs.c (for
dubious reasons) and (b) it gets called from the architecture-specific
"module_finalize()" rather than from generic code.Calling it from arch-specific code makes no sense what-so-ever to begin
with, and is now actively wrong since that code isn't protected by the
module loading lock any more.So this commit moves the "module_bug_{finalize,cleanup}()" calls away
from the arch-specific code, and into the generic code - and in the
process protects it with the module_mutex so that the list operations
are now safe.Future fixups:
- move the module list handling code into kernel/module.c where it
belongs.
- get rid of 'module_bug_list' and just use the regular list of modules
(called 'modules' - imagine that) that we already create and maintain
for other reasons.Reported-and-tested-by: Thomas Gleixner
Cc: Rusty Russell
Cc: Adrian Bunk
Cc: Andrew Morton
Cc: stable@kernel.org
Signed-off-by: Linus Torvalds
23 Sep, 2010
2 commits
-
Conflicts:
arch/sparc/kernel/perf_event.cMerge reason: Resolve the conflict.
Signed-off-by: Ingo Molnar
-
Make sigreturn zero regs->trap, make do_signal() do the same on all
paths. As it is, signal interrupting e.g. read() from fd 512 (==
ERESTARTSYS) with another signal getting unblocked when the first
handler finishes will lead to restart one insn earlier than it ought
to. Same for multiple signals with in-kernel handlers interrupting
that sucker at the same time. Same for multiple signals of any kind
interrupting that sucker on 64bit...Signed-off-by: Al Viro
Acked-by: Paul Mackerras
Signed-off-by: Linus Torvalds
15 Sep, 2010
1 commit
-
…stedt/linux-2.6-trace into perf/core
10 Sep, 2010
5 commits
-
Replace pmu::{enable,disable,start,stop,unthrottle} with
pmu::{add,del,start,stop}, all of which take a flags argument.The new interface extends the capability to stop a counter while
keeping it scheduled on the PMU. We replace the throttled state with
the generic stopped state.This also allows us to efficiently stop/start counters over certain
code paths (like IRQ handlers).It also allows scheduling a counter without it starting, allowing for
a generic frozen state (useful for rotating stopped counters).The stopped state is implemented in two different ways, depending on
how the architecture implemented the throttled state:1) We disable the counter:
a) the pmu has per-counter enable bits, we flip that
b) we program a NOP event, preserving the counter state2) We store the counter state and ignore all read/overflow events
Signed-off-by: Peter Zijlstra
Cc: paulus
Cc: stephane eranian
Cc: Robert Richter
Cc: Will Deacon
Cc: Paul Mundt
Cc: Frederic Weisbecker
Cc: Cyrill Gorcunov
Cc: Lin Ming
Cc: Yanmin
Cc: Deng-Cheng Zhu
Cc: David Miller
Cc: Michael Cree
LKML-Reference:
Signed-off-by: Ingo Molnar -
Changes perf_disable() into perf_pmu_disable().
Signed-off-by: Peter Zijlstra
Cc: paulus
Cc: stephane eranian
Cc: Robert Richter
Cc: Will Deacon
Cc: Paul Mundt
Cc: Frederic Weisbecker
Cc: Cyrill Gorcunov
Cc: Lin Ming
Cc: Yanmin
Cc: Deng-Cheng Zhu
Cc: David Miller
Cc: Michael Cree
LKML-Reference:
Signed-off-by: Ingo Molnar -
Since the current perf_disable() usage is only an optimization,
remove it for now. This eases the removal of the __weak
hw_perf_enable() interface.Signed-off-by: Peter Zijlstra
Cc: paulus
Cc: stephane eranian
Cc: Robert Richter
Cc: Will Deacon
Cc: Paul Mundt
Cc: Frederic Weisbecker
Cc: Cyrill Gorcunov
Cc: Lin Ming
Cc: Yanmin
Cc: Deng-Cheng Zhu
Cc: David Miller
Cc: Michael Cree
LKML-Reference:
Signed-off-by: Ingo Molnar -
Simple registration interface for struct pmu, this provides the
infrastructure for removing all the weak functions.Signed-off-by: Peter Zijlstra
Cc: paulus
Cc: stephane eranian
Cc: Robert Richter
Cc: Will Deacon
Cc: Paul Mundt
Cc: Frederic Weisbecker
Cc: Cyrill Gorcunov
Cc: Lin Ming
Cc: Yanmin
Cc: Deng-Cheng Zhu
Cc: David Miller
Cc: Michael Cree
LKML-Reference:
Signed-off-by: Ingo Molnar -
sed -ie 's/const struct pmu\>/struct pmu/g' `git grep -l "const struct pmu\>"`
Signed-off-by: Peter Zijlstra
Cc: paulus
Cc: stephane eranian
Cc: Robert Richter
Cc: Will Deacon
Cc: Paul Mundt
Cc: Frederic Weisbecker
Cc: Cyrill Gorcunov
Cc: Lin Ming
Cc: Yanmin
Cc: Deng-Cheng Zhu
Cc: David Miller
Cc: Michael Cree
LKML-Reference:
Signed-off-by: Ingo Molnar
31 Aug, 2010
3 commits
-
In f761622e59433130bc33ad086ce219feee9eb961 we changed
early_setup_secondary so it's called using the proper kernel stack
rather than the emergency one.Unfortunately, this stack pointer can't be used when translation is off
on PHYP as this stack pointer might be outside the RMO. This results in
the following on all non zero cpus:
cpu 0x1: Vector: 300 (Data Access) at [c00000001639fd10]
pc: 000000000001c50c
lr: 000000000000821c
sp: c00000001639ff90
msr: 8000000000001000
dar: c00000001639ffa0
dsisr: 42000000
current = 0xc000000016393540
paca = 0xc000000006e00200
pid = 0, comm = swapperThe original patch was only tested on bare metal system, so it never
caught this problem.This changes __secondary_start so that we calculate the new stack
pointer but only start using it after we've called early_setup_secondary.With this patch, the above problem goes away.
Signed-off-by: Michael Neuling
Signed-off-by: Benjamin Herrenschmidt -
Commit 0fe1ac48 ("powerpc/perf_event: Fix oops due to
perf_event_do_pending call") moved the call to perf_event_do_pending
in timer_interrupt() down so that it was after the irq_enter() call.
Unfortunately this moved it after the code that checks whether it
is time for the next decrementer clock event. The result is that
the call to perf_event_do_pending() won't happen until the next
decrementer clock event is due. This was pointed out by Milton
Miller.This fixes it by moving the check for whether it's time for the
next decrementer clock event down to the point where we're about
to call the event handler, after we've called perf_event_do_pending.This has the side effect that on old pre-Core99 Powermacs where we
use the ppc_n_lost_interrupts mechanism to replay interrupts, a
replayed interrupt will incur a little more latency since it will
now do the code from the irq_enter down to the irq_exit, that it
used to skip. However, these machines are now old and rare enough
that this doesn't matter. To make it clear that ppc_n_lost_interrupts
is only used on Powermacs, and to speed up the code slightly on
non-Powermac ppc32 machines, the code that tests ppc_n_lost_interrupts
is now conditional on CONFIG_PMAC as well as CONFIG_PPC32.Signed-off-by: Paul Mackerras
Cc: stable@kernel.org
Signed-off-by: Benjamin Herrenschmidt -
Call kexec purgatory code correctly. We were getting lucky before.
If you examine the powerpc 32bit kexec "purgatory" code you will
see it expects the following:>From kexec-tools: purgatory/arch/ppc/v2wrap_32.S
-> calling convention:
-> r3 = physical number of this cpu (all cpus)
-> r4 = address of this chunk (master only)As such, we need to set r3 to the current core, r4 happens to be
unused by purgatory at the moment but we go ahead and set it
here as wellSigned-off-by: Matthew McClintock
Signed-off-by: Benjamin Herrenschmidt
25 Aug, 2010
1 commit
-
Merge reason: pick up perf fixes
Signed-off-by: Ingo Molnar
24 Aug, 2010
12 commits
-
Signed-off-by: Andreas Schwab
Signed-off-by: Benjamin Herrenschmidt -
pci_device_to_OF_node() can return null, and list_for_each_entry will
never enter the loop when dev is NULL, so it looks like this test is
a typo.Reported-by: Julia Lawall
Signed-off-by: Grant Likely
Signed-off-by: Benjamin Herrenschmidt -
As early setup calls down to slb_initialize(), we must have kstack
initialised before checking "should we add a bolted SLB entry for our kstack?"Failing to do so means stack access requires an SLB miss exception to refill
an entry dynamically, if the stack isn't accessible via SLB(0) (kernel text
& static data). It's not always allowable to take such a miss, and
intermittent crashes will result.Primary CPUs don't have this issue; an SLB entry is not bolted for their
stack anyway (as that lives within SLB(0)). This patch therefore only
affects the init of secondaries.Signed-off-by: Matt Evans
Cc: stable
Signed-off-by: Benjamin Herrenschmidt -
When looking at some issues with the virtual ethernet driver I noticed
that TCE allocation was following a very strange pattern:address 00e9000 length 2048
address 0409000 length 2048
Signed-off-by: Benjamin Herrenschmidt -
I'm sick of seeing ppc64_runlatch_off in our profiles, so inline it
into the callers. To avoid a mess of circular includes I didn't add
it as an inline function.Signed-off-by: Anton Blanchard
Acked-by: Olof Johansson
Signed-off-by: Benjamin Herrenschmidt -
The 'smt_enabled=X' boot option does not handle values of X > 2.
For Power 7 processors with smt modes of 0,1,2,3, and 4 this does
not work. This patch allows the smt_enabled option to be set to
any value limited to a max equal to the number of threads per
core.Signed-off-by: Nathan Fontenot
Signed-off-by: Benjamin Herrenschmidt -
During CPU offline/online tests __cpu_up would flood the logs with
the following message:Processor 0 found.
This provides no useful information to the user as there is no context
provided, and since the operation was a success (to this point) it is expected
that the CPU will come back online, providing all the feedback necessary.Change the "Processor found" message to DBG() similar to other such messages in
the same function. Also, add an appropriate log level for the "Processor is
stuck" message.Signed-off-by: Darren Hart
Acked-by: Will Schmidt
Cc: Thomas Gleixner
Cc: Nathan Fontenot
Cc: Robert Jennings
Cc: Brian King
Signed-off-by: Benjamin Herrenschmidt -
start_secondary() is called shortly after _start and also via
cpu_idle()->cpu_die()->pseries_mach_cpu_die()
start_secondary() expects a preempt_count() of 0. pseries_mach_cpu_die() is
called via the cpu_idle() routine with preemption disabled, resulting in the
following repeating message during rapid cpu offline/online tests
with CONFIG_PREEMPT=y:BUG: scheduling while atomic: swapper/0/0x00000002
Modules linked in: autofs4 binfmt_misc dm_mirror dm_region_hash dm_log [last unloaded: scsi_wait_scan]
Call Trace:
[c00000010e7079c0] [c0000000000133ec] .show_stack+0xd8/0x218 (unreliable)
[c00000010e707aa0] [c0000000006a47f0] .dump_stack+0x28/0x3c
[c00000010e707b20] [c00000000006e7a4] .__schedule_bug+0x7c/0x9c
[c00000010e707bb0] [c000000000699d9c] .schedule+0x104/0x800
[c00000010e707cd0] [c000000000015b24] .cpu_idle+0x1c4/0x1d8
[c00000010e707d70] [c0000000006aa1b4] .start_secondary+0x398/0x3d4
[c00000010e707e30] [c000000000008278] .start_secondary_resume+0x10/0x14Move the cpu_die() call inside the existing preemption enabled block of
cpu_idle(). This is safe as the idle task is affined to a single CPU so the
debug_smp_processor_id() tests (from cpu_should_die()) won't trigger as we are
in a "migration disabled" region.Signed-off-by: Darren Hart
Acked-by: Will Schmidt
Cc: Thomas Gleixner
Cc: Nathan Fontenot
Cc: Robert Jennings
Cc: Brian King
Signed-off-by: Benjamin Herrenschmidt -
list_for_each_entry binds its first argument to a non-null value, and thus
any null test on the value of that argument is superfluous.The semantic patch that makes this change is as follows:
(http://coccinelle.lip6.fr/)//
@@
iterator I;
expression x,E,E1,E2;
statement S,S1,S2;
@@I(x,...) { }
//Signed-off-by: Julia Lawall
Signed-off-by: Benjamin Herrenschmidt -
During kdump we run the crash handlers first then stop all other CPUs.
We really want to stop all CPUs as close to the fail as possible and also
have a very controlled environment for running the crash handlers, so it
makes sense to reverse the order.Signed-off-by: Anton Blanchard
Acked-by: Matt Evans
Signed-off-by: Benjamin Herrenschmidt -
Use is_32bit_task() helper to test 32 bit binary.
Signed-off-by: Denis Kirjanov
Signed-off-by: Benjamin Herrenschmidt
23 Aug, 2010
3 commits
-
The interrupt stacks need to be indexed by the physical cpu since the
critical, debug and machine check handlers use the contents of SPRN_PIR to
index the critirq_ctx, dbgirq_ctx, and mcheckirq_ctx arrays.Signed-off-by: Dave Kleikamp
Signed-off-by: Josh Boyer -
There are two entries for .cpu_user_features in
arch/powerpc/kernel/cputable.c. Remove the one that doesn't belongSigned-off-by: Dave Kleikamp
Signed-off-by: Josh Boyer -
Clear the machine check syndrom register before enabling machine check
interrupts. The initial state of the tlb can lead to parity errors being
flagged early after a cold boot.Signed-off-by: Dave Kleikamp
Signed-off-by: Josh Boyer
19 Aug, 2010
4 commits
-
…rostedt/linux-2.6-trace into perf/core
-
Store the kernel and user contexts from the generic layer instead
of archs, this gathers some repetitive code.Signed-off-by: Frederic Weisbecker
Acked-by: Paul Mackerras
Tested-by: Will Deacon
Cc: Ingo Molnar
Cc: Peter Zijlstra
Cc: Arnaldo Carvalho de Melo
Cc: Stephane Eranian
Cc: David Miller
Cc: Paul Mundt
Cc: Borislav Petkov -
- Most archs use one callchain buffer per cpu, except x86 that needs
to deal with NMIs. Provide a default perf_callchain_buffer()
implementation that x86 overrides.- Centralize all the kernel/user regs handling and invoke new arch
handlers from there: perf_callchain_user() / perf_callchain_kernel()
That avoid all the user_mode(), current->mm checks and so...- Invert some parameters in perf_callchain_*() helpers: entry to the
left, regs to the right, following the traditional (dst, src).Signed-off-by: Frederic Weisbecker
Acked-by: Paul Mackerras
Tested-by: Will Deacon
Cc: Ingo Molnar
Cc: Peter Zijlstra
Cc: Arnaldo Carvalho de Melo
Cc: Stephane Eranian
Cc: David Miller
Cc: Paul Mundt
Cc: Borislav Petkov -
callchain_store() is the same on every archs, inline it in
perf_event.h and rename it to perf_callchain_store() to avoid
any collision.This removes repetitive code.
Signed-off-by: Frederic Weisbecker
Acked-by: Paul Mackerras
Tested-by: Will Deacon
Cc: Ingo Molnar
Cc: Peter Zijlstra
Cc: Arnaldo Carvalho de Melo
Cc: Stephane Eranian
Cc: David Miller
Cc: Paul Mundt
Cc: Borislav Petkov
18 Aug, 2010
1 commit
-
Make do_execve() take a const filename pointer so that kernel_execve() compiles
correctly on ARM:arch/arm/kernel/sys_arm.c:88: warning: passing argument 1 of 'do_execve' discards qualifiers from pointer target type
This also requires the argv and envp arguments to be consted twice, once for
the pointer array and once for the strings the array points to. This is
because do_execve() passes a pointer to the filename (now const) to
copy_strings_kernel(). A simpler alternative would be to cast the filename
pointer in do_execve() when it's passed to copy_strings_kernel().do_execve() may not change any of the strings it is passed as part of the argv
or envp lists as they are some of them in .rodata, so marking these strings as
const should be fine.Further kernel_execve() and sys_execve() need to be changed to match.
This has been test built on x86_64, frv, arm and mips.
Signed-off-by: David Howells
Tested-by: Ralf Baechle
Acked-by: Russell King
Signed-off-by: Linus Torvalds
14 Aug, 2010
1 commit
-
Mark arguments to certain system calls as being const where they should be but
aren't. The list includes:(*) The filename arguments of various stat syscalls, execve(), various utimes
syscalls and some mount syscalls.(*) The filename arguments of some syscall helpers relating to the above.
(*) The buffer argument of various write syscalls.
Signed-off-by: David Howells
Acked-by: David S. Miller
Signed-off-by: Linus Torvalds
07 Aug, 2010
2 commits
-
of_i8042_{kbd,aux}_irq needs to be exported
Signed-off-by: Grant Likely
-
…x/kernel/git/tip/linux-2.6-tip
* 'timers-timekeeping-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
um: Fix read_persistent_clock fallout
kgdb: Do not access xtime directly
powerpc: Clean up obsolete code relating to decrementer and timebase
powerpc: Rework VDSO gettimeofday to prevent time going backwards
clocksource: Add __clocksource_updatefreq_hz/khz methods
x86: Convert common clocksources to use clocksource_register_hz/khz
timekeeping: Make xtime and wall_to_monotonic static
hrtimer: Cleanup direct access to wall_to_monotonic
um: Convert to use read_persistent_clock
timkeeping: Fix update_vsyscall to provide wall_to_monotonic offset
powerpc: Cleanup xtime usage
powerpc: Simplify update_vsyscall
time: Kill off CONFIG_GENERIC_TIME
time: Implement timespec_add
x86: Fix vtime/file timestamp inconsistenciesTrivial conflicts in Documentation/feature-removal-schedule.txt
Much less trivial conflicts in arch/powerpc/kernel/time.c resolved as
per Thomas' earlier merge commit 47916be4e28c ("Merge branch
'powerpc.cherry-picks' into timers/clocksource")