19 Oct, 2010

2 commits

  • Commit c3f00c70 ("perf: Separate find_get_context() from event
    initialization") changed the generic perf_event code to call
    perf_event_alloc, which calls the arch-specific event_init code,
    before looking up the context for the new event. Unfortunately,
    power_pmu_event_init uses event->ctx->task to see whether the
    new event is a per-task event or a system-wide event, and thus
    crashes since event->ctx is NULL at the point where
    power_pmu_event_init gets called.

    (The reason it needs to know whether it is a per-task event is
    because there are some hardware events on Power systems which
    only count when the processor is not idle, and there are some
    fixed-function counters which count such events. For example,
    the "run cycles" event counts cycles when the processor is not
    idle. If the user asks to count cycles, we can use "run cycles"
    if this is a per-task event, since the processor is running when
    the task is running, by definition. We can't use "run cycles"
    if the user asks for "cycles" on a system-wide counter.)

    Fortunately the information we need is in the
    event->attach_state field, so we just use that instead.
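
    A minimal sketch of the idea (the helper name here is hypothetical;
    the real test sits inline in power_pmu_event_init):

        #include <linux/perf_event.h>

        /* event->attach_state is already valid at event_init time,
         * unlike event->ctx, which is still NULL at this point. */
        static int is_per_task_event(struct perf_event *event)
        {
                return !!(event->attach_state & PERF_ATTACH_TASK);
        }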

    Signed-off-by: Paul Mackerras
    Cc: Benjamin Herrenschmidt
    Cc: Peter Zijlstra
    Cc: Frederic Weisbecker
    LKML-Reference:
    Reported-by: Alexey Kardashevskiy
    Signed-off-by: Ingo Molnar

    Paul Mackerras
     
  • Provide a mechanism that allows running code in IRQ context. It is
    most useful for NMI code that needs to interact with the rest of the
    system -- like wakeup a task to drain buffers.

    Perf currently has such a mechanism, so extract that and provide it as
    a generic feature, independent of perf so that others may also
    benefit.

    The IRQ context callback is generated through self-IPIs where
    possible; on architectures like powerpc, the decrementer (the
    built-in timer facility) is set to generate an interrupt immediately.

    Architectures that don't have anything like this make do with a
    callback from the timer tick. These architectures can call
    irq_work_run() at the tail of any IRQ handlers that might enqueue such
    work (like the perf IRQ handler) to avoid undue latencies in
    processing the work.
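
    A hedged usage sketch (the drain_task variable and the user-side
    function names are hypothetical; the irq_work calls are the ones
    this commit introduces):

        #include <linux/irq_work.h>
        #include <linux/sched.h>

        static struct task_struct *drain_task;  /* consumer to wake */

        /* Runs later, in hard-IRQ context, where waking a task is
         * legal (it is not from NMI context). */
        static void drain_wakeup(struct irq_work *work)
        {
                wake_up_process(drain_task);
        }

        static struct irq_work drain_work;

        static void setup_drain(void)
        {
                init_irq_work(&drain_work, drain_wakeup);
        }

        /* From the NMI handler: just enqueue; a self-IPI (or the
         * decrementer / timer tick) runs drain_wakeup() shortly. */
        static void nmi_path(void)
        {
                irq_work_queue(&drain_work);
        }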

    Signed-off-by: Peter Zijlstra
    Acked-by: Kyle McMartin
    Acked-by: Martin Schwidefsky
    [ various fixes ]
    Signed-off-by: Huang Ying
    LKML-Reference:
    Signed-off-by: Ingo Molnar

    Peter Zijlstra
     

08 Oct, 2010

1 commit


06 Oct, 2010

2 commits

  • Since arch/powerpc is built with -Werror, the build was broken like
    this:

    cc1: warnings being treated as errors
    arch/powerpc/kernel/module.c: In function 'module_finalize':
    arch/powerpc/kernel/module.c:66: error: unused variable 'err'

    Signed-off-by: Stephen Rothwell
    Signed-off-by: Linus Torvalds

    Stephen Rothwell
     
  • With all the recent module loading cleanups, we've minimized the code
    that sits under module_mutex, fixing various deadlocks and making it
    possible to do most of the module loading in parallel.

    However, that whole conversion totally missed the rather obscure code
    that adds a new module to the list for BUG() handling. That code was
    doubly obscure because (a) the code itself lives in lib/bug.c (for
    dubious reasons) and (b) it gets called from the architecture-specific
    "module_finalize()" rather than from generic code.

    Calling it from arch-specific code makes no sense whatsoever to begin
    with, and is now actively wrong since that code isn't protected by the
    module loading lock any more.

    So this commit moves the "module_bug_{finalize,cleanup}()" calls away
    from the arch-specific code, and into the generic code - and in the
    process protects it with the module_mutex so that the list operations
    are now safe.

    Future fixups:
     - move the module list handling code into kernel/module.c where it
       belongs.
     - get rid of 'module_bug_list' and just use the regular list of
       modules (called 'modules' - imagine that) that we already create
       and maintain for other reasons.
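
    For reference, a sketch of the entry points involved (signatures
    hedged, roughly as in include/linux/bug.h after this change; the
    exact call-site context in kernel/module.c is elided):

        /* now called from generic module-loading code, with
         * module_mutex held, instead of from module_finalize() */
        void module_bug_finalize(const Elf_Ehdr *hdr,
                                 const Elf_Shdr *sechdrs,
                                 struct module *mod);
        void module_bug_cleanup(struct module *mod);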

    Reported-and-tested-by: Thomas Gleixner
    Cc: Rusty Russell
    Cc: Adrian Bunk
    Cc: Andrew Morton
    Cc: stable@kernel.org
    Signed-off-by: Linus Torvalds

    Linus Torvalds
     

23 Sep, 2010

2 commits

  • Conflicts:
    arch/sparc/kernel/perf_event.c

    Merge reason: Resolve the conflict.

    Signed-off-by: Ingo Molnar

    Ingo Molnar
     
  • Make sigreturn zero regs->trap, make do_signal() do the same on all
    paths. As it is, signal interrupting e.g. read() from fd 512 (==
    ERESTARTSYS) with another signal getting unblocked when the first
    handler finishes will lead to restart one insn earlier than it ought
    to. Same for multiple signals with in-kernel handlers interrupting
    that sucker at the same time. Same for multiple signals of any kind
    interrupting that sucker on 64bit...
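
    The first sentence, in code terms (a fragment only; the real change
    touches the powerpc sigreturn and do_signal() paths):

        /* on return from a signal handler: forget the "interrupted
         * syscall" state so a second, newly-unblocked signal can't
         * trigger a spurious extra restart of the syscall */
        regs->trap = 0;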

    Signed-off-by: Al Viro
    Acked-by: Paul Mackerras
    Signed-off-by: Linus Torvalds

    Al Viro
     

15 Sep, 2010

1 commit


10 Sep, 2010

5 commits

  • Replace pmu::{enable,disable,start,stop,unthrottle} with
    pmu::{add,del,start,stop}, all of which take a flags argument.

    The new interface extends the capability to stop a counter while
    keeping it scheduled on the PMU. We replace the throttled state with
    the generic stopped state.

    This also allows us to efficiently stop/start counters over certain
    code paths (like IRQ handlers).

    It also allows scheduling a counter without it starting, allowing for
    a generic frozen state (useful for rotating stopped counters).

    The stopped state is implemented in two different ways, depending on
    how the architecture implemented the throttled state:

    1) We disable the counter:
       a) the pmu has per-counter enable bits, we flip that
       b) we program a NOP event, preserving the counter state

    2) We store the counter state and ignore all read/overflow events
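
    A sketch of the reworked method table and flags (abridged; flag
    names and values as they appear in this series' perf_event.h,
    treat the rest as illustrative):

        #define PERF_EF_START   0x01  /* start the counter when adding */
        #define PERF_EF_RELOAD  0x02  /* reload the counter when starting */
        #define PERF_EF_UPDATE  0x04  /* update the counter when stopping */

        struct pmu {
                /* ... other members elided ... */

                /* add/del: schedule the event on/off the PMU */
                int  (*add)   (struct perf_event *event, int flags);
                void (*del)   (struct perf_event *event, int flags);

                /* start/stop: start/stop it without (de)scheduling,
                 * implementing the generic stopped state */
                void (*start) (struct perf_event *event, int flags);
                void (*stop)  (struct perf_event *event, int flags);
        };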

    Signed-off-by: Peter Zijlstra
    Cc: paulus
    Cc: stephane eranian
    Cc: Robert Richter
    Cc: Will Deacon
    Cc: Paul Mundt
    Cc: Frederic Weisbecker
    Cc: Cyrill Gorcunov
    Cc: Lin Ming
    Cc: Yanmin
    Cc: Deng-Cheng Zhu
    Cc: David Miller
    Cc: Michael Cree
    LKML-Reference:
    Signed-off-by: Ingo Molnar

    Peter Zijlstra
     
  • Changes perf_disable() into perf_pmu_disable().

    Signed-off-by: Peter Zijlstra
    Cc: paulus
    Cc: stephane eranian
    Cc: Robert Richter
    Cc: Will Deacon
    Cc: Paul Mundt
    Cc: Frederic Weisbecker
    Cc: Cyrill Gorcunov
    Cc: Lin Ming
    Cc: Yanmin
    Cc: Deng-Cheng Zhu
    Cc: David Miller
    Cc: Michael Cree
    LKML-Reference:
    Signed-off-by: Ingo Molnar

    Peter Zijlstra
     
  • Since the current perf_disable() usage is only an optimization,
    remove it for now. This eases the removal of the __weak
    hw_perf_enable() interface.

    Signed-off-by: Peter Zijlstra
    Cc: paulus
    Cc: stephane eranian
    Cc: Robert Richter
    Cc: Will Deacon
    Cc: Paul Mundt
    Cc: Frederic Weisbecker
    Cc: Cyrill Gorcunov
    Cc: Lin Ming
    Cc: Yanmin
    Cc: Deng-Cheng Zhu
    Cc: David Miller
    Cc: Michael Cree
    LKML-Reference:
    Signed-off-by: Ingo Molnar

    Peter Zijlstra
     
  • Provide a simple registration interface for struct pmu; this gives
    us the infrastructure for removing all the __weak functions.
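
    The interface itself is small; a sketch of the declarations as
    introduced here:

        int  perf_pmu_register(struct pmu *pmu);
        void perf_pmu_unregister(struct pmu *pmu);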

    Signed-off-by: Peter Zijlstra
    Cc: paulus
    Cc: stephane eranian
    Cc: Robert Richter
    Cc: Will Deacon
    Cc: Paul Mundt
    Cc: Frederic Weisbecker
    Cc: Cyrill Gorcunov
    Cc: Lin Ming
    Cc: Yanmin
    Cc: Deng-Cheng Zhu
    Cc: David Miller
    Cc: Michael Cree
    LKML-Reference:
    Signed-off-by: Ingo Molnar

    Peter Zijlstra
     
  • sed -ie 's/const struct pmu\>/struct pmu/g' `git grep -l "const struct pmu\>"`

    Signed-off-by: Peter Zijlstra
    Cc: paulus
    Cc: stephane eranian
    Cc: Robert Richter
    Cc: Will Deacon
    Cc: Paul Mundt
    Cc: Frederic Weisbecker
    Cc: Cyrill Gorcunov
    Cc: Lin Ming
    Cc: Yanmin
    Cc: Deng-Cheng Zhu
    Cc: David Miller
    Cc: Michael Cree
    LKML-Reference:
    Signed-off-by: Ingo Molnar

    Peter Zijlstra
     

31 Aug, 2010

3 commits

  • In f761622e59433130bc33ad086ce219feee9eb961 we changed
    early_setup_secondary so it's called using the proper kernel stack
    rather than the emergency one.

    Unfortunately, this stack pointer can't be used when translation is off
    on PHYP as it might be outside the RMO. This results in the following
    crash on all non-zero CPUs:
    cpu 0x1: Vector: 300 (Data Access) at [c00000001639fd10]
    pc: 000000000001c50c
    lr: 000000000000821c
    sp: c00000001639ff90
    msr: 8000000000001000
    dar: c00000001639ffa0
    dsisr: 42000000
    current = 0xc000000016393540
    paca = 0xc000000006e00200
    pid = 0, comm = swapper

    The original patch was only tested on a bare metal system, so it
    never caught this problem.

    This changes __secondary_start so that we calculate the new stack
    pointer but only start using it after we've called early_setup_secondary.

    With this patch, the above problem goes away.

    Signed-off-by: Michael Neuling
    Signed-off-by: Benjamin Herrenschmidt

    Michael Neuling
     
  • Commit 0fe1ac48 ("powerpc/perf_event: Fix oops due to
    perf_event_do_pending call") moved the call to perf_event_do_pending
    in timer_interrupt() down so that it was after the irq_enter() call.
    Unfortunately this moved it after the code that checks whether it
    is time for the next decrementer clock event. The result is that
    the call to perf_event_do_pending() won't happen until the next
    decrementer clock event is due. This was pointed out by Milton
    Miller.

    This fixes it by moving the check for whether it's time for the
    next decrementer clock event down to the point where we're about
    to call the event handler, after we've called perf_event_do_pending.

    This has the side effect that on old pre-Core99 Powermacs where we
    use the ppc_n_lost_interrupts mechanism to replay interrupts, a
    replayed interrupt will incur a little more latency since it will
    now do the code from the irq_enter down to the irq_exit, that it
    used to skip. However, these machines are now old and rare enough
    that this doesn't matter. To make it clear that ppc_n_lost_interrupts
    is only used on Powermacs, and to speed up the code slightly on
    non-Powermac ppc32 machines, the code that tests ppc_n_lost_interrupts
    is now conditional on CONFIG_PPC_PMAC as well as CONFIG_PPC32.

    Signed-off-by: Paul Mackerras
    Cc: stable@kernel.org
    Signed-off-by: Benjamin Herrenschmidt

    Paul Mackerras
     
  • Call kexec purgatory code correctly. We were getting lucky before.
    If you examine the powerpc 32bit kexec "purgatory" code you will
    see it expects the following:

    From kexec-tools: purgatory/arch/ppc/v2wrap_32.S
    -> calling convention:
    -> r3 = physical number of this cpu (all cpus)
    -> r4 = address of this chunk (master only)

    As such, we need to set r3 to the current core; r4 happens to be
    unused by purgatory at the moment, but we go ahead and set it
    here as well.

    Signed-off-by: Matthew McClintock
    Signed-off-by: Benjamin Herrenschmidt

    Matthew McClintock
     

25 Aug, 2010

1 commit


24 Aug, 2010

12 commits

  • Signed-off-by: Andreas Schwab
    Signed-off-by: Benjamin Herrenschmidt

    Andreas Schwab
     
  • pci_device_to_OF_node() can return NULL, and list_for_each_entry will
    never enter the loop when dev is NULL, so it looks like this test is
    a typo.

    Reported-by: Julia Lawall
    Signed-off-by: Grant Likely
    Signed-off-by: Benjamin Herrenschmidt

    Grant Likely
     
  • As early setup calls down to slb_initialize(), we must have kstack
    initialised before checking "should we add a bolted SLB entry for our kstack?"

    Failing to do so means stack access requires an SLB miss exception to refill
    an entry dynamically, if the stack isn't accessible via SLB(0) (kernel text
    & static data). It's not always allowable to take such a miss, and
    intermittent crashes will result.

    Primary CPUs don't have this issue; an SLB entry is not bolted for their
    stack anyway (as that lives within SLB(0)). This patch therefore only
    affects the init of secondaries.

    Signed-off-by: Matt Evans
    Cc: stable
    Signed-off-by: Benjamin Herrenschmidt

    Matt Evans
     
  • When looking at some issues with the virtual ethernet driver I noticed
    that TCE allocation was following a very strange pattern:

    address 00e9000 length 2048
    address 0409000 length 2048
    Signed-off-by: Benjamin Herrenschmidt

    Anton Blanchard
     
  • I'm sick of seeing ppc64_runlatch_off in our profiles, so inline it
    into the callers. To avoid a mess of circular includes I didn't add
    it as an inline function.

    Signed-off-by: Anton Blanchard
    Acked-by: Olof Johansson
    Signed-off-by: Benjamin Herrenschmidt

    Anton Blanchard
     
  • The 'smt_enabled=X' boot option does not handle values of X > 2.
    This does not work for POWER7 processors, which have SMT modes of
    0, 1, 2, 3, and 4. This patch allows the smt_enabled option to be
    set to any value, up to a maximum equal to the number of threads
    per core.

    Signed-off-by: Nathan Fontenot
    Signed-off-by: Benjamin Herrenschmidt

    Nathan Fontenot
     
  • During CPU offline/online tests __cpu_up would flood the logs with
    the following message:

    Processor 0 found.

    This provides no useful information to the user as there is no context
    provided, and since the operation was a success (to this point) it is expected
    that the CPU will come back online, providing all the feedback necessary.

    Change the "Processor found" message to DBG() similar to other such messages in
    the same function. Also, add an appropriate log level for the "Processor is
    stuck" message.

    Signed-off-by: Darren Hart
    Acked-by: Will Schmidt
    Cc: Thomas Gleixner
    Cc: Nathan Fontenot
    Cc: Robert Jennings
    Cc: Brian King
    Signed-off-by: Benjamin Herrenschmidt

    Darren Hart
     
  • start_secondary() is called shortly after _start and also via

    cpu_idle()->cpu_die()->pseries_mach_cpu_die()

    start_secondary() expects a preempt_count() of 0. pseries_mach_cpu_die() is
    called via the cpu_idle() routine with preemption disabled, resulting in the
    following repeating message during rapid cpu offline/online tests
    with CONFIG_PREEMPT=y:

    BUG: scheduling while atomic: swapper/0/0x00000002
    Modules linked in: autofs4 binfmt_misc dm_mirror dm_region_hash dm_log [last unloaded: scsi_wait_scan]
    Call Trace:
    [c00000010e7079c0] [c0000000000133ec] .show_stack+0xd8/0x218 (unreliable)
    [c00000010e707aa0] [c0000000006a47f0] .dump_stack+0x28/0x3c
    [c00000010e707b20] [c00000000006e7a4] .__schedule_bug+0x7c/0x9c
    [c00000010e707bb0] [c000000000699d9c] .schedule+0x104/0x800
    [c00000010e707cd0] [c000000000015b24] .cpu_idle+0x1c4/0x1d8
    [c00000010e707d70] [c0000000006aa1b4] .start_secondary+0x398/0x3d4
    [c00000010e707e30] [c000000000008278] .start_secondary_resume+0x10/0x14

    Move the cpu_die() call inside the existing preemption enabled block of
    cpu_idle(). This is safe as the idle task is affined to a single CPU so the
    debug_smp_processor_id() tests (from cpu_should_die()) won't trigger as we are
    in a "migration disabled" region.

    Signed-off-by: Darren Hart
    Acked-by: Will Schmidt
    Cc: Thomas Gleixner
    Cc: Nathan Fontenot
    Cc: Robert Jennings
    Cc: Brian King
    Signed-off-by: Benjamin Herrenschmidt

    Darren Hart
     
  • list_for_each_entry binds its first argument to a non-null value, and thus
    any null test on the value of that argument is superfluous.

    The semantic patch that makes this change is as follows:
    (http://coccinelle.lip6.fr/)

    // <smpl>
    @@
    iterator I;
    expression x,E,E1,E2;
    statement S,S1,S2;
    @@

    I(x,...) { }
    // </smpl>

    Signed-off-by: Julia Lawall
    Signed-off-by: Benjamin Herrenschmidt

    Julia Lawall
     
  • During kdump we run the crash handlers first then stop all other CPUs.
    We really want to stop all CPUs as close to the fail as possible and also
    have a very controlled environment for running the crash handlers, so it
    makes sense to reverse the order.

    Signed-off-by: Anton Blanchard
    Acked-by: Matt Evans
    Signed-off-by: Benjamin Herrenschmidt

    Anton Blanchard
     
  • Use is_32bit_task() helper to test 32 bit binary.

    Signed-off-by: Denis Kirjanov
    Signed-off-by: Benjamin Herrenschmidt

    Denis Kirjanov
     
  • Benjamin Herrenschmidt
     

23 Aug, 2010

3 commits


19 Aug, 2010

4 commits

  • …rostedt/linux-2.6-trace into perf/core

    Ingo Molnar
     
  • Store the kernel and user contexts from the generic layer instead
    of in each arch; this consolidates some repetitive code.

    Signed-off-by: Frederic Weisbecker
    Acked-by: Paul Mackerras
    Tested-by: Will Deacon
    Cc: Ingo Molnar
    Cc: Peter Zijlstra
    Cc: Arnaldo Carvalho de Melo
    Cc: Stephane Eranian
    Cc: David Miller
    Cc: Paul Mundt
    Cc: Borislav Petkov

    Frederic Weisbecker
     
  • - Most archs use one callchain buffer per cpu, except x86, which
      needs to deal with NMIs. Provide a default perf_callchain_buffer()
      implementation that x86 overrides.

    - Centralize all the kernel/user regs handling and invoke new arch
      handlers from there: perf_callchain_user() / perf_callchain_kernel().
      This avoids the scattered user_mode() and current->mm checks, and
      so on.

    - Invert some parameters in the perf_callchain_*() helpers: entry to
      the left, regs to the right, following the traditional (dst, src)
      order; see the sketch below.
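
    The resulting arch hooks look like this (a sketch, showing the
    (entry, regs) order discussed above):

        void perf_callchain_kernel(struct perf_callchain_entry *entry,
                                   struct pt_regs *regs);
        void perf_callchain_user(struct perf_callchain_entry *entry,
                                 struct pt_regs *regs);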

    Signed-off-by: Frederic Weisbecker
    Acked-by: Paul Mackerras
    Tested-by: Will Deacon
    Cc: Ingo Molnar
    Cc: Peter Zijlstra
    Cc: Arnaldo Carvalho de Melo
    Cc: Stephane Eranian
    Cc: David Miller
    Cc: Paul Mundt
    Cc: Borislav Petkov

    Frederic Weisbecker
     
  • callchain_store() is the same on every arch, so inline it in
    perf_event.h and rename it to perf_callchain_store() to avoid
    any collision.

    This removes repetitive code.
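
    A sketch of the resulting helper (modeled on the perf_event.h of
    this era):

        static inline void
        perf_callchain_store(struct perf_callchain_entry *entry, u64 ip)
        {
                if (entry->nr < PERF_MAX_STACK_DEPTH)
                        entry->ip[entry->nr++] = ip;
        }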

    Signed-off-by: Frederic Weisbecker
    Acked-by: Paul Mackerras
    Tested-by: Will Deacon
    Cc: Ingo Molnar
    Cc: Peter Zijlstra
    Cc: Arnaldo Carvalho de Melo
    Cc: Stephane Eranian
    Cc: David Miller
    Cc: Paul Mundt
    Cc: Borislav Petkov

    Frederic Weisbecker
     

18 Aug, 2010

1 commit

  • Make do_execve() take a const filename pointer so that kernel_execve() compiles
    correctly on ARM:

    arch/arm/kernel/sys_arm.c:88: warning: passing argument 1 of 'do_execve' discards qualifiers from pointer target type

    This also requires the argv and envp arguments to be consted twice, once for
    the pointer array and once for the strings the array points to. This is
    because do_execve() passes a pointer to the filename (now const) to
    copy_strings_kernel(). A simpler alternative would be to cast the filename
    pointer in do_execve() when it's passed to copy_strings_kernel().
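
    The resulting prototype illustrates the double const (sketch):

        int do_execve(const char *filename,
                      const char __user *const __user *argv,
                      const char __user *const __user *envp,
                      struct pt_regs *regs);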

    do_execve() may not change any of the strings it is passed as part of
    the argv or envp lists, as some of them are in .rodata, so marking
    these strings as const should be fine.

    Further, kernel_execve() and sys_execve() need to be changed to match.

    This has been test-built on x86_64, frv, arm and mips.

    Signed-off-by: David Howells
    Tested-by: Ralf Baechle
    Acked-by: Russell King
    Signed-off-by: Linus Torvalds

    David Howells
     

14 Aug, 2010

1 commit

  • Mark arguments to certain system calls as being const where they should be but
    aren't. The list includes:

    (*) The filename arguments of various stat syscalls, execve(), various utimes
    syscalls and some mount syscalls.

    (*) The filename arguments of some syscall helpers relating to the above.

    (*) The buffer argument of various write syscalls.
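
    For example, the write() declaration becomes (sketch, following
    include/linux/syscalls.h):

        asmlinkage long sys_write(unsigned int fd, const char __user *buf,
                                  size_t count);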

    Signed-off-by: David Howells
    Acked-by: David S. Miller
    Signed-off-by: Linus Torvalds

    David Howells
     

07 Aug, 2010

2 commits

  • of_i8042_{kbd,aux}_irq need to be exported

    Signed-off-by: Grant Likely

    Grant Likely
     
  • …x/kernel/git/tip/linux-2.6-tip

    * 'timers-timekeeping-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
    um: Fix read_persistent_clock fallout
    kgdb: Do not access xtime directly
    powerpc: Clean up obsolete code relating to decrementer and timebase
    powerpc: Rework VDSO gettimeofday to prevent time going backwards
    clocksource: Add __clocksource_updatefreq_hz/khz methods
    x86: Convert common clocksources to use clocksource_register_hz/khz
    timekeeping: Make xtime and wall_to_monotonic static
    hrtimer: Cleanup direct access to wall_to_monotonic
    um: Convert to use read_persistent_clock
    timkeeping: Fix update_vsyscall to provide wall_to_monotonic offset
    powerpc: Cleanup xtime usage
    powerpc: Simplify update_vsyscall
    time: Kill off CONFIG_GENERIC_TIME
    time: Implement timespec_add
    x86: Fix vtime/file timestamp inconsistencies

    Trivial conflicts in Documentation/feature-removal-schedule.txt

    Much less trivial conflicts in arch/powerpc/kernel/time.c resolved as
    per Thomas' earlier merge commit 47916be4e28c ("Merge branch
    'powerpc.cherry-picks' into timers/clocksource")

    Linus Torvalds