19 Oct, 2010

15 commits

  • …nel/git/rostedt/linux-2.6-trace into perf/core

    Ingo Molnar
     
  • When DYNAMIC_FTRACE is enabled and we use the C version of recordmcount,
    all objects are run through the recordmcount program to create a
    separate section that stores all the callers of mcount.

    The build process has a special file: scripts/mod/empty.o. This is
    built from empty.c which is literally an empty file (except for a
    single comment). This file is used to find information about the target
    elf format, like endianness and word size.

    The problem comes up when we need to build recordmcount. The
    build process requires that empty.o is built first. The build rules
    for empty.o will try to execute recordmcount on the empty.o file.
    We get an error that recordmcount does not exist.

    To avoid this recursion, the build file will skip running recordmcount
    if the file that it is building is scripts/mod/empty.o.

    [ extra comment Suggested-by: Sam Ravnborg ]

    Reported-by: Ingo Molnar
    Tested-by: Ingo Molnar
    Cc: Michal Marek
    Cc: linux-kbuild@vger.kernel.org
    Signed-off-by: Steven Rostedt

    Steven Rostedt
     
  • The use of the JUMP_LABEL() construct ends up creating endless silly
    wrappers. Create a higher-level construct to reduce this clutter.

    Signed-off-by: Peter Zijlstra
    Cc: Jason Baron
    Cc: Steven Rostedt
    Cc: Arnaldo Carvalho de Melo
    Cc: Frederic Weisbecker
    Cc: Paul Mackerras
    LKML-Reference:
    Signed-off-by: Ingo Molnar

    Peter Zijlstra
     
  • Acked-by: Frederic Weisbecker
    Signed-off-by: Peter Zijlstra
    LKML-Reference:
    Signed-off-by: Ingo Molnar

    Peter Zijlstra
     
  • Trades a call + conditional + ret for an unconditional jmp.

    Acked-by: Frederic Weisbecker
    Signed-off-by: Peter Zijlstra
    LKML-Reference:
    Signed-off-by: Ingo Molnar

    Peter Zijlstra
     
  • Add an interface to allow usage of jump_labels with atomic counters.

    Signed-off-by: Peter Zijlstra
    Acked-by: Frederic Weisbecker
    LKML-Reference:
    Signed-off-by: Ingo Molnar

    Peter Zijlstra
     
  • While there are still only a few users around, rename things to make
    them more consistent.

    Signed-off-by: Peter Zijlstra
    LKML-Reference:
    Signed-off-by: Ingo Molnar

    Peter Zijlstra
     
  • hw_breakpoint creation needs to account stuff per-task to ensure there
    is always sufficient hardware resources to back these things due to
    ptrace.

    With the perf per pmu context changes the event initialization no
    longer has access to the event context, for the simple reason that we
    need to first find the pmu (result of initialization) before we can
    find the context.

    This makes hw_breakpoints unhappy, because it can no longer do per-task
    accounting. Cure this by frobbing a task pointer in the event::hw
    bits for now...

    Signed-off-by: Peter Zijlstra
    Cc: Frederic Weisbecker
    LKML-Reference:
    Signed-off-by: Ingo Molnar

    Peter Zijlstra
     
  • So that we can pass the task pointer to the event allocation, so that
    we can use task associated data during event initialization.

    Signed-off-by: Peter Zijlstra
    LKML-Reference:
    Signed-off-by: Ingo Molnar

    Peter Zijlstra
     
  • Currently it looks like find_lively_task_by_vpid() takes a task ref
    and relies on find_get_context() to drop it.

    The problem is that perf_event_create_kernel_counter() shouldn't be
    dropping task refs.

    Signed-off-by: Peter Zijlstra
    Acked-by: Frederic Weisbecker
    Acked-by: Matt Helsley
    LKML-Reference:
    Signed-off-by: Ingo Molnar

    Peter Zijlstra
     
  • Matt found we trigger the WARN_ON_ONCE() in perf_group_attach() when we take
    the move_group path in perf_event_open().

    Since we cannot de-construct the group (we rely on it to move the events), we
    have to simply ignore the double attach. The group state is context invariant
    and doesn't need changing.

    Reported-by: Matt Fleming
    Signed-off-by: Peter Zijlstra
    LKML-Reference:
    Signed-off-by: Ingo Molnar

    Peter Zijlstra
     
  • Provide a mechanism that allows running code in IRQ context. It is
    most useful for NMI code that needs to interact with the rest of the
    system -- like waking up a task to drain buffers.

    Perf currently has such a mechanism, so extract that and provide it as
    a generic feature, independent of perf so that others may also
    benefit.

    The IRQ context callback is generated through self-IPIs where
    possible, or on architectures like powerpc the decrementer (the
    built-in timer facility) is set to generate an interrupt immediately.

    Architectures that don't have anything like this have to make do with
    a callback from the timer tick. These architectures can call
    irq_work_run() at the tail of any IRQ handlers that might enqueue such
    work (like the perf IRQ handler) to avoid undue latencies in
    processing the work.

    Signed-off-by: Peter Zijlstra
    Acked-by: Kyle McMartin
    Acked-by: Martin Schwidefsky
    [ various fixes ]
    Signed-off-by: Huang Ying
    LKML-Reference:
    Signed-off-by: Ingo Molnar

    Peter Zijlstra
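
The enqueue-then-run pattern described in this commit can be sketched in user space with C11 atomics. This is a minimal illustration, not the kernel's irq_work code: the names `work_item`, `work_queue`, `work_run`, `demo_func` and `demo_calls` are hypothetical, a single global list stands in for per-CPU state, and the self-IPI is reduced to a comment.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical user-space stand-in for a deferred-work item. */
struct work_item {
    atomic_bool pending;               /* set while the item sits on the list */
    struct work_item *next;            /* link in the pending list */
    void (*func)(struct work_item *);  /* callback to run later */
};

/* Head of the pending list (global here; an assumption for brevity). */
static _Atomic(struct work_item *) pending_head;

/* Enqueue without taking locks, so it could be called from an NMI-like
 * context. Returns false if the item was already queued. */
static bool work_queue(struct work_item *w)
{
    assert(w != NULL && w->func != NULL);

    if (atomic_exchange(&w->pending, true))
        return false;

    struct work_item *old = atomic_load(&pending_head);
    do {
        w->next = old;
    } while (!atomic_compare_exchange_weak(&pending_head, &old, w));

    /* A real implementation would now request an interrupt (for example
     * a self-IPI) so that work_run() gets called soon. */
    return true;
}

/* Called from the interrupt side: detach the whole list, then run each
 * callback. Returns the number of callbacks that ran. */
static int work_run(void)
{
    struct work_item *w =
        atomic_exchange(&pending_head, (struct work_item *)NULL);
    int ran = 0;

    while (w != NULL) {
        struct work_item *next = w->next;  /* read before the item can be requeued */
        atomic_store(&w->pending, false);
        w->func(w);
        ran++;
        w = next;
    }
    return ran;
}

/* Hypothetical callback used only to observe that work ran. */
static int demo_calls;
static void demo_func(struct work_item *w)
{
    (void)w;
    demo_calls++;
}
```

Detaching the whole list with one atomic exchange keeps the enqueue side free of locks, which is the property an NMI handler needs.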
     
  • The group_sched_in() function uses a transactional approach to schedule
    a group of events. In a group, either all events can be scheduled or
    none are. To schedule each event in, the function calls event_sched_in().
    In case of error, event_sched_out() is called on each event in the group.

    The problem is that event_sched_out() does not completely cancel the
    effects of event_sched_in(). Furthermore event_sched_out() changes the
    state of the event as if it had run, which is not true in this particular
    case.

    Those inconsistencies impact time tracking fields and may lead to events
    in a group not all reporting the same time_enabled and time_running values.
    This is demonstrated with the example below:

    $ task -eunhalted_core_cycles,baclears,baclears -e unhalted_core_cycles,baclears,baclears sleep 5
    1946101 unhalted_core_cycles (32.85% scaling, ena=829181, run=556827)
    11423 baclears (32.85% scaling, ena=829181, run=556827)
    7671 baclears (0.00% scaling, ena=556827, run=556827)

    2250443 unhalted_core_cycles (57.83% scaling, ena=962822, run=405995)
    11705 baclears (57.83% scaling, ena=962822, run=405995)
    11705 baclears (57.83% scaling, ena=962822, run=405995)

    Notice that in the first group, the last baclears event does not
    report the same timings as its siblings.

    This issue comes from the fact that tstamp_stopped is updated
    by event_sched_out() as if the event had actually run.

    To solve the issue, we must ensure that, in case of error, there is
    no change in the event state whatsoever. That means timings must
    remain as they were when entering group_sched_in().

    To do this we defer updating tstamp_running until we know the
    transaction succeeded. Therefore, we have split event_sched_in()
    in two parts separating the update to tstamp_running.

    Similarly, in case of error, we do not want to update tstamp_stopped.
    Therefore, we have split event_sched_out() in two parts separating
    the update to tstamp_stopped.

    With this patch, we now get the following output:

    $ task -eunhalted_core_cycles,baclears,baclears -e unhalted_core_cycles,baclears,baclears sleep 5
    2492050 unhalted_core_cycles (71.75% scaling, ena=1093330, run=308841)
    11243 baclears (71.75% scaling, ena=1093330, run=308841)
    11243 baclears (71.75% scaling, ena=1093330, run=308841)

    1852746 unhalted_core_cycles (0.00% scaling, ena=784489, run=784489)
    9253 baclears (0.00% scaling, ena=784489, run=784489)
    9253 baclears (0.00% scaling, ena=784489, run=784489)

    Note that the uneven timing between groups is a side effect of
    the process spending most of its time sleeping, i.e., not enough
    event rotations (but that's a separate issue).

    Signed-off-by: Stephane Eranian
    Signed-off-by: Peter Zijlstra
    LKML-Reference:
    Signed-off-by: Ingo Molnar

    Stephane Eranian
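
The all-or-nothing idea in this commit, with the timestamp update deferred until the whole group is known to fit, might look roughly like the sketch below. It is a simplified model under stated assumptions: `struct event`, the `fits` flag and the two helper functions are hypothetical, and the timestamp handling is reduced to a plain assignment rather than the kernel's actual arithmetic.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, heavily simplified event; not the kernel's struct perf_event. */
struct event {
    bool     fits;            /* stand-in for "the PMU can accept this event" */
    bool     active;
    uint64_t tstamp_running;
    uint64_t tstamp_stopped;
};

/* Phase 1: try to place the event. Timestamps are deliberately untouched. */
static bool event_try_sched_in(struct event *e)
{
    if (!e->fits)
        return false;
    e->active = true;
    return true;
}

/* Undo phase 1 only. tstamp_stopped is not updated, because the event
 * never actually ran. */
static void event_undo_sched_in(struct event *e)
{
    e->active = false;
}

/* Schedule all events or none. Timestamps are committed only on success. */
static bool group_sched_in(struct event *ev, size_t n, uint64_t now)
{
    size_t i;

    assert(ev != NULL || n == 0);

    for (i = 0; i < n; i++) {
        if (!event_try_sched_in(&ev[i])) {
            /* Roll back the events placed so far, newest first. */
            while (i-- > 0)
                event_undo_sched_in(&ev[i]);
            return false;
        }
    }

    /* Phase 2: the transaction succeeded, so record the start time. */
    for (i = 0; i < n; i++)
        ev[i].tstamp_running = now;
    return true;
}
```

On failure every event in the group keeps the timing fields it had on entry, so siblings cannot drift apart in their reported enabled and running times.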
     
  • PERF_COUNT_HW_CACHE_DTLB:READ:MISS had a bogus umask value of 0 which
    counts nothing. Needed to be 0x7 (to count all possibilities).

    PERF_COUNT_HW_CACHE_ITLB:READ:MISS had a bogus umask value of 0 which
    counts nothing. Needed to be 0x3 (to count all possibilities).

    Signed-off-by: Stephane Eranian
    Signed-off-by: Peter Zijlstra
    Cc: Robert Richter
    Cc: # as far back as it applies
    LKML-Reference:
    Signed-off-by: Ingo Molnar

    Stephane Eranian
     
  • You can only call update_context_time() when the context
    is active, i.e., the thread it is attached to is still running.

    However, perf_event_read() can be called even when the context
    is inactive, e.g., when the user read()s the counters. The call to
    update_context_time() must be conditioned on the status of
    the context, otherwise, bogus time_enabled, time_running may
    be returned. Here is an example on AMD64. The task program
    is an example from libpfm4. The -p prints deltas every 1s.

    $ task -p -e cpu_clk_unhalted sleep 5
    2,266,610 cpu_clk_unhalted (0.00% scaling, ena=2,158,982, run=2,158,982)
    0 cpu_clk_unhalted (0.00% scaling, ena=2,158,982, run=2,158,982)
    0 cpu_clk_unhalted (0.00% scaling, ena=2,158,982, run=2,158,982)
    0 cpu_clk_unhalted (0.00% scaling, ena=2,158,982, run=2,158,982)
    0 cpu_clk_unhalted (0.00% scaling, ena=2,158,982, run=2,158,982)
    5,242,358,071 cpu_clk_unhalted (99.95% scaling, ena=5,000,359,984, run=2,319,270)

    Whereas if you don't read deltas, e.g., no call to perf_event_read() until
    the process terminates:

    $ task -e cpu_clk_unhalted sleep 5
    2,497,783 cpu_clk_unhalted (0.00% scaling, ena=2,376,899, run=2,376,899)

    Notice that time_enabled and time_running are bogus in the first example,
    causing bogus scaling.

    This patch fixes the problem, by conditionally calling update_context_time()
    in perf_event_read().

    Signed-off-by: Stephane Eranian
    Signed-off-by: Peter Zijlstra
    Cc: stable@kernel.org
    LKML-Reference:
    Signed-off-by: Ingo Molnar

    Stephane Eranian
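
A minimal sketch of the conditional update described above, assuming a much simplified context structure. `struct context` and `read_context_time` are hypothetical names for illustration, and the clock is passed in as a parameter instead of being read from the kernel.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, simplified context; not the kernel's perf_event_context. */
struct context {
    bool     is_active;    /* true while the attached thread is running */
    uint64_t time;         /* accumulated time the context has been active */
    uint64_t timestamp;    /* clock value at the last update */
};

/* Advance the context clock. Only meaningful while the context is active. */
static void update_context_time(struct context *ctx, uint64_t now)
{
    assert(now >= ctx->timestamp);
    ctx->time += now - ctx->timestamp;
    ctx->timestamp = now;
}

/* Read path: refresh the clock only for an active context, so that an
 * inactive one does not accrue time it never spent running. */
static uint64_t read_context_time(struct context *ctx, uint64_t now)
{
    if (ctx->is_active)
        update_context_time(ctx, now);
    return ctx->time;
}
```

Without the `is_active` check, a read issued while the task sleeps would fold the whole sleep interval into the context time, which matches the inflated `ena` value in the first example.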
     

17 Oct, 2010

1 commit


16 Oct, 2010

2 commits


15 Oct, 2010

12 commits

  • The file kernel/trace/ftrace.c references the mcount() call to
    convert the mcount() callers to nops. But because it references
    mcount(), the mcount() address is placed in the relocation table.

    The C version of recordmcount reads the relocation table of all
    object files, and it will add all references to mcount to the
    __mcount_loc table that is used to find the places that call mcount()
    and change the call to a nop. When recordmcount finds the mcount reference
    in kernel/trace/ftrace.o, it saves that location even though the code
    is not a call, but references mcount as data.

    On boot up, when all calls are converted to nops, the code has a safety
    check to determine what op code it is actually replacing before it
    replaces it. If that op code at the address does not match, then
    a warning is printed and the function tracer is disabled.

    The reference to mcount in ftrace.c causes this warning to trigger,
    since the reference is not a call to mcount(). The ftrace.c file is
    not compiled with the -pg flag, so no calls to mcount() should be
    expected.

    This patch simply makes recordmcount.c skip the kernel/trace/ftrace.c
    file. This was the same solution used by the perl version of
    recordmcount.

    Reported-by: Ingo Molnar
    Cc: John Reiser
    Signed-off-by: Steven Rostedt

    Steven Rostedt
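
The boot-time safety check mentioned in this commit, which compares the bytes at a recorded location before overwriting them, can be illustrated on a plain byte buffer. This is only a sketch: `patch_if_expected` and `INSN_SIZE` are hypothetical names, and the real code patches live kernel text rather than an array.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* An x86 "call rel32" is one opcode byte plus a 32-bit offset. */
#define INSN_SIZE 5

/* Replace the instruction at `site` with `repl`, but only if the bytes
 * currently there match `expected`. Returns false and changes nothing
 * on a mismatch. */
static bool patch_if_expected(unsigned char *site,
                              const unsigned char *expected,
                              const unsigned char *repl)
{
    assert(site != NULL && expected != NULL && repl != NULL);

    if (memcmp(site, expected, INSN_SIZE) != 0)
        return false;   /* not the call we expected: refuse to patch */

    memcpy(site, repl, INSN_SIZE);
    return true;
}
```

A location that holds mcount's address as data, like the one recorded from ftrace.o, would fail the comparison, which is the mismatch that triggered the warning.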
     
  • Make !CONFIG_PM function stubs static inline and remove section
    attribute.

    Signed-off-by: Robert Richter

    Robert Richter
     
  • Commit e9677b3ce (oprofile, ARM: Use oprofile_arch_exit() to
    cleanup on failure) caused oprofile_perf_exit to be called
    in the cleanup path of oprofile_perf_init. The __exit tag
    for oprofile_perf_exit should therefore be dropped.

    The same has to be done for exit_driverfs as well, as this
    function is called from oprofile_perf_exit. Else, we get
    the following two linker errors.

    LD .tmp_vmlinux1
    `oprofile_perf_exit' referenced in section `.init.text' of arch/arm/oprofile/built-in.o: defined in discarded section `.exit.text' of arch/arm/oprofile/built-in.o
    make: *** [.tmp_vmlinux1] Error 1

    LD .tmp_vmlinux1
    `exit_driverfs' referenced in section `.text' of arch/arm/oprofile/built-in.o: defined in discarded section `.exit.text' of arch/arm/oprofile/built-in.o
    make: *** [.tmp_vmlinux1] Error 1

    Signed-off-by: Anand Gadiyar
    Cc: Will Deacon
    Signed-off-by: Robert Richter

    Anand Gadiyar
     
  • oprofile_perf.c needs to include platform_device.h
    Otherwise we get the following build break.

    CC arch/arm/oprofile/../../../drivers/oprofile/oprofile_perf.o
    arch/arm/oprofile/../../../drivers/oprofile/oprofile_perf.c:192: warning: 'struct platform_device' declared inside parameter list
    arch/arm/oprofile/../../../drivers/oprofile/oprofile_perf.c:192: warning: its scope is only this definition or declaration, which is probably not what you want
    arch/arm/oprofile/../../../drivers/oprofile/oprofile_perf.c:201: warning: 'struct platform_device' declared inside parameter list
    arch/arm/oprofile/../../../drivers/oprofile/oprofile_perf.c:210: error: variable 'oprofile_driver' has initializer but incomplete type
    arch/arm/oprofile/../../../drivers/oprofile/oprofile_perf.c:211: error: unknown field 'driver' specified in initializer
    arch/arm/oprofile/../../../drivers/oprofile/oprofile_perf.c:211: error: extra brace group at end of initializer
    arch/arm/oprofile/../../../drivers/oprofile/oprofile_perf.c:211: error: (near initialization for 'oprofile_driver')
    arch/arm/oprofile/../../../drivers/oprofile/oprofile_perf.c:213: warning: excess elements in struct initializer
    arch/arm/oprofile/../../../drivers/oprofile/oprofile_perf.c:213: warning: (near initialization for 'oprofile_driver')
    arch/arm/oprofile/../../../drivers/oprofile/oprofile_perf.c:214: error: unknown field 'resume' specified in initializer
    arch/arm/oprofile/../../../drivers/oprofile/oprofile_perf.c:214: warning: excess elements in struct initializer
    arch/arm/oprofile/../../../drivers/oprofile/oprofile_perf.c:214: warning: (near initialization for 'oprofile_driver')
    arch/arm/oprofile/../../../drivers/oprofile/oprofile_perf.c:215: error: unknown field 'suspend' specified in initializer
    arch/arm/oprofile/../../../drivers/oprofile/oprofile_perf.c:215: warning: excess elements in struct initializer
    arch/arm/oprofile/../../../drivers/oprofile/oprofile_perf.c:215: warning: (near initialization for 'oprofile_driver')
    arch/arm/oprofile/../../../drivers/oprofile/oprofile_perf.c: In function 'init_driverfs':

    Signed-off-by: Anand Gadiyar
    Cc: Matt Fleming
    Cc: Will Deacon
    Signed-off-by: Robert Richter

    Anand Gadiyar
     
  • Conflicts:
    arch/arm/oprofile/common.c
    kernel/perf_event.c

    Robert Richter
     
  • …nel/git/rostedt/linux-2.6-trace into perf/core

    Ingo Molnar
     
  • The config option used by archs to let the build system know that
    the C version of the recordmcount works for said arch is currently
    called HAVE_C_MCOUNT_RECORD which enables BUILD_C_RECORDMCOUNT. To
    be more consistent with the name that all archs may use, it has been
    renamed to HAVE_C_RECORDMCOUNT. This will be less confusing since
    we are building a C recordmcount and not a mcount_record.

    Suggested-by: Ingo Molnar
    Cc:
    Cc: Michal Marek
    Cc: linux-kbuild@vger.kernel.org
    Cc: John Reiser
    Signed-off-by: Steven Rostedt

    Steven Rostedt
     
  • …ic/random-tracing into perf/core

    Ingo Molnar
     
  • The elf reader for recordmcount.c had duplicate functions for both
    32 bit and 64 bit elf handling. This was due to the need of using
    the 32 and 64 bit elf structures.

    This patch consolidates the two by using macros to define the 32
    and 64 bit names in a recordmcount.h file, and then by just defining
    a RECORD_MCOUNT_64 macro and including recordmcount.h twice we
    create the functions for both the 32 bit version as well as the
    64 bit version using one code source.

    Cc: John Reiser
    Signed-off-by: Steven Rostedt

    Steven Rostedt
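
The "one source, two word sizes" idea can be sketched in a single file. Note the substitution: the commit includes recordmcount.h twice with different macro settings, whereas this sketch uses a function-generating macro so that it stays self-contained. The structures `hdr32` and `hdr64` and the function name `section_end` are hypothetical, not taken from recordmcount.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, cut-down ELF-like headers; only the field widths differ. */
struct hdr32 { uint32_t shoff; uint16_t shnum; };
struct hdr64 { uint64_t shoff; uint16_t shnum; };

/* One function body, stamped out once per word size. Token pasting picks
 * the matching structure and gives each copy its own name. */
#define DEFINE_SECTION_END(bits)                                        \
static uint64_t section_end_##bits(const struct hdr##bits *h,           \
                                   uint64_t entsize)                    \
{                                                                       \
    assert(h != NULL);                                                  \
    /* Offset just past the last section header entry. */               \
    return (uint64_t)h->shoff + (uint64_t)h->shnum * entsize;           \
}

DEFINE_SECTION_END(32)   /* defines section_end_32() */
DEFINE_SECTION_END(64)   /* defines section_end_64() */
```

Either approach keeps a single copy of the logic, so a fix made once applies to both the 32 bit and the 64 bit reader.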
     
  • This patch adds the support for the C version of recordmcount and
    compile times show ~ 12% improvement.

    After verifying this works, other archs can add:

    HAVE_C_MCOUNT_RECORD

    in its Kconfig and it will use the C version of recordmcount
    instead of the perl version.

    Cc:
    Cc: Michal Marek
    Cc: linux-kbuild@vger.kernel.org
    Cc: John Reiser
    Signed-off-by: Steven Rostedt

    Steven Rostedt
     
  • Currently, the mcount callers are found with a perl script that does
    an objdump on every file in the kernel. This is a C version of that
    same code, which should reduce the time it takes to compile
    the kernel with dynamic ftrace enabled.

    Signed-off-by: John Reiser

    [ Updated the code to include .text.unlikely section as well as
    changing the format to follow Linux coding style. ]

    Signed-off-by: Steven Rostedt

    John Reiser
     
  • In x86, faults exit by executing the iret instruction, which then
    reenables NMIs if we faulted in NMI context. Then if a fault
    happens in NMI, another NMI can nest after the fault exits.

    But we don't yet support nested NMIs because we have only one NMI
    stack. To prevent that, check that vmalloc and kmemcheck
    faults don't happen in this context. Most of the other kernel faults
    in NMIs can be more easily spotted by finding explicit
    copy_from,to_user() calls on review.

    Signed-off-by: Frederic Weisbecker
    Cc: Ingo Molnar
    Cc: Thomas Gleixner
    Cc: H. Peter Anvin
    Cc: Mathieu Desnoyers
    Cc: Peter Zijlstra

    Frederic Weisbecker
     

14 Oct, 2010

6 commits


13 Oct, 2010

1 commit

  • Fix

    kernel/trace/trace_functions_graph.c: In function ‘trace_print_graph_duration’:
    kernel/trace/trace_functions_graph.c:652: warning: comparison of distinct pointer types lacks a cast

    when building 36-rc6 on 32-bit, due to the strict type check failing
    in the min() macro.

    Signed-off-by: Borislav Petkov
    Cc: Chase Douglas
    Cc: Steven Rostedt
    Cc: Ingo Molnar
    LKML-Reference:
    Signed-off-by: Frederic Weisbecker

    Borislav Petkov
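
The warning comes from the way a type-checked min() is built. The sketch below shows macros in the style of the kernel's min() and min_t(), using GNU C extensions (statement expressions and `__typeof__`). The helper `clamp_len` is hypothetical and is not the call site fixed by this commit, whose diff is not shown here.

```c
#include <assert.h>
#include <stddef.h>

/* Type-checked min(): comparing the addresses of the two temporaries makes
 * the compiler warn ("comparison of distinct pointer types lacks a cast")
 * when the operand types differ. */
#define min(x, y) ({                    \
    __typeof__(x) _min1 = (x);          \
    __typeof__(y) _min2 = (y);          \
    (void)(&_min1 == &_min2);           \
    _min1 < _min2 ? _min1 : _min2; })

/* min_t() avoids the check by converting both operands to one named type. */
#define min_t(type, x, y) ({            \
    type _a = (x);                      \
    type _b = (y);                      \
    _a < _b ? _a : _b; })

/* Hypothetical helper: clamp a length to a buffer size. */
static size_t clamp_len(unsigned long len, size_t bufsz)
{
    /* min(len, bufsz) would warn on targets where size_t is not
     * unsigned long, which is typical of 32-bit builds. */
    return min_t(size_t, len, bufsz);
}
```

On a typical 64-bit Linux build `size_t` and `unsigned long` are the same type, which is why a mismatch of this kind can go unnoticed until a 32-bit build.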
     

12 Oct, 2010

3 commits