21 Oct, 2010

1 commit

  • With the addition of trace_softirq_raise() the softirq tracepoint got
    even more convoluted. Why the tracepoints take two pointers to assign
    an integer is beyond my comprehension.

    But adding an extra case which treats the first pointer as an unsigned
    long when the second pointer is NULL, including the back-and-forth
    type casting, is just horrible.

    Convert the softirq tracepoints to take a single unsigned int argument
    for the softirq vector number and fix the call sites (a before/after
    sketch follows this entry).

    Signed-off-by: Thomas Gleixner
    LKML-Reference:
    Acked-by: Peter Zijlstra
    Acked-by: mathieu.desnoyers@efficios.com
    Cc: Frederic Weisbecker
    Cc: Steven Rostedt

    Thomas Gleixner
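    A before/after sketch of such a call site (illustrative only, not the
    exact upstream diff; 'nr' stands for the softirq vector number at the
    raise site):

        /* before: the vector number is shoehorned through the pointer args */
        trace_softirq_raise((struct softirq_action *)(unsigned long)nr, NULL);

        /* after: the tracepoint takes the vector number directly */
        trace_softirq_raise(nr);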
     

20 Oct, 2010

1 commit


19 Oct, 2010

11 commits

  • The function start_func_tracer() was incorrectly added in the
    #ifdef CONFIG_FUNCTION_TRACER condition, but is still used even
    when function tracing is not enabled.

    The calls to register_ftrace_function() and register_ftrace_graph()
    become nops (and their arguments are even ignored), thus there is
    no reason to hide start_func_tracer() when function tracing is
    not enabled (a layout sketch follows this entry).

    Reported-by: Ingo Molnar
    Signed-off-by: Steven Rostedt

    Steven Rostedt
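    A rough sketch of the resulting layout, assuming the usual structure of
    kernel/trace/trace_irqsoff.c; trace_ops and the two graph callbacks
    stand in for the tracer's own hooks and are named illustratively:

        #ifdef CONFIG_FUNCTION_TRACER
        /* the actual function-tracer hook stays under the #ifdef */
        static void irqsoff_tracer_call(unsigned long ip, unsigned long parent_ip)
        {
                /* record the function call for the latency trace */
        }
        #endif /* CONFIG_FUNCTION_TRACER */

        /* outside the #ifdef: safe because both register_* calls degrade
         * to no-ops when function tracing is not configured */
        static int start_func_tracer(int graph)
        {
                if (!graph)
                        return register_ftrace_function(&trace_ops);

                return register_ftrace_graph(&irqsoff_graph_return,
                                             &irqsoff_graph_entry);
        }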
     
  • Acked-by: Frederic Weisbecker
    Signed-off-by: Peter Zijlstra
    LKML-Reference:
    Signed-off-by: Ingo Molnar

    Peter Zijlstra
     
  • Trades a call + conditional + ret for an unconditional jmp.

    Acked-by: Frederic Weisbecker
    Signed-off-by: Peter Zijlstra
    LKML-Reference:
    Signed-off-by: Ingo Molnar

    Peter Zijlstra
     
    While there are still only a few users around, rename things to make
    them more consistent.

    Signed-off-by: Peter Zijlstra
    LKML-Reference:
    Signed-off-by: Ingo Molnar

    Peter Zijlstra
     
    hw_breakpoint creation needs to do per-task accounting to ensure there
    are always sufficient hardware resources to back these things, due to
    ptrace.

    With the perf per pmu context changes the event initialization no
    longer has access to the event context, for the simple reason that we
    need to first find the pmu (result of initialization) before we can
    find the context.

    This makes hw_breakpoints unhappy, because they can no longer do per-task
    accounting. Cure this by frobbing a task pointer into the event::hw bits
    for now (see the sketch after this entry)...

    Signed-off-by: Peter Zijlstra
    Cc: Frederic Weisbecker
    LKML-Reference:
    Signed-off-by: Ingo Molnar

    Peter Zijlstra
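    The gist, sketched as a data-structure change (the struct and field
    names below are illustrative, not necessarily what ended up upstream):

        /* stash the target task in the event's hardware state at init time,
         * so the breakpoint constraint accounting can still be done per task
         * even though no context has been located yet */
        struct hw_perf_event_sketch {
                /* ... the usual counter configuration ... */
                struct task_struct *bp_target;  /* task to account the bp against */
        };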
     
    Pass the task pointer to the event allocation, so that we can use
    task-associated data during event initialization.

    Signed-off-by: Peter Zijlstra
    LKML-Reference:
    Signed-off-by: Ingo Molnar

    Peter Zijlstra
     
  • Currently it looks like find_lively_task_by_vpid() takes a task ref
    and relies on find_get_context() to drop it.

    The problem is that perf_event_create_kernel_counter() shouldn't be
    dropping task refs.

    Signed-off-by: Peter Zijlstra
    Acked-by: Frederic Weisbecker
    Acked-by: Matt Helsley
    LKML-Reference:
    Signed-off-by: Ingo Molnar

    Peter Zijlstra
     
  • Matt found we trigger the WARN_ON_ONCE() in perf_group_attach() when we take
    the move_group path in perf_event_open().

    Since we cannot de-construct the group (we rely on it to move the events), we
    have to simply ignore the double attach; the group state is context invariant
    and doesn't need changing (a sketch of the resulting check follows this entry).

    Reported-by: Matt Fleming
    Signed-off-by: Peter Zijlstra
    LKML-Reference:
    Signed-off-by: Ingo Molnar

    Peter Zijlstra
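    The shape of the resulting check, as described above (a sketch, not the
    exact upstream diff):

        static void perf_group_attach(struct perf_event *event)
        {
                /* already attached: this is the move_group double attach,
                 * which is harmless because group state is context invariant */
                if (event->attach_state & PERF_ATTACH_GROUP)
                        return;

                event->attach_state |= PERF_ATTACH_GROUP;
                /* ... link the event into its leader's sibling list ... */
        }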
     
    Provide a mechanism that allows running code in IRQ context. It is
    most useful for NMI code that needs to interact with the rest of the
    system -- like waking up a task to drain buffers.

    Perf currently has such a mechanism, so extract that and provide it as
    a generic feature, independent of perf so that others may also
    benefit.

    The IRQ context callback is generated through self-IPIs where
    possible, or on architectures like powerpc the decrementer (the
    built-in timer facility) is set to generate an interrupt immediately.

    Architectures that don't have anything like this make do with a
    callback from the timer tick. These architectures can call
    irq_work_run() at the tail of any IRQ handlers that might enqueue such
    work (like the perf IRQ handler) to avoid undue latencies in
    processing the work (a usage sketch follows this entry).

    Signed-off-by: Peter Zijlstra
    Acked-by: Kyle McMartin
    Acked-by: Martin Schwidefsky
    [ various fixes ]
    Signed-off-by: Huang Ying
    LKML-Reference:
    Signed-off-by: Ingo Molnar

    Peter Zijlstra
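    A minimal usage sketch of the API this describes; the buffer-drain
    scenario and the names drain_task/drain_work/buffer_full_from_nmi are
    made up for illustration:

        #include <linux/irq_work.h>
        #include <linux/sched.h>

        static struct task_struct *drain_task;  /* consumer to wake up */
        static struct irq_work drain_work;

        static void drain_wakeup(struct irq_work *work)
        {
                /* runs later, in IRQ context, where waking a task is legal */
                wake_up_process(drain_task);
        }

        static void buffer_full_from_nmi(void)
        {
                /* NMI context: cannot wake the task directly, so defer it */
                irq_work_queue(&drain_work);
        }

        static int __init drain_init(void)
        {
                init_irq_work(&drain_work, drain_wakeup);
                return 0;
        }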
     
  • The group_sched_in() function uses a transactional approach to schedule
    a group of events. In a group, either all events can be scheduled or
    none are. To schedule each event in, the function calls event_sched_in().
    In case of error, event_sched_out() is called on each event in the group.

    The problem is that event_sched_out() does not completely cancel the
    effects of event_sched_in(). Furthermore, event_sched_out() changes the
    state of the event as if it had run, which is not true in this particular
    case.

    Those inconsistencies impact time tracking fields and may lead to events
    in a group not all reporting the same time_enabled and time_running values.
    This is demonstrated with the example below:

    $ task -eunhalted_core_cycles,baclears,baclears -e unhalted_core_cycles,baclears,baclears sleep 5
    1946101 unhalted_core_cycles (32.85% scaling, ena=829181, run=556827)
    11423 baclears (32.85% scaling, ena=829181, run=556827)
    7671 baclears (0.00% scaling, ena=556827, run=556827)

    2250443 unhalted_core_cycles (57.83% scaling, ena=962822, run=405995)
    11705 baclears (57.83% scaling, ena=962822, run=405995)
    11705 baclears (57.83% scaling, ena=962822, run=405995)

    Notice that in the first group, the last baclears event does not
    report the same timings as its siblings.

    This issue comes from the fact that tstamp_stopped is updated
    by event_sched_out() as if the event had actually run.

    To solve the issue, we must ensure that, in case of error, there is
    no change in the event state whatsoever. That means timings must
    remain as they were when entering group_sched_in().

    To do this we defer updating tstamp_running until we know the
    transaction succeeded. Therefore, we have split event_sched_in()
    into two parts, separating out the update to tstamp_running.

    Similarly, in case of error, we do not want to update tstamp_stopped.
    Therefore, we have split event_sched_out() into two parts, separating
    out the update to tstamp_stopped.

    With this patch, we now get the following output:

    $ task -eunhalted_core_cycles,baclears,baclears -e unhalted_core_cycles,baclears,baclears sleep 5
    2492050 unhalted_core_cycles (71.75% scaling, ena=1093330, run=308841)
    11243 baclears (71.75% scaling, ena=1093330, run=308841)
    11243 baclears (71.75% scaling, ena=1093330, run=308841)

    1852746 unhalted_core_cycles (0.00% scaling, ena=784489, run=784489)
    9253 baclears (0.00% scaling, ena=784489, run=784489)
    9253 baclears (0.00% scaling, ena=784489, run=784489)

    Note that the uneven timing between groups is a side effect of
    the process spending most of its time sleeping, i.e., there are not
    enough event rotations (but that's a separate issue). A schematic of
    the transactional scheme follows this entry.

    Signed-off-by: Stephane Eranian
    Signed-off-by: Peter Zijlstra
    LKML-Reference:
    Signed-off-by: Ingo Molnar

    Stephane Eranian
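    A schematic of the transaction described above (heavily simplified;
    argument lists are trimmed and the helper call shapes are illustrative):

        static int group_sched_in_sketch(struct perf_event *leader)
        {
                struct perf_event *event, *partial = NULL;

                if (event_sched_in(leader))
                        return -EAGAIN;

                list_for_each_entry(event, &leader->sibling_list, group_entry) {
                        if (event_sched_in(event)) {   /* must not touch tstamp_running yet */
                                partial = event;
                                goto rollback;
                        }
                }
                /* whole group made it in: only now commit tstamp_running updates */
                return 0;

        rollback:
                /* undo the members that were scheduled, without pretending they
                 * ran, i.e. leave tstamp_stopped alone */
                list_for_each_entry(event, &leader->sibling_list, group_entry) {
                        if (event == partial)
                                break;
                        event_sched_out(event);
                }
                event_sched_out(leader);
                return -EAGAIN;
        }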
     
  • You can only call update_context_time() when the context
    is active, i.e., the thread it is attached to is still running.

    However, perf_event_read() can be called even when the context
    is inactive, e.g., when user space read()s the counters. The call to
    update_context_time() must be conditioned on the status of
    the context, otherwise bogus time_enabled and time_running values may
    be returned. Here is an example on AMD64. The task program
    is an example from libpfm4. The -p option prints deltas every 1s.

    $ task -p -e cpu_clk_unhalted sleep 5
    2,266,610 cpu_clk_unhalted (0.00% scaling, ena=2,158,982, run=2,158,982)
    0 cpu_clk_unhalted (0.00% scaling, ena=2,158,982, run=2,158,982)
    0 cpu_clk_unhalted (0.00% scaling, ena=2,158,982, run=2,158,982)
    0 cpu_clk_unhalted (0.00% scaling, ena=2,158,982, run=2,158,982)
    0 cpu_clk_unhalted (0.00% scaling, ena=2,158,982, run=2,158,982)
    5,242,358,071 cpu_clk_unhalted (99.95% scaling, ena=5,000,359,984, run=2,319,270)

    Whereas if you don't read deltas, e.g., no call to perf_event_read() until
    the process terminates:

    $ task -e cpu_clk_unhalted sleep 5
    2,497,783 cpu_clk_unhalted (0.00% scaling, ena=2,376,899, run=2,376,899)

    Notice that time_enabled and time_running are bogus in the first example,
    causing bogus scaling.

    This patch fixes the problem by conditionally calling update_context_time()
    in perf_event_read() (the gist is sketched after this entry).

    Signed-off-by: Stephane Eranian
    Signed-off-by: Peter Zijlstra
    Cc: stable@kernel.org
    LKML-Reference:
    Signed-off-by: Ingo Molnar

    Stephane Eranian
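    The gist of the fix, sketched below; this is simplified from what
    perf_event_read() actually does, and the wrapper name is illustrative:

        static u64 read_event_value_sketch(struct perf_event *event)
        {
                struct perf_event_context *ctx = event->ctx;
                unsigned long flags;

                raw_spin_lock_irqsave(&ctx->lock, flags);
                /* only fold in elapsed time if the attached thread is running */
                if (ctx->is_active)
                        update_context_time(ctx);
                update_event_times(event);
                raw_spin_unlock_irqrestore(&ctx->lock, flags);

                return perf_event_count(event);
        }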
     

18 Oct, 2010

7 commits

  • Even though the parent is recorded with the normal function tracing
    of the latency tracers (irqsoff and wakeup), the function graph
    recording is bogus.

    This is due to the function graph tracer messing with the return stack.
    The latency tracers pass CALLER_ADDR0 in as the parent, which
    works fine for plain function tracing. But this causes bogus output
    with the graph tracer:

    3) -0 | d.s3. 0.000 us | return_to_handler();
    3) -0 | d.s3. 0.000 us | _raw_spin_unlock_irqrestore();
    3) -0 | d.s3. 0.000 us | return_to_handler();
    3) -0 | d.s3. 0.000 us | trace_hardirqs_on();

    The "return_to_handler()" call is the trampoline of the
    function graph tracer, and is meaningless in this context.

    Cc: Jiri Olsa
    Signed-off-by: Steven Rostedt

    Steven Rostedt
     
    The preempt and irqsoff tracers have three types of function tracers:
    the normal function tracer, function graph entry, and function graph return.
    Each of these uses a complex dance to prevent recursion and to decide
    whether or not to trace the data (depending on whether interrupts are enabled).

    This patch moves the duplicated code into a single routine, to
    prevent future mistakes when modifying such complex code (one possible
    shape of the shared routine is sketched after this entry).

    Cc: Jiri Olsa
    Signed-off-by: Steven Rostedt

    Steven Rostedt
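    One possible shape for such a shared routine (a sketch only; the real
    helper in kernel/trace/trace_irqsoff.c differs in its exact checks):

        static int func_prolog_sketch(struct trace_array *tr,
                                      struct trace_array_cpu **data,
                                      unsigned long *flags)
        {
                long disabled;
                int cpu;

                /* only trace inside the latency-critical (irqs-off) section */
                if (!irqs_disabled())
                        return 0;

                cpu = raw_smp_processor_id();
                *data = tr->data[cpu];
                disabled = atomic_inc_return(&(*data)->disabled);

                /* nesting/recursion guard: trace only the outermost entry */
                if (likely(disabled == 1)) {
                        local_save_flags(*flags);
                        return 1;
                }

                atomic_dec(&(*data)->disabled);
                return 0;
        }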
     
    The wakeup tracer has three types of function tracers: the normal
    function tracer, function graph entry, and function graph return.
    Each of these uses a complex dance to prevent recursion and to decide
    whether or not to trace the data (depending on the wake_task variable).

    This patch moves the duplicated code into a single routine, to
    prevent future mistakes when modifying such complex code.

    Cc: Jiri Olsa
    Signed-off-by: Steven Rostedt

    Steven Rostedt
     
  • Add function graph support for wakeup latency tracer.
    The graph output is enabled by setting the 'display-graph'
    trace option.

    Signed-off-by: Jiri Olsa
    LKML-Reference:
    Signed-off-by: Steven Rostedt

    Jiri Olsa
     
    Move the trace_graph_function() and print_graph_headers_flags() functions
    to trace_functions_graph.c so that they are globally available.

    Signed-off-by: Jiri Olsa
    LKML-Reference:
    Signed-off-by: Steven Rostedt

    Jiri Olsa
     
    check_irq_entry() and check_irq_return() can be called
    from graph event context. In that case there is no graph
    private data allocated. Add checks to handle this case.

    Signed-off-by: Jiri Olsa
    LKML-Reference:

    [ Fixed some grammar in the comments ]

    Signed-off-by: Steven Rostedt

    Jiri Olsa
     
    Remove an unnecessary cast from void * in an assignment (see the example
    after this entry).

    Signed-off-by: matt mooney
    Signed-off-by: Steven Rostedt

    matt mooney
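    An illustration with a hypothetical type: when the source has type
    void *, the cast in the first assignment is redundant in C.

        struct my_entry { int val; };

        static void example(void *data)
        {
                struct my_entry *field;

                field = (struct my_entry *)data;   /* before: unnecessary cast */
                field = data;                      /* after: void * converts implicitly */
                (void)field;
        }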
     

15 Oct, 2010

5 commits


14 Oct, 2010

1 commit


13 Oct, 2010

1 commit

  • Fix

    kernel/trace/trace_functions_graph.c: In function ‘trace_print_graph_duration’:
    kernel/trace/trace_functions_graph.c:652: warning: comparison of distinct pointer types lacks a cast

    when building 36-rc6 on a 32-bit machine, due to the strict type check
    failing in the min() macro (the usual idiom for this class of warning is
    sketched after this entry).

    Signed-off-by: Borislav Petkov
    Cc: Chase Douglas
    Cc: Steven Rostedt
    Cc: Ingo Molnar
    LKML-Reference:
    Signed-off-by: Frederic Weisbecker

    Borislav Petkov
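    The warning comes from min()'s strict type check; the usual idiom for
    silencing it is min_t() with an explicit common type. A generic example
    follows (illustrative only, not the exact change made in this commit):

        #include <linux/kernel.h>

        static size_t pick_len(size_t want, unsigned long avail)
        {
                /*
                 * min(want, avail) trips the type check wherever size_t and
                 * unsigned long are distinct types (e.g. some 32-bit builds);
                 * min_t() forces both arguments to one explicit type.
                 */
                return min_t(size_t, want, avail);
        }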
     

12 Oct, 2010

1 commit


11 Oct, 2010

1 commit

    Introduce a perf_pmu_name() helper function that returns the name of the
    PMU. This gives us a generic way to get the name of a PMU regardless of
    how an architecture identifies it internally (a sketch of the weak default
    follows this entry).

    Signed-off-by: Matt Fleming
    Acked-by: Peter Zijlstra
    Acked-by: Paul Mundt
    Signed-off-by: Robert Richter

    Matt Fleming
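    A plausible shape for such a helper: a weak default that architecture
    code can override with its own PMU name (a sketch; the fallback string
    is illustrative):

        const char * __weak perf_pmu_name(void)
        {
                return "cpu";   /* generic fallback when the arch doesn't override */
        }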
     

08 Oct, 2010

1 commit


06 Oct, 2010

2 commits

    Merge branch 'core-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip

    * 'core-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
    rcu: rcu_read_lock_bh_held(): disabling irqs also disables bh
    generic-ipi: Fix deadlock in __smp_call_function_single

    Linus Torvalds
     
  • With all the recent module loading cleanups, we've minimized the code
    that sits under module_mutex, fixing various deadlocks and making it
    possible to do most of the module loading in parallel.

    However, that whole conversion totally missed the rather obscure code
    that adds a new module to the list for BUG() handling. That code was
    doubly obscure because (a) the code itself lives in lib/bugs.c (for
    dubious reasons) and (b) it gets called from the architecture-specific
    "module_finalize()" rather than from generic code.

    Calling it from arch-specific code makes no sense what-so-ever to begin
    with, and is now actively wrong since that code isn't protected by the
    module loading lock any more.

    So this commit moves the "module_bug_{finalize,cleanup}()" calls away
    from the arch-specific code, and into the generic code - and in the
    process protects it with the module_mutex so that the list operations
    are now safe.

    Future fixups:
    - move the module list handling code into kernel/module.c where it
    belongs.
    - get rid of 'module_bug_list' and just use the regular list of modules
    (called 'modules' - imagine that) that we already create and maintain
    for other reasons.

    Reported-and-tested-by: Thomas Gleixner
    Cc: Rusty Russell
    Cc: Adrian Bunk
    Cc: Andrew Morton
    Cc: stable@kernel.org
    Signed-off-by: Linus Torvalds

    Linus Torvalds
     

04 Oct, 2010

1 commit

    This patch fixes an error in perf_event_open() when the pid
    provided by the user is invalid. find_lively_task_by_vpid()
    does not return NULL on error but an error code. Without the
    fix, the error code was silently passed to find_get_context(),
    which would eventually cause an invalid pointer dereference
    (see the sketch after this entry).

    Signed-off-by: Stephane Eranian
    Cc: peterz@infradead.org
    Cc: paulus@samba.org
    Cc: davem@davemloft.net
    Cc: fweisbec@gmail.com
    Cc: perfmon2-devel@lists.sf.net
    Cc: eranian@gmail.com
    Cc: robert.richter@amd.com
    LKML-Reference:
    Signed-off-by: Ingo Molnar

    Stephane Eranian
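    The shape of the fix, as a sketch; the wrapper below is illustrative,
    the real check sits inline in perf_event_open():

        static struct task_struct *lookup_target_task(pid_t vpid, int *err)
        {
                struct task_struct *task;

                task = find_lively_task_by_vpid(vpid);
                if (IS_ERR(task)) {             /* error code, not NULL, on failure */
                        *err = PTR_ERR(task);
                        return NULL;
                }
                return task;
        }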
     

02 Oct, 2010

1 commit

  • The kfifo_dma family of functions use sg_mark_end() on the last element in
    their scatterlist. This forces use of a fresh scatterlist for each DMA
    operation, which makes recycling a single scatterlist impossible.

    Change the behavior of the kfifo_dma functions to match the usage of the
    dma_map_sg function. This means that users must respect the returned
    nents value. The sample code is updated to reflect the change.

    This bug is trivial to cause: call kfifo_dma_in_prepare() such that it
    prepares a scatterlist with a single entry comprising the whole fifo.
    This is the case when you map the entirety of a newly created empty fifo.
    This causes the setup_sgl() function to mark the first scatterlist entry
    as the end of the chain, no matter what comes after it.

    Afterwards, add and remove some data from the fifo such that another call
    to kfifo_dma_in_prepare() will create two scatterlist entries. It returns
    nents=2. However, due to the previous sg_mark_end() call, sg_is_last()
    will now return true for the first scatterlist element. This causes the
    sample code to print a single scatterlist element when it should print
    two.

    By removing the call to sg_mark_end(), we make the API as similar as
    possible to the DMA mapping API. All users are required to respect the
    returned nents (a usage sketch follows this entry).

    Signed-off-by: Ira W. Snyder
    Cc: Stefani Seibold
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Ira W. Snyder
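    A usage sketch under the new semantics: the returned nents is
    authoritative, just as with dma_map_sg(). The driver helper called in
    the loop is hypothetical:

        static void queue_fifo_dma(struct kfifo *fifo, unsigned int len)
        {
                struct scatterlist sg[2];
                unsigned int nents, i;

                nents = kfifo_dma_in_prepare(fifo, sg, ARRAY_SIZE(sg), len);
                for (i = 0; i < nents; i++) {
                        /* program one descriptor per entry; do not rely on
                         * sg_is_last(), only on the returned count */
                        start_one_transfer(&sg[i]);     /* hypothetical helper */
                }
        }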
     

24 Sep, 2010

1 commit


23 Sep, 2010

5 commits

  • The below bug in fork led to the rmap walk finding the parent huge-pmd
    twice instead of just once, because the anon_vma_chain objects of the
    child vma still point to the vma->vm_mm of the parent.

    The patch fixes it by making the rmap walk accurate during fork. It's not
    a big deal normally but it's worth being accurate considering the cost is
    the same.

    Signed-off-by: Andrea Arcangeli
    Acked-by: Johannes Weiner
    Acked-by: Rik van Riel
    Acked-by: Hugh Dickins
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Andrea Arcangeli
     
  • Make use of the jump label infrastructure for tracepoints.

    Signed-off-by: Jason Baron
    LKML-Reference:
    Signed-off-by: Steven Rostedt

    Jason Baron
     
    Add a jump_label_text_reserved(void *start, void *end) helper, so that other
    pieces of code that want to modify kernel text can first verify that
    jump label has not reserved the instruction.

    Acked-by: Masami Hiramatsu
    Signed-off-by: Jason Baron
    LKML-Reference:
    Signed-off-by: Steven Rostedt

    Jason Baron
     
  • Initialize the workqueue data structures *before* they are registered
    so that they are ready for callbacks.

    Signed-off-by: Jason Baron
    LKML-Reference:
    Signed-off-by: Steven Rostedt

    Jason Baron
     
    Base patch to implement 'jump labeling'. Based on a new 'asm goto' inline
    assembly gcc mechanism, we can now branch to labels from an 'asm goto'
    statement. This allows us to create a 'no-op' fast path, which can subsequently
    be patched with a jump to the slow path code. This is useful for code which
    might be rarely used, but which we'd like to be able to call if needed.
    Tracepoints are the current use case these are being implemented for (a
    generic illustration of 'asm goto' follows this entry).

    Acked-by: David S. Miller
    Signed-off-by: Jason Baron
    LKML-Reference:

    [ cleaned up some formatting ]

    Signed-off-by: Steven Rostedt

    Jason Baron
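    A generic illustration of the underlying gcc mechanism (not the kernel's
    actual jump-label macros): 'asm goto' lets inline assembly hand control
    to a C label, so the fast path can be a single nop that runtime patching
    later turns into a jump to the slow path.

        static int tracing_enabled_sketch(void)
        {
                /*
                 * The nop is the patchable fast path. A real implementation
                 * would also record its address and the target label in a
                 * special section so the patcher can find them; omitted here.
                 */
                asm goto("1: nop" : : : : do_trace);
                return 0;               /* default: tracing disabled */

        do_trace:
                return 1;               /* patched-in slow path */
        }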