23 Mar, 2016

1 commit

  • The latency tracer format has a nice column to indicate IRQ state, but
    this is not able to tell us about NMI state.

    When tracing perf interrupt handlers (which often run in NMI context)
    it is very useful to see how the events nest.

    Link: http://lkml.kernel.org/r/20160318153022.105068893@infradead.org

    Signed-off-by: Peter Zijlstra (Intel)
    Signed-off-by: Steven Rostedt

    Peter Zijlstra
     

01 Oct, 2015

1 commit

  • In preparation to make trace options per instance, the global trace_flags
    needs to be moved from being a global variable to a field within the trace
    instance trace_array structure.

    There's still more work to do, as there are some functions that use
    trace_flags without passing in a way to get to the current_trace array. For
    those, the global_trace is used directly (from trace.c). This includes
    setting and clearing the trace_flags. This means that when a new instance is
    created, it just gets the trace_flags of the global_trace and will not be
    able to modify them. Depending on which functions have access to the
    trace_array, the flags of an instance may not affect the parts of its trace
    where the global_trace is used. These will be fixed in future changes.

    Signed-off-by: Steven Rostedt

    Steven Rostedt (Red Hat)
     

28 Sep, 2015

2 commits


21 Jul, 2015

1 commit

  • Fine-grained support for delay marks would be very useful when profiling
    VM logic, such as page allocation (including page reclaim and memory
    compaction) with the function graph tracer.

    Thus, this patch adds two additional marks with two changes.

    - The equal sign in the mark selection function is removed to align code
    behavior with comments and documentation.

    - The function graph example related to delay in ftrace.txt is updated
    to cover all supported marks.

    Link: http://lkml.kernel.org/r/1436626300-1679-3-git-send-email-jungseoklee85@gmail.com

    Cc: Byungchul Park
    Signed-off-by: Jungseok Lee
    Signed-off-by: Steven Rostedt

    Jungseok Lee
     

14 May, 2015

6 commits


07 May, 2015

1 commit

  • The only caller to this function (__print_array) was getting it wrong by
    passing the array length instead of buffer length. As the element size
    was already being passed for other reasons it seems reasonable to push
    the calculation of buffer length into the function.

    Link: http://lkml.kernel.org/r/1430320727-14582-1-git-send-email-alex.bennee@linaro.org

    Signed-off-by: Alex Bennée
    Signed-off-by: Steven Rostedt

    Alex Bennée
     

28 Jan, 2015

1 commit

  • If a trace event contains an array, there is currently no standard
    way to format this for text output. Drivers are currently hacking
    around this by a) local hacks that use the trace_seq functionality
    directly, or b) just not printing that information. For fixed size
    arrays, formatting of the elements can be open-coded, but this gets
    cumbersome for arrays of non-trivial size.

    These approaches result in non-standard content of the event format
    description delivered to userspace, so userland tools need to be
    taught to understand and parse each array printing method
    individually.

    This patch implements a __print_array() helper that tracepoint
    implementations can use instead of reinventing it. A simple C-style
    syntax is used to delimit the array and its elements {like,this}.

    So that the helpers can be used with large static arrays as well as
    dynamic arrays, they take a pointer and element count: they can be
    used with __get_dynamic_array() for use with dynamic arrays.
    Link: http://lkml.kernel.org/r/1422449335-8289-2-git-send-email-javi.merino@arm.com

    Cc: Ingo Molnar
    Signed-off-by: Dave Martin
    Signed-off-by: Javi Merino
    Signed-off-by: Steven Rostedt

    Dave Martin
     

04 Dec, 2014

1 commit

  • Currently, the function graph tracer prints "!" or "+" just before the
    function execution time to signal function overhead, depending on the
    time. And some tracers tracing latency also print "!" or "+" just after
    the time to signal overhead, depending on the interval between events.
    Even though that is usually enough, we sometimes need to signal execution
    times bigger than 100 microseconds.

    For example, I used the function graph tracer to detect if there is
    any case where exit_mm() takes too much time. I did the following steps
    in /sys/kernel/debug/tracing. It was easier to detect very large
    execution times with the patched kernel than with the original kernel.

    $ echo exit_mm > set_graph_function
    $ echo function_graph > current_tracer
    $ echo > trace
    $ cat trace_pipe > $LOGFILE
    ... (do something and terminate logging)
    $ grep "\\$" $LOGFILE
    3) $ 22082032 us | } /* kernel_map_pages */
    3) $ 22082040 us | } /* free_pages_prepare */
    3) $ 22082113 us | } /* free_hot_cold_page */
    3) $ 22083455 us | } /* free_hot_cold_page_list */
    3) $ 22083895 us | } /* release_pages */
    3) $ 22177873 us | } /* free_pages_and_swap_cache */
    3) $ 22178929 us | } /* unmap_single_vma */
    3) $ 22198885 us | } /* unmap_vmas */
    3) $ 22206949 us | } /* exit_mmap */
    3) $ 22207659 us | } /* mmput */
    3) $ 22207793 us | } /* exit_mm */

    And then, it was easy to find out that a schedule-out occurred by
    sub_preempt_count() within kernel_map_pages().

    This patch can be useful for detecting very large function execution
    times caused by either problematic function implementations or
    scheduling issues.

    Link: http://lkml.kernel.org/r/1416789259-24038-1-git-send-email-byungchul.park@lge.com

    Signed-off-by: Byungchul Park
    Signed-off-by: Steven Rostedt

    Byungchul Park
     

20 Nov, 2014

2 commits

  • If the trace_seq of ftrace_raw_output_prep() is full this function
    returns TRACE_TYPE_PARTIAL_LINE, otherwise it returns zero.

    The problem is that TRACE_TYPE_PARTIAL_LINE happens to be zero!

    The thing is, the caller of ftrace_raw_output_prep() expects a
    success to be zero. Change that to expect it to be
    TRACE_TYPE_HANDLED.

    Link: http://lkml.kernel.org/r/20141114112522.GA2988@dhcp128.suse.cz

    Reminded-by: Petr Mladek
    Signed-off-by: Steven Rostedt

    Steven Rostedt (Red Hat)
     
  • Adding a trace_seq_has_overflowed() which returns true if the trace_seq
    had too much written into it allows us to simplify the code.

    Instead of checking the return value of every call to trace_seq_printf()
    and friends, they can all be called normally, and at the end we can
    return !trace_seq_has_overflowed() instead.

    Several functions also return TRACE_TYPE_PARTIAL_LINE when the trace_seq
    overflowed and TRACE_TYPE_HANDLED otherwise. Another helper function
    was created called trace_handle_return() which takes a trace_seq and
    returns these enums. Using this helper function also simplifies the
    code.

    This change also makes it possible to remove the return values of
    trace_seq_printf() and friends. They should instead just be
    void functions.

    Link: http://lkml.kernel.org/r/20141114011410.365183157@goodmis.org

    Reviewed-by: Petr Mladek
    Signed-off-by: Steven Rostedt

    Steven Rostedt (Red Hat)
     

12 Nov, 2014

1 commit


01 Jul, 2014

2 commits


15 May, 2014

1 commit

  • Being able to show a cpumask of events can be useful as some events
    may affect only some CPUs. There is no standard way to record the
    cpumask and converting it to a string is rather expensive during
    the trace as traces happen in hotpaths. It would be better to record
    the raw event mask and be able to parse it at print time.

    The following macros were added for use with the TRACE_EVENT() macro:

    __bitmask()
    __assign_bitmask()
    __get_bitmask()

    To test this, I added this to the sched_migrate_task event, which
    looked like this:

    TRACE_EVENT(sched_migrate_task,

        TP_PROTO(struct task_struct *p, int dest_cpu, const struct cpumask *cpus),

        TP_ARGS(p, dest_cpu, cpus),

        TP_STRUCT__entry(
            __array( char, comm, TASK_COMM_LEN )
            __field( pid_t, pid )
            __field( int, prio )
            __field( int, orig_cpu )
            __field( int, dest_cpu )
            __bitmask( cpumask, num_possible_cpus() )
        ),

        TP_fast_assign(
            memcpy(__entry->comm, p->comm, TASK_COMM_LEN);
            __entry->pid = p->pid;
            __entry->prio = p->prio;
            __entry->orig_cpu = task_cpu(p);
            __entry->dest_cpu = dest_cpu;
            __assign_bitmask(cpumask, cpumask_bits(cpus), num_possible_cpus());
        ),

        TP_printk("comm=%s pid=%d prio=%d orig_cpu=%d dest_cpu=%d cpumask=%s",
            __entry->comm, __entry->pid, __entry->prio,
            __entry->orig_cpu, __entry->dest_cpu,
            __get_bitmask(cpumask))
    );

    With the output of:

    ksmtuned-3613 [003] d..2 485.220508: sched_migrate_task: comm=ksmtuned pid=3615 prio=120 orig_cpu=3 dest_cpu=2 cpumask=00000000,0000000f
    migration/1-13 [001] d..5 485.221202: sched_migrate_task: comm=ksmtuned pid=3614 prio=120 orig_cpu=1 dest_cpu=0 cpumask=00000000,0000000f
    awk-3615 [002] d.H5 485.221747: sched_migrate_task: comm=rcu_preempt pid=7 prio=120 orig_cpu=0 dest_cpu=1 cpumask=00000000,000000ff
    migration/2-18 [002] d..5 485.222062: sched_migrate_task: comm=ksmtuned pid=3615 prio=120 orig_cpu=2 dest_cpu=3 cpumask=00000000,0000000f

    Link: http://lkml.kernel.org/r/1399377998-14870-6-git-send-email-javi.merino@arm.com
    Link: http://lkml.kernel.org/r/20140506132238.22e136d1@gandalf.local.home

    Suggested-by: Javi Merino
    Tested-by: Javi Merino
    Signed-off-by: Steven Rostedt

    Steven Rostedt (Red Hat)
     

09 Apr, 2014

1 commit

  • Register/unregister tracepoint probes with struct tracepoint pointer
    rather than tracepoint name.

    This change, which vastly simplifies tracepoint.c, was proposed by
    Steven Rostedt. It also removes 8.8kB (mostly text) from the vmlinux
    size.

    From this point on, the tracers need to pass a struct tracepoint pointer
    to probe register/unregister. A probe can now only be connected to a
    tracepoint that exists. Moreover, tracers are responsible for
    unregistering the probe before the module containing its associated
    tracepoint is unloaded.

        text    data      bss      dec     hex filename
    10443444 4282528 10391552 25117524 17f4354 vmlinux.orig
    10434930 4282848 10391552 25109330 17f2352 vmlinux

    Link: http://lkml.kernel.org/r/1396992381-23785-2-git-send-email-mathieu.desnoyers@efficios.com

    CC: Ingo Molnar
    CC: Frederic Weisbecker
    CC: Andrew Morton
    CC: Frank Ch. Eigler
    CC: Johannes Berg
    Signed-off-by: Mathieu Desnoyers
    [ SDR - fixed return val in void func in tracepoint_module_going() ]
    Signed-off-by: Steven Rostedt

    Mathieu Desnoyers
     

22 Mar, 2014

1 commit

  • I originally wrote commit 35bb4399bd0e to shrink the size of the overhead of
    tracepoints by several kilobytes. Later, I received a patch from Vaibhav
    Nagarnaik that fixed a bug in the same code that this commit touches. Not
    only did it fix a bug, it also removed code and shrunk the size of the
    overhead of trace events even more than this commit did.

    Since this commit is scheduled for 3.15 and Vaibhav's patch is already in
    mainline, I need to revert this patch in order to keep it from conflicting
    with Vaibhav's patch. Not to mention, Vaibhav's patch makes this patch
    obsolete.

    Link: http://lkml.kernel.org/r/20140320225637.0226041b@gandalf.local.home

    Cc: Vaibhav Nagarnaik
    Signed-off-by: Steven Rostedt

    Steven Rostedt (Red Hat)
     

07 Mar, 2014

2 commits

  • The code that shows array fields for events is defined for all events.
    This can add up quite a bit when you have over 500 events.

    By making helper functions in the core kernel to do the work
    instead, we can shrink the size of the kernel down a bit.

    With a kernel configured with 502 events, the change in size was:

        text    data     bss      dec     hex filename
    12990946 1913568 9785344 24689858 178bcc2 /tmp/vmlinux
    12987390 1913504 9785344 24686238 178ae9e /tmp/vmlinux.patched

    That's a total of 3556 bytes, which comes down to 7 bytes per event.
    Although it's not much, this code is just called at initialization of
    the events.

    Link: http://lkml.kernel.org/r/20120810034708.084036335@goodmis.org

    Signed-off-by: Steven Rostedt

    Steven Rostedt
     
  • The code for trace events to format the raw recorded event data
    into human readable format in the 'trace' file is repeated for every
    event in the system. When you have over 500 events, this can add up
    quite a bit.

    By making helper functions in the core kernel to do the work
    instead, we can shrink the size of the kernel down a bit.

    With a kernel configured with 502 events, the change in size was:

        text    data     bss      dec     hex filename
    12991007 1913568 9785344 24689919 178bcff /tmp/vmlinux.orig
    12990946 1913568 9785344 24689858 178bcc2 /tmp/vmlinux.patched

    Note, this version does not save as much as the version of this patch
    I had a few years ago. That is because in the meantime, commit
    f71130de5c7f ("tracing: Add a helper function for event print functions")
    did a lot of the work my original patch did. But this change helps
    slightly, and is part of a larger clean up to reduce the size much further.

    Link: http://lkml.kernel.org/r/20120810034707.378538034@goodmis.org

    Cc: Li Zefan
    Signed-off-by: Steven Rostedt

    Steven Rostedt
     

11 Nov, 2013

1 commit

  • Since the introduction of PREEMPT_NEED_RESCHED in:

    f27dde8deef3 ("sched: Add NEED_RESCHED to the preempt_count")

    we need to be able to look at both TIF_NEED_RESCHED and
    PREEMPT_NEED_RESCHED to understand the full preemption behaviour.

    Add it to the trace output.

    Signed-off-by: Peter Zijlstra
    Acked-by: Steven Rostedt
    Cc: Fengguang Wu
    Cc: Huang Ying
    Cc: Yuanhan Liu
    Link: http://lkml.kernel.org/r/20131004152826.GP3081@twins.programming.kicks-ass.net
    Signed-off-by: Ingo Molnar

    Peter Zijlstra
     

19 Jul, 2013

1 commit


30 Apr, 2013

1 commit

  • Pull tracing updates from Steven Rostedt:
    "Along with the usual minor fixes and clean ups there are a few major
    changes with this pull request.

    1) Multiple buffers for the ftrace facility

    This feature has been requested by many people over the last few
    years. I even heard that Google was about to implement it themselves.
    I finally had time and cleaned up the code such that you can now
    create multiple instances of the ftrace buffer and have different
    events go to different buffers. This way, a low frequency event will
    not be lost in the noise of a high frequency event.

    Note, currently only events can go to different buffers, the tracers
    (ie function, function_graph and the latency tracers) still can only
    be written to the main buffer.

    2) The function tracer triggers have now been extended.

    The function tracer had two triggers. One to enable tracing when a
    function is hit, and one to disable tracing. Now you can record a
    stack trace on a single (or many) function(s), take a snapshot of the
    buffer (copy it to the snapshot buffer), and you can enable or disable
    an event to be traced when a function is hit.

    3) A perf clock has been added.

    A "perf" clock can be chosen to be used when tracing. This will cause
    ftrace to use the same clock as perf uses, and hopefully this will
    make it easier to interleave the perf and ftrace data for analysis."

    * tag 'trace-3.10' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace: (82 commits)
    tracepoints: Prevent null probe from being added
    tracing: Compare to 1 instead of zero for is_signed_type()
    tracing: Remove obsolete macro guard _TRACE_PROFILE_INIT
    ftrace: Get rid of ftrace_profile_bits
    tracing: Check return value of tracing_init_dentry()
    tracing: Get rid of unneeded key calculation in ftrace_hash_move()
    tracing: Reset ftrace_graph_filter_enabled if count is zero
    tracing: Fix off-by-one on allocating stat->pages
    kernel: tracing: Use strlcpy instead of strncpy
    tracing: Update debugfs README file
    tracing: Fix ftrace_dump()
    tracing: Rename trace_event_mutex to trace_event_sem
    tracing: Fix comment about prefix in arch_syscall_match_sym_name()
    tracing: Convert trace_destroy_fields() to static
    tracing: Move find_event_field() into trace_events.c
    tracing: Use TRACE_MAX_PRINT instead of constant
    tracing: Use pr_warn_once instead of open coded implementation
    ring-buffer: Add ring buffer startup selftest
    tracing: Bring Documentation/trace/ftrace.txt up to date
    tracing: Add "perf" trace_clock
    ...

    Conflicts:
    kernel/trace/ftrace.c
    kernel/trace/trace.c

    Linus Torvalds
     

16 Mar, 2013

1 commit


15 Mar, 2013

3 commits

  • The trace_printk() is extremely fast and is very handy as it can be
    used in any context (including NMIs!). But it still requires scanning
    the fmt string for parsing the args. Even the trace_bprintk() requires
    a scan to know what args will be saved, although it doesn't copy the
    format string itself.

    Often, trace_printk() has no args, and wastes CPU cycles scanning
    the fmt string.

    Adding trace_puts() allows the developer to use an even faster
    tracing method that only saves the pointer to the string in the
    ring buffer without doing any format parsing at all. This will
    help remove even more of the "Heisenbug" effect, when debugging.

    Also fixed up the F_printk()s for the ftrace internal bprint and print events.

    Cc: Thomas Gleixner
    Cc: Peter Zijlstra
    Cc: Frederic Weisbecker
    Signed-off-by: Steven Rostedt

    Steven Rostedt (Red Hat)
     
  • Currently, the way the latency tracers and the snapshot feature work
    is to have a separate trace_array called "max_tr" that holds the
    snapshot buffer. For latency tracers, the running buffer is swapped
    with this snapshot buffer to save the current max latency.

    The only items needed for the max_tr is really just a copy of the buffer
    itself, the per_cpu data pointers, the time_start timestamp that states
    when the max latency was triggered, and the cpu that the max latency
    was triggered on. All other fields in trace_array are unused by the
    max_tr, making the max_tr mostly bloat.

    This change removes the max_tr completely, and adds a new structure
    called trace_buffer, that holds the buffer pointer, the per_cpu data
    pointers, the time_start timestamp, and the cpu where the latency occurred.

    The trace_array now has two trace_buffers, one for the normal trace and
    one for the max trace or snapshot. By doing this, not only do we remove
    the bloat from the max_tr, but the instances of traces can now use
    their own snapshot feature, instead of only the top level global_trace
    having the snapshot feature and latency tracers for itself.

    Signed-off-by: Steven Rostedt

    Steven Rostedt (Red Hat)
     
  • Move duplicate code in event print functions to a helper function.

    This shrinks the size of the kernel by ~13K.

       text    data      bss      dec     hex filename
    6596137 1743966 10138672 18478775 119f6b7 vmlinux.o.old
    6583002 1743849 10138672 18465523 119c2f3 vmlinux.o.new

    Link: http://lkml.kernel.org/r/51258746.2060304@huawei.com

    Signed-off-by: Li Zefan
    Signed-off-by: Steven Rostedt

    Li Zefan
     

28 Feb, 2013

1 commit

  • I'm not sure why, but the hlist for-each-entry iterators were conceived
    differently from the list ones:

    list_for_each_entry(pos, head, member)

    The hlist ones were greedy and wanted an extra parameter:

    hlist_for_each_entry(tpos, pos, head, member)

    Why did they need an extra pos parameter? I'm not quite sure. Not only
    do they not really need it, it also prevents the iterator from looking
    exactly like the list iterator, which is unfortunate.

    Besides the semantic patch, there was some manual work required:

    - Fix up the actual hlist iterators in linux/list.h
    - Fix up the declaration of other iterators based on the hlist ones.
    - A very small amount of places were using the 'node' parameter, this
    was modified to use 'obj->member' instead.
    - Coccinelle didn't handle the hlist_for_each_entry_safe iterator
    properly, so those had to be fixed up manually.

    The semantic patch which is mostly the work of Peter Senna Tschudin is here:

    @@
    iterator name hlist_for_each_entry, hlist_for_each_entry_continue, hlist_for_each_entry_from, hlist_for_each_entry_rcu, hlist_for_each_entry_rcu_bh, hlist_for_each_entry_continue_rcu_bh, for_each_busy_worker, ax25_uid_for_each, ax25_for_each, inet_bind_bucket_for_each, sctp_for_each_hentry, sk_for_each, sk_for_each_rcu, sk_for_each_from, sk_for_each_safe, sk_for_each_bound, hlist_for_each_entry_safe, hlist_for_each_entry_continue_rcu, nr_neigh_for_each, nr_neigh_for_each_safe, nr_node_for_each, nr_node_for_each_safe, for_each_gfn_indirect_valid_sp, for_each_gfn_sp, for_each_host;

    type T;
    expression a,c,d,e;
    identifier b;
    statement S;
    @@

    -T b;

    [akpm@linux-foundation.org: drop bogus change from net/ipv4/raw.c]
    [akpm@linux-foundation.org: drop bogus hunk from net/ipv6/raw.c]
    [akpm@linux-foundation.org: checkpatch fixes]
    [akpm@linux-foundation.org: fix warnings]
    [akpm@linux-foundation.org: redo intrusive kvm changes]
    Tested-by: Peter Senna Tschudin
    Acked-by: Paul E. McKenney
    Signed-off-by: Sasha Levin
    Cc: Wu Fengguang
    Cc: Marcelo Tosatti
    Cc: Gleb Natapov
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Sasha Levin
     

14 Nov, 2012

1 commit

  • With the addition of the "tsc" clock, formatting timestamps to look like
    fractional seconds is misleading. Mark clocks as either in nanoseconds or
    not, and format non-nanosecond timestamps as decimal integers.

    Tested:
    $ cd /sys/kernel/debug/tracing/
    $ cat trace_clock
    [local] global tsc
    $ echo sched_switch > set_event
    $ echo 1 > tracing_on ; sleep 0.0005 ; echo 0 > tracing_on
    $ cat trace
    <idle>-0 [000] 6330.555552: sched_switch: prev_comm=swapper prev_pid=0 prev_prio=120 prev_state=R ==> next_comm=bash next_pid=29964 next_prio=120
    sleep-29964 [000] 6330.555628: sched_switch: prev_comm=bash prev_pid=29964 prev_prio=120 prev_state=S ==> next_comm=swapper next_pid=0 next_prio=120
    ...
    $ echo 1 > options/latency-format
    $ cat trace
    <idle>-0 0 4104553247us+: sched_switch: prev_comm=swapper prev_pid=0 prev_prio=120 prev_state=R ==> next_comm=bash next_pid=29964 next_prio=120
    sleep-29964 0 4104553322us+: sched_switch: prev_comm=bash prev_pid=29964 prev_prio=120 prev_state=S ==> next_comm=swapper next_pid=0 next_prio=120
    ...
    $ echo tsc > trace_clock
    $ cat trace
    $ echo 1 > tracing_on ; sleep 0.0005 ; echo 0 > tracing_on
    $ echo 0 > options/latency-format
    $ cat trace
    <idle>-0 [000] 16490053398357: sched_switch: prev_comm=swapper prev_pid=0 prev_prio=120 prev_state=R ==> next_comm=bash next_pid=31128 next_prio=120
    sleep-31128 [000] 16490053588518: sched_switch: prev_comm=bash prev_pid=31128 prev_prio=120 prev_state=S ==> next_comm=swapper next_pid=0 next_prio=120
    ...
    $ echo 1 > options/latency-format
    $ cat trace
    <idle>-0 0 91557653238+: sched_switch: prev_comm=swapper prev_pid=0 prev_prio=120 prev_state=R ==> next_comm=bash next_pid=31128 next_prio=120
    sleep-31128 0 91557843399+: sched_switch: prev_comm=bash prev_pid=31128 prev_prio=120 prev_state=S ==> next_comm=swapper next_pid=0 next_prio=120
    ...

    v2:
    Move arch-specific bits out of generic code.
    v4:
    Fix x86_32 build due to 64-bit division.

    Google-Bug-Id: 6980623
    Link: http://lkml.kernel.org/r/1352837903-32191-2-git-send-email-dhsharp@google.com

    Cc: Masami Hiramatsu
    Signed-off-by: David Sharp
    Signed-off-by: Steven Rostedt

    David Sharp
     

15 Jun, 2012

1 commit

  • All trace events, including ftrace internal events (like trace_printk
    and function tracing), register functions that describe how to print
    their output. The events may be recorded as soon as the ring buffer
    is allocated, but they are just raw binary in the buffer. The mapping
    of event ids to how to print them are held within a structure that
    is registered on system boot.

    If a crash happens in boot up before these functions are registered
    then their output (via ftrace_dump_on_oops) will be useless:

    Dumping ftrace buffer:
    ---------------------------------
    -1 0.... 319705us : Unknown type 6
    ---------------------------------

    This can be quite frustrating for a kernel developer trying to see
    what is going wrong.

    There's no reason to register them so late in the boot up process.
    They can be registered by early_initcall().

    Reported-by: Peter Zijlstra
    Signed-off-by: Steven Rostedt

    Steven Rostedt
     

20 Apr, 2012

1 commit

  • While debugging a latency with someone on IRC (mirage335) on #linux-rt (OFTC),
    we discovered that the stacktrace output of the latency tracers
    (preemptirqsoff) was empty.

    This bug was caused by the creation of the dynamic length stack trace
    again (like commit 12b5da3 "tracing: Fix ent_size in trace output" was).

    This bug is caused by the latency tracers requiring the next event
    to determine the time between the current event and the next. But by
    grabbing the next event, the iter->ent_size is set to the next event
    instead of the current one. As the stacktrace event is the last event,
    this makes the ent_size zero and causes nothing to be printed for
    the stack trace. The dynamic stacktrace uses the ent_size to determine
    how much of the stack can be printed. The ent_size of zero means
    no stack.

    The simple fix is to save the iter->ent_size before finding the next event.

    Note, mirage335 asked to remain anonymous from LKML and git, so I will
    not add the Reported-by and Tested-by tags, even though he did report
    the issue and tested the fix.

    Cc: stable@vger.kernel.org # 3.1+
    Signed-off-by: Steven Rostedt

    Steven Rostedt
     

22 Mar, 2012

1 commit

  • Pull vfs pile 1 from Al Viro:
    "This is _not_ all; in particular, Miklos' and Jan's stuff is not there
    yet."

    * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (64 commits)
    ext4: initialization of ext4_li_mtx needs to be done earlier
    debugfs-related mode_t whack-a-mole
    hfsplus: add an ioctl to bless files
    hfsplus: change finder_info to u32
    hfsplus: initialise userflags
    qnx4: new helper - try_extent()
    qnx4: get rid of qnx4_bread/qnx4_getblk
    take removal of PF_FORKNOEXEC to flush_old_exec()
    trim includes in inode.c
    um: uml_dup_mmap() relies on ->mmap_sem being held, but activate_mm() doesn't hold it
    um: embed ->stub_pages[] into mmu_context
    gadgetfs: list_for_each_safe() misuse
    ocfs2: fix leaks on failure exits in module_init
    ecryptfs: make register_filesystem() the last potential failure exit
    ntfs: forgets to unregister sysctls on register_filesystem() failure
    logfs: missing cleanup on register_filesystem() failure
    jfs: mising cleanup on register_filesystem() failure
    make configfs_pin_fs() return root dentry on success
    configfs: configfs_create_dir() has parent dentry in dentry->d_parent
    configfs: sanitize configfs_create()
    ...

    Linus Torvalds
     

21 Mar, 2012

1 commit


22 Feb, 2012

1 commit

  • If more than one __print_*() function is used in a tracepoint
    (__print_flags(), __print_symbols(), etc), then the temp seq buffer will
    not be zero on entry. Using the temp seq buffer's length to know if
    data has been printed or not in the current function is incorrect and
    may produce incorrect results.

    Currently, no in-tree tracepoint causes this bug, but new ones may
    be created.

    Cc: Andrew Vagin
    Signed-off-by: Steven Rostedt

    Steven Rostedt
     

21 Feb, 2012

1 commit

  • If __print_flags() is used after another __print_*() function, the
    temp seq_file buffer will not be empty on entry, and the delimiter will
    be printed even though there's just one field. We get something like:

    |S

    instead of just:

    S

    This is because the length of the temp seq buffer is used to determine
    if the delimiter is printed or not. But this algorithm fails when
    the seq buffer is not empty on entry, and the delimiter will be printed
    because it thinks that a previous field was already printed.

    Link: http://lkml.kernel.org/r/1329650167-480655-1-git-send-email-avagin@openvz.org

    Signed-off-by: Andrew Vagin
    Cc: Ingo Molnar
    Cc: Frederic Weisbecker
    Signed-off-by: Steven Rostedt

    Andrey Vagin
     

17 Nov, 2011

1 commit

  • People keep asking how to get the preempt count, irq, and need-resched info,
    and we keep telling them to enable the latency format. Some developers think
    that traces without this info are completely useless, and for a lot of tasks
    it is useless.

    The first option was to enable the latency trace as the default format, but
    the header for the latency format is pretty useless for most tracers and
    it also does the timestamp in straight microseconds from the time the trace
    started. This is sometimes more difficult to read as the default trace is
    seconds from the start of boot up.

    Latency format:

    # tracer: nop
    #
    # nop latency trace v1.1.5 on 3.2.0-rc1-test+
    # --------------------------------------------------------------------
    # latency: 0 us, #159771/64234230, CPU#1 | (M:preempt VP:0, KP:0, SP:0 HP:0 #P:4)
    # -----------------
    # | task: -0 (uid:0 nice:0 policy:0 rt_prio:0)
    # -----------------
    #
    # _------=> CPU#
    # / _-----=> irqs-off
    # | / _----=> need-resched
    # || / _---=> hardirq/softirq
    # ||| / _--=> preempt-depth
    # |||| / delay
    # cmd pid ||||| time | caller
    # \ / ||||| \ | /
    migratio-6 0...2 41778231us+: rcu_note_context_switch irqs-off
    # / _----=> need-resched
    # | / _---=> hardirq/softirq
    # || / _--=> preempt-depth
    # ||| / delay
    # TASK-PID CPU# |||| TIMESTAMP FUNCTION
    # | | | |||| | |
    <idle>-0 [000] d..2 49.309305: cpuidle_get_driver
    <idle>-0 [000] d..2 49.309307: mwait_idle
    <idle>-0 [000] d..2 49.309309: need_resched
    <idle>-0 [000] d..2 49.309310: test_ti_thread_flag
    <idle>-0 [000] d..2 49.309312: trace_power_start.constprop.13
    <idle>-0 [000] d..2 49.309313: trace_cpu_idle
    <idle>-0 [000] d..2 49.309315: need_resched

    <idle>-0 [000] 49.309305: cpuidle_get_driver
    <idle>-0 [000] 49.309307: mwait_idle
    <idle>-0 [000] 49.309309: need_resched
    <idle>-0 [000] 49.309310: test_ti_thread_flag
    <idle>-0 [000] 49.309312: trace_power_start.constprop.13
    <idle>-0 [000] 49.309313: trace_cpu_idle
    <idle>-0 [000] 49.309315: need_resched
    Signed-off-by: Steven Rostedt

    Steven Rostedt