10 Mar, 2009

1 commit


09 Mar, 2009

1 commit

  • Impact: improve workqueue tracer output

    Currently, /sys/kernel/debug/tracing/trace_stat/workqueues can display
    wrong and strange thread names.

    Why?

    Currently, ftrace has the tracing_record_cmdline()/trace_find_cmdline()
    convenience functions, which implement a task->comm string cache.

    This avoids unnecessary memcpy overhead, and the workqueue tracer
    uses it.

    However, in general, trace statistics features shouldn't use
    tracing_record_cmdline(), because statistics can display very old
    processes. The comm cache can then return the wrong string, because
    more recent processes override the cache entries.

    Fortunately, the workqueue tracer guarantees that displayed processes
    are alive, so we can look up the comm string from the PID at display
    time.

    % cat workqueues
    # CPU  INSERTED  EXECUTED  NAME
    #  |      |         |       |

       7   431913    431913    kondemand/7
       7        0         0    tail
       7       21        21    git
       7        0         0    ls
       7        9         9    cat
       7   832632    832632    unix_chkpwd
       7   236292    236292    ls

    Note: tail, git, ls, cat and unix_chkpwd are obviously not workqueue
    threads.

    % cat workqueues
    # CPU  INSERTED  EXECUTED  NAME
    #  |      |         |       |

       7      510       510    kondemand/7
       7        0         0    kmpathd/7
       7       15        15    ata/7
       7        0         0    aio/7
       7       11        11    kblockd/7
       7     1063      1063    work_on_cpu/7
       7      167       167    events/7

    Signed-off-by: KOSAKI Motohiro
    Cc: Lai Jiangshan
    Cc: Steven Rostedt
    Cc: Frederic Weisbecker
    Signed-off-by: Ingo Molnar

    KOSAKI Motohiro
     

08 Mar, 2009

2 commits

  • In a recent discussion, Andrew Morton pointed out that tracing headers
    should put '#' at the beginning of each line.

    Then, we can easily filter out the header with the following grep
    usage:

    cat trace | grep -v '^#'

    Wakeup trace also has the same header problem.

    Comparison of headers displayed:

    before this patch:

    # tracer: wakeup
    #
    wakeup latency trace v1.1.5 on 2.6.29-rc7-tip-tip
    --------------------------------------------------------------------
    latency: 19059 us, #21277/21277, CPU#1 | (M:desktop VP:0, KP:0, SP:0 HP:0 #P:4)
    -----------------
    | task: kondemand/1-1644 (uid:0 nice:-5 policy:0 rt_prio:0)
    -----------------

    # _------=> CPU#
    # / _-----=> irqs-off
    # | / _----=> need-resched
    # || / _---=> hardirq/softirq
    # ||| / _--=> preempt-depth
    # |||| /
    # ||||| delay
    # cmd pid ||||| time | caller
    # \ / ||||| \ | /
    irqbalan-1887 1d.s. 0us : 1887:120:R + [001] 1644:115:S kondemand/1
    irqbalan-1887 1d.s. 1us : default_wake_function

    after this patch:

    # _------=> CPU#
    # / _-----=> irqs-off
    # | / _----=> need-resched
    # || / _---=> hardirq/softirq
    # ||| / _--=> preempt-depth
    # |||| /
    # ||||| delay
    # cmd pid ||||| time | caller
    # \ / ||||| \ | /
    sshd-2496 0d.s. 0us : 2496:120:R + [000] 1641:115:S kondemand/0
    sshd-2496 0d.s. 1us : default_wake_function

    Cc: Andrew Morton
    LKML-Reference:
    Signed-off-by: Ingo Molnar

    KOSAKI Motohiro
     
  • …xtedit' into tracing/core

    Ingo Molnar
     

07 Mar, 2009

4 commits

  • Impact: cleanup

    Remove a few leftovers and clean up the code a bit.

    Signed-off-by: Frederic Weisbecker
    Cc: Steven Rostedt
    LKML-Reference:
    Signed-off-by: Ingo Molnar

    Ingo Molnar
     
  • Impact: faster and lighter tracing

    Now that we have trace_bprintk(), which is faster, consumes less
    memory than trace_printk() and serves the same purpose, we can drop
    the old implementation in favour of the binary one from
    trace_bprintk(). This means we move the whole implementation of
    trace_bprintk() to trace_printk(), so the API doesn't change, except
    that we must now use trace_seq_bprintk() to print the TRACE_PRINT
    entries.

    Some changes result from this:

    - Previously, trace_bprintk depended on a single tracer and couldn't
    work without it. That tracer has been dropped, and the whole
    implementation of trace_printk() (such as the module format
    management) is now integrated into the tracing core (it comes with
    CONFIG_TRACING), though we keep the file trace_printk (previously
    trace_bprintk.c), which holds the module management. Thus we don't
    overflow trace.c.

    - Change some parts to use trace_seq_bprintk() to print TRACE_PRINT entries.

    - Slightly change the trace_printk()/trace_vprintk() macros to support
    non-builtin format constants, and fix 'const' qualifier warnings. This
    is all transparent for developers.

    - etc...

    V2:

    - Rebase against last changes
    - Fix misspellings in the changelog

    V3:

    - Rebase against last changes (moving trace_printk() to kernel.h)

    Signed-off-by: Frederic Weisbecker
    Acked-by: Steven Rostedt
    LKML-Reference:
    Signed-off-by: Ingo Molnar

    Frederic Weisbecker
     
  • Impact: add a generic printk() for tracing, like trace_printk()

    trace_bprintk() uses the infrastructure to record events on ring_buffer.

    [ fweisbec@gmail.com: ported to latest -tip, made it work if
    !CONFIG_MODULES, never free the format strings from modules
    because we can't keep track of them, and conditionally create
    the ftrace format strings section (reported by Steven Rostedt) ]

    Signed-off-by: Lai Jiangshan
    Signed-off-by: Frederic Weisbecker
    Acked-by: Steven Rostedt
    LKML-Reference:
    Signed-off-by: Ingo Molnar

    Lai Jiangshan
     
  • Impact: save on memory for tracing

    Current tracers typically use a struct (like struct ftrace_entry,
    struct ctx_switch_entry, struct special_entr, etc.) to record a
    binary event. Each of these structs can only record its own kind of
    event, and a new kind of tracer needs a new struct and a lot of code
    to handle it.

    So we need a generic binary record for events. This infrastructure
    serves that purpose.

    [fweisbec@gmail.com: rebase against latest -tip, make it safe during
    sched tracing, as reported by Steven Rostedt]

    Signed-off-by: Lai Jiangshan
    Signed-off-by: Frederic Weisbecker
    Acked-by: Steven Rostedt
    LKML-Reference:
    Signed-off-by: Ingo Molnar

    Lai Jiangshan
     

06 Mar, 2009

11 commits

  • Use the mutual exclusion provided by the text edit lock in the kprobes code. It
    allows coherent manipulation of the kernel code by other subsystems.

    Changelog:

    Move the kernel_text_lock/unlock out of the for loops.
    Use text_mutex directly instead of a function.
    Remove whitespace modifications.

    (note : kprobes_mutex is always taken outside of text_mutex)

    Signed-off-by: Mathieu Desnoyers
    Acked-by: Ananth N Mavinakayanahalli
    Acked-by: Masami Hiramatsu
    LKML-Reference:
    Signed-off-by: Ingo Molnar

    Mathieu Desnoyers
     
  • Conflicts:
    arch/x86/Kconfig
    block/blktrace.c
    kernel/irq/handle.c

    Semantic conflict:
    kernel/trace/blktrace.c

    Signed-off-by: Ingo Molnar

    Ingo Molnar
     
  • Impact: fix deadlock while using set_ftrace_pid

    Reproducer:

    # cd /sys/kernel/debug/tracing
    # echo $$ > set_ftrace_pid

    then, console becomes hung.

    Details:

    When writing to set_ftrace_pid, the kernel call stack is the
    following:

    ftrace_pid_write()
      mutex_lock(&ftrace_lock);
      ftrace_update_pid_func()
        mutex_lock(&ftrace_lock);
        mutex_unlock(&ftrace_lock);
      mutex_unlock(&ftrace_lock);

    so the system always deadlocks when ftrace_pid_write() is called.

    Previously, ftrace_pid_write() used ftrace_start_lock, but commit
    e6ea44e9b4c12325337cd1c06103cd515a1c02b2 consolidated
    ftrace_start_lock into ftrace_lock.

    Signed-off-by: KOSAKI Motohiro
    Reviewed-by: Lai Jiangshan
    Cc: Steven Rostedt
    LKML-Reference:
    Signed-off-by: Ingo Molnar

    KOSAKI Motohiro
     
  • After commit 40ada30f9621fbd831ac2437b9a2a399aad34b00,
    "make menuconfig" doesn't display "Tracer" item.

    The following modification restores it.

    KOSAKI Motohiro
     
  • …it/rostedt/linux-2.6-trace into tracing/ftrace

    Ingo Molnar
     
  • Ingo Molnar
     
  • Impact: allow user apps to read binary format of basic ftrace entries

    Currently, only defined raw events export their formats so a binary
    reader can parse them. There's no reason that the default ftrace entries
    can't export their formats.

    This patch adds a subsystem called "ftrace" in the events directory
    that includes the ftrace entries for basic ftrace recorded items.

    These only have three files in the events directory:

    type : printf
    available_types : printf
    format : format for the event entry

    For example:

    # cat /debug/tracing/events/ftrace/wakeup/format
    name: wakeup
    ID: 3
    format:
    field:unsigned char type; offset:0; size:1;
    field:unsigned char flags; offset:1; size:1;
    field:unsigned char preempt_count; offset:2; size:1;
    field:int pid; offset:4; size:4;
    field:int tgid; offset:8; size:4;

    field:unsigned int prev_pid; offset:12; size:4;
    field:unsigned char prev_prio; offset:16; size:1;
    field:unsigned char prev_state; offset:17; size:1;
    field:unsigned int next_pid; offset:20; size:4;
    field:unsigned char next_prio; offset:24; size:1;
    field:unsigned char next_state; offset:25; size:1;
    field:unsigned int next_cpu; offset:28; size:4;

    print fmt: "%u:%u:%u ==+ %u:%u:%u [%03u]"

    Signed-off-by: Steven Rostedt

    Steven Rostedt
     
  • Impact: clean up

    Move the macro that creates the event format file to a separate header.
    This will allow the default ftrace events to use this same macro
    to create the formats to read those events.

    Signed-off-by: Steven Rostedt

    Steven Rostedt
     
  • Impact: cleanup

    All file_operations structures should be constant. No one is going to
    change them.

    Reported-by: Andrew Morton
    Signed-off-by: Steven Rostedt

    Steven Rostedt
     
  • Clean up menu structure, introduce TRACING_SUPPORT switch that signals
    whether an architecture supports various instrumentation mechanisms.

    Signed-off-by: Ingo Molnar

    Ingo Molnar
     
  • Conflicts:
    arch/x86/include/asm/fixmap_64.h
    Semantic merge:
    arch/x86/include/asm/fixmap.h

    Signed-off-by: Ingo Molnar

    Ingo Molnar
     

05 Mar, 2009

15 commits

  • Impact: decrease hang risks with the graph tracer on slow systems

    Since the function graph tracer can spend too much time in timer
    interrupts, it's better now to use the more lightweight local
    clock. In any case, function graph traces are more reliable as
    per-cpu traces.

    Signed-off-by: Frederic Weisbecker
    Cc: Steven Rostedt
    Cc: Peter Zijlstra
    LKML-Reference:
    Signed-off-by: Ingo Molnar

    Frederic Weisbecker
     
  • Impact: cleanup

    Use a more generic name - this also allows the prototype to move
    to kernel.h and be generally available to kernel developers who
    want to do some quick tracing.

    Signed-off-by: Ingo Molnar

    Ingo Molnar
     
  • The latency tracers (irqsoff, preemptoff, preemptirqsoff, and wakeup)
    are pretty useless with the default output format. This patch makes them
    automatically enable the latency format when they are selected. They
    also record the state of the latency option, and if it was not enabled
    when selected, they disable it on reset.

    Signed-off-by: Steven Rostedt

    Steven Rostedt
     
  • Impact: clean up

    Both print_lat_fmt and print_trace_fmt do pretty much the same thing
    except for one different function call. This patch consolidates the
    two functions and adds an if statement to perform the difference.

    Signed-off-by: Steven Rostedt

    Steven Rostedt
     
  • Impact: clean up

    The trace and latency_trace function pointers are identical for
    every tracer but the function tracer. The differences in the function
    tracer are trivial (the latency output puts parentheses around the
    parent).

    This patch removes the latency_trace pointer and all prints will
    now just use the trace output function pointer.

    Signed-off-by: Steven Rostedt

    Steven Rostedt
     
  • With the removal of the latency_trace file, we lost the ability
    to see some of the finer details in a trace, like whether interrupts
    are enabled, the preempt count, need-resched, and whether we are in
    an interrupt handler, softirq handler or not.

    This patch simply creates an option to bring back the old format.
    This also removes the warning about an unused variable that held
    the latency_trace file operations.

    Signed-off-by: Steven Rostedt

    Steven Rostedt
     
  • The buffer used by trace_seq was updated incorrectly. Instead
    of consuming what was actually read, it consumed the rest of the
    buffer on reads.

    Signed-off-by: Steven Rostedt

    Steven Rostedt
     
  • Impact: fix trace read to conform to standards

    Andrew Morton, Theodore Tso and H. Peter Anvin brought to my attention
    that a userspace read should not return -EFAULT if it succeeded in
    copying anything. It should only return -EFAULT if it failed to copy
    at all.

    This patch modifies the check of copy_from_user and updates the return
    code appropriately.

    I also used H. Peter Anvin's shortcut rule to just test ret == count.

    Signed-off-by: Steven Rostedt

    Steven Rostedt
     
  • If a partial ring_buffer_page_read happens, then some of the
    incremental timestamps may be lost. This patch writes the
    recent timestamp into the page that is passed back to the caller.

    A partial ring_buffer_page_read is where the full page would not
    be written back to the user, and instead, just part of the page
    is copied to the user. A full page would be a page swap with the
    ring buffer and the timestamps would be correct.

    Signed-off-by: Steven Rostedt

    Steven Rostedt
     
  • Impact: fix ftrace_dump output corruption

    The commit: b04cc6b1f6398b0e0b60d37e27ce51b4899672ec
    tracing/core: introduce per cpu tracing files

    added a new field to the iterator called cpu_file. This was a handle
    to differentiate between the per cpu trace output files and the
    all cpu "trace" file. The all cpu "trace" file required setting this
    to TRACE_PIPE_ALL_CPU.

    The problem is that ftrace_dump() sets up its own iterator but was
    not updated to handle this change. The result was that only CPU 0
    printed output on a crash, and a lot of ""'s were also printed.

    Reported-by: Thomas Gleixner
    Tested-by: Darren Hart
    Signed-off-by: Steven Rostedt

    Steven Rostedt
     
  • If a machine is flooded by network frames, a cpu can loop
    100% of its time inside ksoftirqd() without calling schedule().
    This can delay RCU grace period to insane values.

    Adding rcu_qsctr_inc() call in ksoftirqd() solves this problem.

    Paul: "This regression was a result of the recent change from
    "schedule()" to "cond_resched()", which got rid of that quiescent
    state in the common case where a reschedule is not needed".

    Signed-off-by: Eric Dumazet
    Reviewed-by: Paul E. McKenney
    Signed-off-by: Andrew Morton
    Signed-off-by: Ingo Molnar

    Eric Dumazet
     
  • Augment the traces with lock names when lockdep is available:

    1) | down_read_trylock() {
    1) | _spin_lock_irqsave() {
    1) | /* lock_acquire: &sem->wait_lock */
    1) 4.201 us | }
    1) | _spin_unlock_irqrestore() {
    1) | /* lock_release: &sem->wait_lock */
    1) 3.523 us | }
    1) | /* lock_acquire: try read &mm->mmap_sem */
    1) + 13.386 us | }
    1) 1.635 us | find_vma();
    1) | handle_mm_fault() {
    1) | __do_fault() {
    1) | filemap_fault() {
    1) | find_lock_page() {
    1) | find_get_page() {
    1) | /* lock_acquire: read rcu_read_lock */
    1) | /* lock_release: rcu_read_lock */
    1) 5.697 us | }
    1) 8.158 us | }
    1) + 11.079 us | }
    1) | _spin_lock() {
    1) | /* lock_acquire: __pte_lockptr(page) */
    1) 3.949 us | }
    1) 1.460 us | page_add_file_rmap();
    1) | _spin_unlock() {
    1) | /* lock_release: __pte_lockptr(page) */
    1) 3.115 us | }
    1) | unlock_page() {
    1) 1.421 us | page_waitqueue();
    1) 1.220 us | __wake_up_bit();
    1) 6.519 us | }
    1) + 34.328 us | }
    1) + 37.452 us | }
    1) | up_read() {
    1) | /* lock_release: &mm->mmap_sem */
    1) | _spin_lock_irqsave() {
    1) | /* lock_acquire: &sem->wait_lock */
    1) 3.865 us | }
    1) | _spin_unlock_irqrestore() {
    1) | /* lock_release: &sem->wait_lock */
    1) 8.562 us | }
    1) + 17.370 us | }

    Signed-off-by: Peter Zijlstra
    Cc: Steven Rostedt
    Cc: Török Edwin
    Cc: Jason Baron
    Cc: Frederic Weisbecker
    LKML-Reference:
    Signed-off-by: Ingo Molnar

    Peter Zijlstra
     
  • Ingo Molnar
     
  • Impact: clarify lockdep printk text

    print_irq_inversion_bug() gets handed state strings of the form

    "HARDIRQ", "SOFTIRQ", "RECLAIM_FS"

    and appends "-irq-{un,}safe" to them, which is either redundant for
    the *IRQ cases or confusing in the RECLAIM_FS case.

    Signed-off-by: Peter Zijlstra
    LKML-Reference:
    Signed-off-by: Ingo Molnar

    Peter Zijlstra
     
  • In the recent mark_lock_irq() rework a bug snuck in that would report the
    state of write locks causing irq inversion under a read lock as a read
    lock.

    Fix this by masking the read bit of the state when validating write
    dependencies.

    Reported-by: Andrew Morton
    Signed-off-by: Peter Zijlstra
    LKML-Reference:
    Signed-off-by: Ingo Molnar

    Peter Zijlstra
     

04 Mar, 2009

6 commits