09 Jan, 2012

1 commit

  • * 'for-linus2' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (165 commits)
    reiserfs: Properly display mount options in /proc/mounts
    vfs: prevent remount read-only if pending removes
    vfs: count unlinked inodes
    vfs: protect remounting superblock read-only
    vfs: keep list of mounts for each superblock
    vfs: switch ->show_options() to struct dentry *
    vfs: switch ->show_path() to struct dentry *
    vfs: switch ->show_devname() to struct dentry *
    vfs: switch ->show_stats to struct dentry *
    switch security_path_chmod() to struct path *
    vfs: prefer ->dentry->d_sb to ->mnt->mnt_sb
    vfs: trim includes a bit
    switch mnt_namespace ->root to struct mount
    vfs: take /proc/*/mounts and friends to fs/proc_namespace.c
    vfs: opencode mntget() mnt_set_mountpoint()
    vfs: spread struct mount - remaining argument of next_mnt()
    vfs: move fsnotify junk to struct mount
    vfs: move mnt_devname
    vfs: move mnt_list to struct mount
    vfs: switch pnode.h macros to struct mount *
    ...

    Linus Torvalds
     

07 Jan, 2012

1 commit

  • * 'perf-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (106 commits)
    perf kvm: Fix copy & paste error in description
    perf script: Kill script_spec__delete
    perf top: Fix a memory leak
    perf stat: Introduce get_ratio_color() helper
    perf session: Remove impossible condition check
    perf tools: Fix feature-bits rework fallout, remove unused variable
    perf script: Add generic perl handler to process events
    perf tools: Use for_each_set_bit() to iterate over feature flags
    perf tools: Unify handling of features when writing feature section
    perf report: Accept fifos as input file
    perf tools: Moving code in some files
    perf tools: Fix out-of-bound access to struct perf_session
    perf tools: Continue processing header on unknown features
    perf tools: Improve macros for struct feature_ops
    perf: builtin-record: Document and check that mmap_pages must be a power of two.
    perf: builtin-record: Provide advice if mmap'ing fails with EPERM.
    perf tools: Fix truncated annotation
    perf script: look up thread using tid instead of pid
    perf tools: Look up thread names for system wide profiling
    perf tools: Fix comm for processes with named threads
    ...

    Linus Torvalds
     

18 Nov, 2011

1 commit

  • Knowing the number of event entries in the ring buffer compared
    to the total number that were written is useful information. The
    latency format gives this information and there is no reason the
    default format should not.

    This information is now added to the default header, along with the
    number of online CPUs:

    # tracer: nop
    #
    # entries-in-buffer/entries-written: 159836/64690869 #P:4
    #
    # _-----=> irqs-off
    # / _----=> need-resched
    # | / _---=> hardirq/softirq
    # || / _--=> preempt-depth
    # ||| / delay
    # TASK-PID CPU# |||| TIMESTAMP FUNCTION
    # | | | |||| | |
    <idle>-0 [000] ...2 49.442971: local_touch_nmi
    <idle>-0 [000] d..2 49.442973: enter_idle
    <idle>-0 [000] d..2 49.442974: atomic_notifier_call_chain
    <idle>-0 [000] d..2 49.442976: __atomic_notifier_call_chain
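
    To see this header on a live system, a minimal check (assuming debugfs is
    mounted at the usual location):

    head -n 20 /sys/kernel/debug/tracing/trace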

    Steven Rostedt
     

17 Nov, 2011

1 commit

  • People keep asking how to get the preempt count, irq, and need resched info
    and we keep telling them to enable the latency format. Some developers think
    that traces without this info are completely useless, and for a lot of tasks
    they are useless.

    The first option was to enable the latency trace as the default format, but
    the header for the latency format is pretty useless for most tracers, and
    it also reports the timestamp in straight microseconds from the time the
    trace started. This is sometimes more difficult to read, as the default
    trace shows seconds from the start of boot up.

    Latency format:

    # tracer: nop
    #
    # nop latency trace v1.1.5 on 3.2.0-rc1-test+
    # --------------------------------------------------------------------
    # latency: 0 us, #159771/64234230, CPU#1 | (M:preempt VP:0, KP:0, SP:0 HP:0 #P:4)
    # -----------------
    # | task: -0 (uid:0 nice:0 policy:0 rt_prio:0)
    # -----------------
    #
    # _------=> CPU#
    # / _-----=> irqs-off
    # | / _----=> need-resched
    # || / _---=> hardirq/softirq
    # ||| / _--=> preempt-depth
    # |||| / delay
    # cmd pid ||||| time | caller
    # \ / ||||| \ | /
    migratio-6 0...2 41778231us+: rcu_note_context_switch

    The new default format adds the irq, preempt-count and need-resched info:

    # tracer: nop
    #
    # _-----=> irqs-off
    # / _----=> need-resched
    # | / _---=> hardirq/softirq
    # || / _--=> preempt-depth
    # ||| / delay
    # TASK-PID CPU# |||| TIMESTAMP FUNCTION
    # | | | |||| | |
    <idle>-0 [000] d..2 49.309305: cpuidle_get_driver
    <idle>-0 [000] d..2 49.309307: mwait_idle
    <idle>-0 [000] d..2 49.309309: need_resched
    <idle>-0 [000] d..2 49.309310: test_ti_thread_flag
    <idle>-0 [000] d..2 49.309312: trace_power_start.constprop.13
    <idle>-0 [000] d..2 49.309313: trace_cpu_idle
    <idle>-0 [000] d..2 49.309315: need_resched

    The old default format (without this info):

    <idle>-0 [000] 49.309305: cpuidle_get_driver
    <idle>-0 [000] 49.309307: mwait_idle
    <idle>-0 [000] 49.309309: need_resched
    <idle>-0 [000] 49.309310: test_ti_thread_flag
    <idle>-0 [000] 49.309312: trace_power_start.constprop.13
    <idle>-0 [000] 49.309313: trace_cpu_idle
    <idle>-0 [000] 49.309315: need_resched

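    For reference, the latency format mentioned above is toggled through a
    trace option (path assuming the usual debugfs mount point); with this
    change the same context columns appear in the default output as well:

    # the old way to get the irq/preempt/need-resched columns
    echo 1 > /sys/kernel/debug/tracing/options/latency-format
    # back to the default format, which now carries the same columns
    echo 0 > /sys/kernel/debug/tracing/options/latency-format
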
    Signed-off-by: Steven Rostedt

    Steven Rostedt
     

08 Nov, 2011

1 commit

  • In case the graph tracer (CONFIG_FUNCTION_GRAPH_TRACER) or even the
    function tracer (CONFIG_FUNCTION_TRACER) is not set, the latency tracers
    do not display a proper latency header.

    The involved/fixed latency tracers are:
    wakeup_rt
    wakeup
    preemptirqsoff
    preemptoff
    irqsoff

    The patch adds proper handling of the tracer configuration options for
    latency tracers, and displays the correct header info accordingly.

    * The current output (for the wakeup tracer) with both the graph and
    function tracers disabled is:

    # tracer: wakeup
    #
    <idle>-0 0d.h5 1us+: 0:120:R + [000] 7: 0:R watchdog/0
    <idle>-0 0d.h5 3us+: ttwu_do_activate.clone.1

    * The fixed output is (with the full latency header):

    # _------=> CPU#
    # / _-----=> irqs-off
    # | / _----=> need-resched
    # || / _---=> hardirq/softirq
    # ||| / _--=> preempt-depth
    # |||| / delay
    # cmd pid ||||| time | caller
    # \ / ||||| \ | /
    cat-1129 0d..4 1us : 1129:120:R + [000] 6: 0:R migration/0
    cat-1129 0d..4 2us+: ttwu_do_activate.clone.1

    * The fixed output for a second configuration is:

    # _------=> CPU#
    # / _-----=> irqs-off
    # | / _----=> need-resched
    # || / _---=> hardirq/softirq
    # ||| / _--=> preempt-depth
    # |||| / delay
    # cmd pid ||||| time | caller
    # \ / ||||| \ | /
    <idle>-0 1d.h5 1us+: 0:120:R + [001] 12: 0:R watchdog/1
    <idle>-0 1d.h5 3us : ttwu_do_activate.clone.1
    Cc: Ingo Molnar
    Signed-off-by: Jiri Olsa
    Signed-off-by: Steven Rostedt

    Jiri Olsa
     

26 Oct, 2011

1 commit

  • * 'perf-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (121 commits)
    perf symbols: Increase symbol KSYM_NAME_LEN size
    perf hists browser: Refuse 'a' hotkey on non symbolic views
    perf ui browser: Use libslang to read keys
    perf tools: Fix tracing info recording
    perf hists browser: Elide DSO column when it is set to just one DSO, ditto for threads
    perf hists: Don't consider filtered entries when calculating column widths
    perf hists: Don't decay total_period for filtered entries
    perf hists browser: Honour symbol_conf.show_{nr_samples,total_period}
    perf hists browser: Do not exit on tab key with single event
    perf annotate browser: Don't change selection line when returning from callq
    perf tools: handle endianness of feature bitmap
    perf tools: Add prelink suggestion to dso update message
    perf script: Fix unknown feature comment
    perf hists browser: Apply the dso and thread filters when merging new batches
    perf hists: Move the dso and thread filters from hist_browser
    perf ui browser: Honour the xterm colors
    perf top tui: Give color hints just on the percentage, like on --stdio
    perf ui browser: Make the colors configurable and change the defaults
    perf tui: Remove unneeded call to newtCls on startup
    perf hists: Don't format the percentage on hist_entry__snprintf
    ...

    Fix up conflicts in arch/x86/kernel/kprobes.c manually.

    Ingo's tree did the insane "add volatile to const array", which just
    doesn't make sense ("volatile const"?). But we could remove the const
    *and* make the array volatile to make doubly sure that gcc doesn't
    optimize it away..

    Also fix up kernel/trace/ring_buffer.c non-data-conflicts manually: the
    reader_lock has been turned into a raw lock by the core locking merge,
    and there was a new user of it introduced in this perf core merge. Make
    sure that new use also uses the raw accessor functions.

    Linus Torvalds
     

14 Oct, 2011

2 commits

  • The trace_pipe_raw handler holds a cached page from the time the file
    is opened to the time it is closed. The cached page is used to handle
    the case of the user space buffer being smaller than what was read from
    the ring buffer. The left over buffer is held in the cache so that the
    next read will continue where the data left off.

    After EOF is returned (no more data in the buffer), the index of
    the cached page is set to zero. If a user app reads the page again
    after EOF, the check in the buffer will see that the cached page
    is less than page size and will return the cached page again. This
    will cause reading the trace_pipe_raw again after EOF to return
    duplicate data, making the output look like the time went backwards
    but instead data is just repeated.

    The fix is to not reset the index right after all data is read
    from the cache, but to reset it after all data is read and more
    data exists in the ring buffer.
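
    A minimal way to exercise this path from user space, assuming the standard
    per-cpu layout under the tracing debugfs directory:

    # read one raw ring-buffer page from CPU 0; re-reading after EOF should now
    # return no data instead of a stale copy of the cached page
    dd if=/sys/kernel/debug/tracing/per_cpu/cpu0/trace_pipe_raw of=page.bin bs=4096 count=1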

    Cc: stable
    Reported-by: Jeremy Eder
    Signed-off-by: Steven Rostedt

    Steven Rostedt
     
  • The tracing_enabled option is deprecated.
    To start or stop tracing, write to /sys/kernel/debug/tracing/tracing_on
    instead of using tracing_enabled. This patch is based on Linux 3.1.0-rc1.
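
    A usage sketch of the interface referenced above:

    # stop writing events into the ring buffer
    echo 0 > /sys/kernel/debug/tracing/tracing_on
    # resume tracing
    echo 1 > /sys/kernel/debug/tracing/tracing_on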

    Signed-off-by: Geunsik Lim
    Link: http://lkml.kernel.org/r/1313127022-23830-1-git-send-email-leemgs1@gmail.com
    Signed-off-by: Steven Rostedt

    Geunsik Lim
     

11 Oct, 2011

2 commits

  • When doing intense tracing, the kmalloc inside trace_marker can
    introduce side effects to what is being traced.

    As trace_marker() is used by userspace to inject data into the
    kernel ring buffer, it needs to do so with the least amount
    of intrusion to the operations of the kernel or the user space
    application.

    As the ring buffer is designed to write directly into the buffer
    without the need to make a temporary buffer, and userspace already
    went through the hassle of knowing how big the write will be,
    we can simply pin the userspace pages and write the data directly
    into the buffer. This improves the impact of tracing via trace_marker
    tremendously!

    Thanks to Peter Zijlstra and Thomas Gleixner for pointing out the
    use of get_user_pages_fast() and kmap_atomic().
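
    A minimal usage sketch; the path is the standard tracing directory, and the
    tracing_mark_write label in the output is an assumption about how these
    writes show up in the trace:

    echo "hello from user space" > /sys/kernel/debug/tracing/trace_marker
    grep tracing_mark_write /sys/kernel/debug/tracing/trace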

    Suggested-by: Thomas Gleixner
    Suggested-by: Peter Zijlstra
    Signed-off-by: Steven Rostedt

    Steven Rostedt
     
  • As the function tracer is very intrusive, lots of self-checks are
    performed on the tracer, and if something is found to be strange
    it will shut itself down, keeping it from corrupting the rest of the
    kernel. This shutdown may still allow functions to be traced, as the
    tracing only stops new modifications from happening. Trying to stop
    the function tracer itself can cause more harm, as it requires code
    modification.

    Although a WARN_ON() is executed, a user may not notice it. To help
    the user see that something isn't right with the tracing of the system
    a big warning is added to the output of the tracer that lets the user
    know that their data may be incomplete.

    Reported-by: Thomas Gleixner
    Signed-off-by: Steven Rostedt

    Steven Rostedt
     

19 Sep, 2011

1 commit

  • When debugging tight race conditions, it can be helpful to have a
    synchronized tracing method. Although in most cases the global clock
    provides this functionality, if timing is not the issue it is more
    comforting to know that the events really happened in a precise
    order.

    Instead of using a clock, add a "counter" that is simply an incrementing
    atomic 64-bit counter that orders the events as they are perceived to
    happen.

    The trace_clock_counter() comes from Peter Zijlstra's attempt to convert
    trace_clock_global() to a counter. I took Peter's counter code, made it
    trace_clock_counter() instead, and added it to the choice of clocks.
    Just echo counter > /debug/tracing/trace_clock to activate it.
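
    A usage sketch; the commit text uses the older /debug mount point, the more
    common location is /sys/kernel/debug:

    # list the available clocks, the active one is shown in [brackets]
    cat /sys/kernel/debug/tracing/trace_clock
    echo counter > /sys/kernel/debug/tracing/trace_clock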

    Requested-by: Thomas Gleixner
    Requested-by: Peter Zijlstra
    Reviewed-By: Valdis Kletnieks
    Signed-off-by: Steven Rostedt

    Steven Rostedt
     

13 Sep, 2011

1 commit

  • The tracing locks can be taken in atomic context and therefore
    cannot be preempted on -rt - annotate it.

    In mainline this change documents the low level nature of
    the lock - otherwise there's no functional difference. Lockdep
    and Sparse checking will work as usual.

    Signed-off-by: Thomas Gleixner
    Signed-off-by: Ingo Molnar

    Thomas Gleixner
     

31 Aug, 2011

2 commits

  • The stats file under per_cpu folder provides the number of entries,
    overruns and other statistics about the CPU ring buffer. However, the
    numbers do not provide any indication of how full the ring buffer is in
    bytes compared to the overall size in bytes. Also, it is helpful to know
    the rate at which the cpu buffer is filling up.

    This patch adds an entry "bytes: " in printed stats for per_cpu ring
    buffer which provides the actual bytes consumed in the ring buffer. This
    field includes the number of bytes used by recorded events and the
    padding bytes added when moving the tail pointer to next page.

    It also adds the following time stamps:
    "oldest event ts:" - the oldest timestamp in the ring buffer
    "now ts:" - the timestamp at the time of reading

    The field "now ts" provides a consistent time snapshot to the userspace
    when being read. This is read from the same trace clock used by tracing
    event timestamps.

    Together, these values provide the rate at which the buffer is filling
    up, from the formula:
    bytes / (now_ts - oldest_event_ts)
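
    A sketch of that calculation against one CPU's stats file, using the field
    labels described above (the exact formatting of the file is assumed):

    awk -F': ' '
        /^bytes/           { bytes  = $2 }
        /^oldest event ts/ { oldest = $2 }
        /^now ts/          { now    = $2 }
        END { if (now > oldest) printf "fill rate: %.0f bytes/sec\n", bytes / (now - oldest) }
    ' /sys/kernel/debug/tracing/per_cpu/cpu0/stats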

    Signed-off-by: Vaibhav Nagarnaik
    Cc: Michael Rubin
    Cc: David Sharp
    Link: http://lkml.kernel.org/r/1313531179-9323-3-git-send-email-vnagarnaik@google.com
    Signed-off-by: Steven Rostedt

    Vaibhav Nagarnaik
     
  • The current file "buffer_size_kb" reports the size of per-cpu buffer and
    not the overall memory allocated which could be misleading. A new file
    "buffer_total_size_kb" adds up all the enabled CPU buffer sizes and
    reports it. This is only a readonly entry.
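
    A usage sketch, assuming the standard tracing directory:

    # size of one per-cpu buffer
    cat /sys/kernel/debug/tracing/buffer_size_kb
    # total across all enabled CPUs (read-only)
    cat /sys/kernel/debug/tracing/buffer_total_size_kb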

    Signed-off-by: Vaibhav Nagarnaik
    Cc: Michael Rubin
    Cc: David Sharp
    Link: http://lkml.kernel.org/r/1313531179-9323-2-git-send-email-vnagarnaik@google.com
    Signed-off-by: Steven Rostedt

    Vaibhav Nagarnaik
     

15 Jul, 2011

1 commit

  • Currently the stack trace per event in ftrace is only 8 frames.
    This can be quite limiting and sometimes useless, especially when
    the "ignore frames" count is wrong and we also use up stack frames for
    the event processing itself.

    Change this to be dynamic by adding a percpu buffer that we can
    write a large stack frame into and then copy into the ring buffer.

    Interrupts and NMIs that come in while another event is being
    processed will only get to use the 8-frame stack. That should be enough,
    as the task that they interrupted will have the full stack frame anyway.

    Requested-by: Thomas Gleixner
    Signed-off-by: Steven Rostedt

    Steven Rostedt
     

15 Jun, 2011

7 commits

  • Fix to support kernel stack trace correctly on kprobe-tracer.
    Since the execution path of kprobe-based dynamic events is different
    from other tracepoint-based events, normal ftrace_trace_stack() doesn't
    work correctly. To fix that, this introduces ftrace_trace_stack_regs()
    which traces stack via pt_regs instead of current stack register.

    e.g.

    # echo p schedule+4 > /sys/kernel/debug/tracing/kprobe_events
    # echo 1 > /sys/kernel/debug/tracing/options/stacktrace
    # echo 1 > /sys/kernel/debug/tracing/events/kprobes/enable
    # head -n 20 /sys/kernel/debug/tracing/trace
    bash-2968 [000] 10297.050245: p_schedule_4: (schedule+0x4/0x4ca)
    bash-2968 [000] 10297.050247:
    => schedule_timeout
    => n_tty_read
    => tty_read
    => vfs_read
    => sys_read
    => system_call_fastpath
    kworker/0:1-2940 [000] 10297.050265: p_schedule_4: (schedule+0x4/0x4ca)
    kworker/0:1-2940 [000] 10297.050266:
    => worker_thread
    => kthread
    => kernel_thread_helper
    sshd-1132 [000] 10297.050365: p_schedule_4: (schedule+0x4/0x4ca)
    sshd-1132 [000] 10297.050365:
    => sysret_careful

    Note: Even with this fix, the first entry will be skipped
    if the probe is put on the function entry area before
    the frame pointer is set up (usually, that is 4 bytes
    (push %bp; mov %sp %bp) on x86), because stack unwinder
    depends on the frame pointer.

    Signed-off-by: Masami Hiramatsu
    Cc: Frederic Weisbecker
    Cc: yrl.pp-manager.tt@hitachi.com
    Cc: Peter Zijlstra
    Cc: Namhyung Kim
    Link: http://lkml.kernel.org/r/20110608070934.17777.17116.stgit@fedora15
    Signed-off-by: Steven Rostedt

    Masami Hiramatsu
     
  • This patch replaces the code for getting an unsigned long from a
    userspace buffer with a simple call to kstrtoul_from_user.
    This makes it easier to read and less error prone.

    Signed-off-by: Peter Huewe
    Link: http://lkml.kernel.org/r/1307476707-14762-1-git-send-email-peterhuewe@gmx.de
    Signed-off-by: Steven Rostedt

    Peter Huewe
     
  • The header display of the function tracer does not follow
    the context-info option, so field names are displayed even
    when this option is off.

    Added check for TRACE_ITER_CONTEXT_INFO trace_flags.

    With following commands:
    # echo function > ./current_tracer
    # echo 0 > options/context-info
    # cat trace

    This is what it looked like before:
    # tracer: function
    #
    # TASK-PID CPU# TIMESTAMP FUNCTION
    # | | | | |
    add_preempt_count
    Link: http://lkml.kernel.org/r/1307113131-10045-4-git-send-email-jolsa@redhat.com
    Signed-off-by: Steven Rostedt

    Jiri Olsa
     
  • Add a trace option to disable tracing on free. When this option is
    set, a write into the free_buffer file will not only shrink the
    ring buffer down to zero, but it will also disable tracing.
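
    A minimal sketch of the intended flow; the option name disable_on_free is
    my assumption for how this knob is exposed under options/:

    echo 1 > /sys/kernel/debug/tracing/options/disable_on_free
    # with the option set, this write now disables tracing as well as
    # shrinking the ring buffer down to zero
    echo 0 > /sys/kernel/debug/tracing/free_buffer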

    Cc: Vaibhav Nagarnaik
    Signed-off-by: Steven Rostedt

    Steven Rostedt
     
  • The proc file entry buffer_size_kb is used to set the size of tracing
    buffer. The memory to expand the buffer size is kernel memory. Consider
    a use case where tracing is handled by a user space utility, which acts
    as a gate keeper for tracing requests. In an OOM condition, tracing is
    considered a low priority task and if the utility gets killed the ring
    buffer memory cannot be released back to the kernel.

    This patch adds a proc file called "free_buffer" whose purpose is to
    stop tracing and free up the ring buffer when it is closed.

    The user space process can then set the desired size in buffer_size_kb
    file and open the fd to the "free_buffer" file. Under OOM condition, if
    the process gets killed, the kernel closes the file descriptor. The
    release handler stops the tracing and releases the kernel memory
    automatically.
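
    A sketch of the gate-keeper flow described above (the open mode used for
    free_buffer is an assumption):

    cd /sys/kernel/debug/tracing
    # expand the per-cpu buffers for this tracing session
    echo 10240 > buffer_size_kb
    # hold free_buffer open for the lifetime of the tracing session
    exec 3> free_buffer
    # ... run the traced workload ...
    # closing the fd (explicitly, or implicitly when the process is OOM-killed)
    # stops tracing and returns the ring buffer memory to the kernel
    exec 3>&-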

    Cc: Ingo Molnar
    Cc: Frederic Weisbecker
    Cc: Michael Rubin
    Cc: David Sharp
    Signed-off-by: Vaibhav Nagarnaik
    Link: http://lkml.kernel.org/r/1308012717-11148-1-git-send-email-vnagarnaik@google.com
    Signed-off-by: Steven Rostedt

    Vaibhav Nagarnaik
     
  • The tracing ring buffer is a group of per-cpu ring buffers where
    allocation and logging is done on a per-cpu basis. The events that are
    generated on a particular CPU are logged in the corresponding buffer.
    This is to provide wait-free writes between CPUs and good NUMA node
    locality while accessing the ring buffer.

    However, the allocation routines consider NUMA locality only for buffer
    page metadata and not for the actual buffer page. This causes the pages
    to be allocated on the NUMA node local to the CPU where the allocation
    routine is running at the time.

    This patch fixes the problem by using a NUMA node specific allocation
    routine so that the pages are allocated from a NUMA node local to the
    logging CPU.

    I tested with the getuid_microbench from autotest. It is a simple binary
    that calls getuid() in a loop and measures the average time for the
    syscall to complete. The following command was used to test:
    $ getuid_microbench 1000000

    Compared the numbers found on kernel with and without this patch and
    found that logging latency decreases by 30-50 ns/call.
    tracing with non-NUMA allocation - 569 ns/call
    tracing with NUMA allocation - 512 ns/call

    Signed-off-by: Vaibhav Nagarnaik
    Cc: Frederic Weisbecker
    Cc: Ingo Molnar
    Cc: Michael Rubin
    Cc: David Sharp
    Link: http://lkml.kernel.org/r/1304470602-20366-1-git-send-email-vnagarnaik@google.com
    Signed-off-by: Steven Rostedt

    Vaibhav Nagarnaik
     
  • When syscall tracing is used by concurrent processes, the wakeup() that is
    called in the event commit function causes contention on the spin lock
    of the waitqueue. I enabled the sys_enter_getuid and sys_exit_getuid
    tracepoints, and by running getuid_microbench from autotest in parallel
    I found that the contention causes an exponential latency increase in the
    tracing path.

    The autotest binary getuid_microbench calls getuid() in a tight loop for
    the given number of iterations and measures the average time required to
    complete a single invocation of syscall.

    The patch schedules a delayed work item 2 ms after an event commit
    to wake up the trace wait_queue. This removes the delay caused by
    contention on the spin lock in wakeup() and amortizes the wakeup() calls
    over the 2 ms period.

    In the following example, the script enables the sys_enter_getuid and
    sys_exit_getuid tracepoints and runs the getuid_microbench in parallel
    with the given number of processes. The output clearly shows the latency
    increase caused by contentions.
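
    The getuid.sh wrapper itself is not shown; a plausible sketch of what it
    does, per the description above (tracepoint paths assumed to be the
    standard syscalls event directory):

    #!/bin/sh
    # usage: getuid.sh <nr-processes>
    cd /sys/kernel/debug/tracing
    echo 1 > events/syscalls/sys_enter_getuid/enable
    echo 1 > events/syscalls/sys_exit_getuid/enable
    for i in $(seq "$1"); do
        getuid_microbench 1000000 &
    done
    wait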

    $ ~/getuid.sh 1
    1000000 calls in 0.720974253 s (720.974253 ns/call)

    $ ~/getuid.sh 2
    1000000 calls in 1.166457554 s (1166.457554 ns/call)
    1000000 calls in 1.168933765 s (1168.933765 ns/call)

    $ ~/getuid.sh 3
    1000000 calls in 1.783827516 s (1783.827516 ns/call)
    1000000 calls in 1.795553270 s (1795.553270 ns/call)
    1000000 calls in 1.796493376 s (1796.493376 ns/call)

    $ ~/getuid.sh 4
    1000000 calls in 4.483041796 s (4483.041796 ns/call)
    1000000 calls in 4.484165388 s (4484.165388 ns/call)
    1000000 calls in 4.484850762 s (4484.850762 ns/call)
    1000000 calls in 4.485643576 s (4485.643576 ns/call)

    $ ~/getuid.sh 5
    1000000 calls in 6.497521653 s (6497.521653 ns/call)
    1000000 calls in 6.502000236 s (6502.000236 ns/call)
    1000000 calls in 6.501709115 s (6501.709115 ns/call)
    1000000 calls in 6.502124100 s (6502.124100 ns/call)
    1000000 calls in 6.502936358 s (6502.936358 ns/call)

    After the patch, the latencies scale better.
    1000000 calls in 0.728720455 s (728.720455 ns/call)

    1000000 calls in 0.842782857 s (842.782857 ns/call)
    1000000 calls in 0.883803135 s (883.803135 ns/call)

    1000000 calls in 0.902077764 s (902.077764 ns/call)
    1000000 calls in 0.902838202 s (902.838202 ns/call)
    1000000 calls in 0.908896885 s (908.896885 ns/call)

    1000000 calls in 0.932523515 s (932.523515 ns/call)
    1000000 calls in 0.958009672 s (958.009672 ns/call)
    1000000 calls in 0.986188020 s (986.188020 ns/call)
    1000000 calls in 0.989771102 s (989.771102 ns/call)

    1000000 calls in 0.933518391 s (933.518391 ns/call)
    1000000 calls in 0.958897947 s (958.897947 ns/call)
    1000000 calls in 1.031038897 s (1031.038897 ns/call)
    1000000 calls in 1.089516025 s (1089.516025 ns/call)
    1000000 calls in 1.141998347 s (1141.998347 ns/call)

    Signed-off-by: Vaibhav Nagarnaik
    Cc: Frederic Weisbecker
    Cc: Ingo Molnar
    Cc: Michael Rubin
    Cc: David Sharp
    Cc: Linus Torvalds
    Link: http://lkml.kernel.org/r/1305059241-7629-1-git-send-email-vnagarnaik@google.com
    Signed-off-by: Steven Rostedt

    Vaibhav Nagarnaik
     

07 May, 2011

1 commit

  • This partially reverts commit e6e1e2593592a8f6f6380496655d8c6f67431266.

    That commit changed the structure layout of the trace structure, which
    in turn broke PowerTOP (1.9x generation) quite badly.

    I appreciate not wanting to expose the variable in question, and
    PowerTOP was not using it, so I've replaced the variable with just a
    padding field - that way if in the future a new field is needed it can
    just use this padding field.

    Signed-off-by: Arjan van de Ven
    Signed-off-by: Linus Torvalds

    Arjan van de Ven
     

05 Apr, 2011

1 commit

  • Running the following commands:

    # enable the binary option
    echo 1 > ./options/bin
    # disable context info option
    echo 0 > ./options/context-info
    # tracing only events
    echo 1 > ./events/enable
    cat trace_pipe

    while forcing the system to generate many tracing events causes a lockup
    (on non-preemptive kernels) inside the tracing_read_pipe function.

    The issue is also easily reproduced by running the LTP stress test
    (ftrace_stress_test.sh).

    The reasons are:
    - the bin/hex/raw output functions for events are set to the
    trace_nop_print function, which prints nothing and
    returns the TRACE_TYPE_HANDLED value
    - the LOST EVENT trace does not handle trace_seq overflow

    These reasons force the while loop in tracing_read_pipe
    function never to break.

    The attached patch fixes handling of the lost event trace, and
    changes trace_nop_print to print minimal info, which is needed
    for correct tracing_read_pipe processing.

    v2 changes:
    - omit the cond_resched changes by trace_nop_print changes
    - WARN changed to WARN_ONCE and added info to be able
    to find out the culprit

    v3 changes:
    - make more accurate patch comment

    Signed-off-by: Jiri Olsa
    LKML-Reference:
    Signed-off-by: Steven Rostedt

    Jiri Olsa
     

10 Mar, 2011

3 commits

  • If the kernel command line declares a tracer "ftrace=sometracer" and
    that tracer is either not defined or is enabled after irqsoff,
    then the irqs off selftest will fail with the following error:

    Testing tracer irqsoff:
    ------------[ cut here ]------------
    WARNING: at /home/rostedt/work/autotest/nobackup/linux-test.git/kernel/trace/trace.c:713 update_max_tr_single+0xfa/0x11b()
    Hardware name:
    Modules linked in:
    Pid: 1, comm: swapper Not tainted 2.6.38-rc8-test #1
    Call Trace:
    [] ? warn_slowpath_common+0x65/0x7a
    [] ? update_max_tr_single+0xfa/0x11b
    [] ? warn_slowpath_null+0xf/0x13
    [] ? update_max_tr_single+0xfa/0x11b
    [] ? stop_critical_timing+0x154/0x204
    [] ? trace_selftest_startup_irqsoff+0x5b/0xc1
    [] ? trace_selftest_startup_irqsoff+0x5b/0xc1
    [] ? trace_selftest_startup_irqsoff+0x5b/0xc1
    [] ? time_hardirqs_on+0x25/0x28
    [] ? trace_hardirqs_on_caller+0x18/0x12f
    [] ? trace_hardirqs_on+0xb/0xd
    [] ? trace_selftest_startup_irqsoff+0x5b/0xc1
    [] ? register_tracer+0xf8/0x1a3
    [] ? init_irqsoff_tracer+0xd/0x11
    [] ? do_one_initcall+0x71/0x121
    [] ? init_irqsoff_tracer+0x0/0x11
    [] ? kernel_init+0x13a/0x1b6
    [] ? kernel_init+0x0/0x1b6
    [] ? kernel_thread_helper+0x6/0x10
    ---[ end trace e93713a9d40cd06c ]---
    .. no entries found ..FAILED!

    What happens is the "ftrace=..." will expand the ring buffer to its
    default size (from its minimum size) but it will not expand the
    max ring buffer (the ring buffer to store maximum latencies).
    When the irqsoff test runs, it will call the ring buffer swap routine
    that checks if the max ring buffer is the same size as the normal
    ring buffer, and will fail if it is not. This causes the test to fail.

    The solution is to expand the max ring buffer before running the self
    test if the max ring buffer is used by that tracer and the normal ring
    buffer is expanded. The max ring buffer should be shrunk again after
    the test is done to save space.
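
    For reference, the boot-time tracer selection referred to above is a kernel
    command-line parameter; an illustrative fragment (the tracer name is
    arbitrary):

    # appended to the kernel command line by the boot loader
    ftrace=irqsoff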

    Signed-off-by: Steven Rostedt

    Steven Rostedt
     
  • The lock_depth field in the event headers was added as a temporary
    data point to help in removing the BKL. Now that the BKL has pretty
    much been removed, we can remove this field.

    This in turn changes the header from 12 bytes to 8 bytes,
    removing the 4 bytes of padding that gcc would insert if the first field
    in the data load was 8 bytes in size.

    Signed-off-by: Steven Rostedt

    Steven Rostedt
     
  • Add an "overwrite" trace_option for ftrace to control whether the buffer should
    be overwritten on overflow or not. The default remains to overwrite old events
    when the buffer is full. This patch adds the option to instead discard newest
    events when the buffer is full. This is useful to get a snapshot of traces just
    after enabling traces. Dropping the current event is also a simpler code path.
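
    A usage sketch of the new option, assuming the standard tracing directory:

    # drop the newest events when the buffer is full (snapshot-style tracing)
    echo 0 > /sys/kernel/debug/tracing/options/overwrite
    # default behaviour: overwrite the oldest events when the buffer is full
    echo 1 > /sys/kernel/debug/tracing/options/overwrite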

    Signed-off-by: David Sharp
    LKML-Reference:
    Signed-off-by: Steven Rostedt

    David Sharp
     

08 Jan, 2011

1 commit

  • While running my ftrace stress test, this showed up:

    BUG: sleeping function called from invalid context at mm/mmap.c:233
    ...
    note: cat[3293] exited with preempt_count 1

    The bug was introduced by commit 91e86e560d0b3ce4c5fc64fd2bbb99f856a30a4e
    ("tracing: Fix recursive user stack trace")

    Cc:
    Signed-off-by: Li Zefan
    LKML-Reference:
    Signed-off-by: Steven Rostedt

    Li Zefan
     

01 Dec, 2010

1 commit

  • The file_ops struct for the "trace" special file defined llseek as seq_lseek().
    However, if the file was opened for writing only, seq_open() was not called,
    and the seek would dereference a null pointer, file->private_data.

    This patch introduces a new wrapper for seq_lseek() which checks if the file
    descriptor is opened for reading first. If not, it does nothing.

    Cc:
    Signed-off-by: Slava Pestov
    LKML-Reference:
    Signed-off-by: Steven Rostedt

    Slava Pestov
     

27 Nov, 2010

1 commit

  • …/git/tip/linux-2.6-tip

    * 'perf-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
    perf symbols: Remove incorrect open-coded container_of()
    perf record: Handle restrictive permissions in /proc/{kallsyms,modules}
    x86/kprobes: Prevent kprobes to probe on save_args()
    irq_work: Drop cmpxchg() result
    perf: Fix owner-list vs exit
    x86, hw_nmi: Move backtrace_mask declaration under ARCH_HAS_NMI_WATCHDOG
    tracing: Fix recursive user stack trace
    perf,hw_breakpoint: Initialize hardware api earlier
    x86: Ignore trap bits on single step exceptions
    tracing: Force arch_local_irq_* notrace for paravirt
    tracing: Fix module use of trace_bprintk()

    Linus Torvalds
     

13 Nov, 2010

1 commit

  • The user stack trace can fault when examining the trace, which
    calls the do_page_fault handler, which traces again,
    which does the user stack trace, which faults and calls
    do_page_fault again ...

    Thus this causes a recursive bug. We need to have a recursion
    detector here.

    [ Resubmitted by Jiri Olsa ]

    [ Eric Dumazet recommended using __this_cpu_* instead of __get_cpu_* ]

    Cc: Eric Dumazet
    Signed-off-by: Jiri Olsa
    LKML-Reference:
    Signed-off-by: Steven Rostedt

    Steven Rostedt