15 Jun, 2009

1 commit


09 Jun, 2009

1 commit

  • On Sun, 7 Jun 2009, Ingo Molnar wrote:
    > Testing tracer sched_switch: Starting ring buffer hammer
    > PASSED
    > Testing tracer sysprof: PASSED
    > Testing tracer function: PASSED
    > Testing tracer irqsoff:
    > =============================================
    > PASSED
    > Testing tracer preemptoff: PASSED
    > Testing tracer preemptirqsoff: [ INFO: possible recursive locking detected ]
    > PASSED
    > Testing tracer branch: 2.6.30-rc8-tip-01972-ge5b9078-dirty #5760
    > ---------------------------------------------
    > rb_consumer/431 is trying to acquire lock:
    > (&cpu_buffer->reader_lock){......}, at: [] ring_buffer_reset_cpu+0x37/0x70
    >
    > but task is already holding lock:
    > (&cpu_buffer->reader_lock){......}, at: [] ring_buffer_consume+0x7e/0xc0
    >
    > other info that might help us debug this:
    > 1 lock held by rb_consumer/431:
    > #0: (&cpu_buffer->reader_lock){......}, at: [] ring_buffer_consume+0x7e/0xc0

    The ring buffer is a generic structure, and can be used outside of
    ftrace. If ftrace traces within the use of the ring buffer, it can produce
    false positives with lockdep.

    This patch passes in a static lock key into the allocation of the ring
    buffer, so that different ring buffers will have their own lock class.
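
    The per-call-site key idea can be modelled in user space. This is a
    minimal sketch, not the kernel code: struct lock_key, struct rb,
    rb_alloc_keyed() and the rb_alloc() macro are hypothetical names, and
    the macro assumes the GNU C statement-expression extension (GCC, Clang).

    ```c
    #include <assert.h>
    #include <stdlib.h>

    /* Hypothetical stand-in for a lock class key: only its address matters. */
    struct lock_key { char unused; };

    /* Hypothetical buffer descriptor that remembers which key it was given. */
    struct rb {
        size_t size;
        const struct lock_key *key;
    };

    static struct rb *rb_alloc_keyed(size_t size, const struct lock_key *key)
    {
        struct rb *b;

        assert(key != NULL);
        b = malloc(sizeof(*b));
        if (!b)
            return NULL;
        b->size = size;
        b->key = key;           /* key is stored in the descriptor */
        return b;
    }

    /* Every expansion of this macro owns a distinct static key, so buffers
     * allocated at different call sites end up in different lock classes. */
    #define rb_alloc(size)                      \
    ({                                          \
        static struct lock_key site_key;        \
        rb_alloc_keyed((size), &site_key);      \
    })

    /* Two separate call sites, for illustration only. */
    static struct rb *alloc_for_tracer(void) { return rb_alloc(4096); }
    static struct rb *alloc_for_hammer(void) { return rb_alloc(4096); }
    ```

    Buffers returned by alloc_for_tracer() always share one key, while a
    buffer from alloc_for_hammer() carries a different one, which is the
    property that lets a lock validator tell the two users apart.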

    Reported-by: Ingo Molnar
    Signed-off-by: Peter Zijlstra
    LKML-Reference:

    [ store key in ring buffer descriptor ]

    Signed-off-by: Steven Rostedt

    Peter Zijlstra
     

06 May, 2009

1 commit

  • The WARN_ON in the ring buffer that fires when a commit is preempted
    and the buffer is filled by preceding writes can trigger in normal
    operation. The WARN_ON makes this look like a bug. Worse, because it
    does not stop tracing and calls printk, which can also recurse, it
    is prone to deadlock (the WARN_ON is not in a position to recurse).

    This patch removes the WARN_ON and replaces it with a counter that
    can be retrieved by a tracer. This counter is called commit_overrun.

    While at it, I added a nmi_dropped counter to count any time an NMI entry
    is dropped because the NMI could not take the spinlock.
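
    A rough user-space sketch of counting instead of warning. Only the two
    counter names come from this message; struct rb_stats and the two
    helper functions are hypothetical.

    ```c
    #include <assert.h>
    #include <stdbool.h>

    /* Hypothetical statistics block for one cpu buffer. */
    struct rb_stats {
        unsigned long commit_overrun;
        unsigned long nmi_dropped;
    };

    /* Write path: an interrupted commit found the buffer already filled.
     * Bumping a counter is safe here, printing a warning is not. */
    static bool rb_note_commit_overrun(struct rb_stats *s)
    {
        assert(s);
        s->commit_overrun++;
        return false;       /* this write is dropped, tracing goes on */
    }

    /* NMI path: the spinlock could not be taken, so the entry is dropped. */
    static bool rb_note_nmi_dropped(struct rb_stats *s)
    {
        assert(s);
        s->nmi_dropped++;
        return false;
    }
    ```

    A tracer can later read both counters and report them, rather than the
    write path reporting them at the moment they happen.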

    [ Impact: prevent deadlock caused by printing a warning in a normal case ]

    Signed-off-by: Steven Rostedt

    Steven Rostedt
     

24 Apr, 2009

1 commit

  • RB_MAX_SMALL_DATA = 28 bytes is too small for most tracers: it wastes
    a 'u32' to store the actual length of every event whose data size is > 28.

    This fix uses compressed event header and enlarges RB_MAX_SMALL_DATA.

    [ Impact: saves about 0%-12.5% (depending on the tracer) of memory in ring_buffer ]

    Signed-off-by: Lai Jiangshan
    LKML-Reference:
    Signed-off-by: Steven Rostedt

    Lai Jiangshan
     

17 Apr, 2009

1 commit

  • Currently, everything needed to read the binary output from the
    ring buffers is available, with the exception of the way the ring
    buffers handle themselves internally.

    This patch creates two special files in the debugfs/tracing/events
    directory:

    # cat /debug/tracing/events/header_page
    field: u64 timestamp; offset:0; size:8;
    field: local_t commit; offset:8; size:8;
    field: char data; offset:16; size:4080;

    # cat /debug/tracing/events/header_event
    type : 2 bits
    len : 3 bits
    time_delta : 27 bits
    array : 32 bits

    padding : type == 0
    time_extend : type == 1
    data : type == 3

    This is to allow a userspace app to see if the ring buffer format changes
    or not.
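
    A user-space reader could check the page layout by parsing the
    "field:" lines shown above. The following is a minimal sketch under
    the assumption that every line follows exactly that layout; struct
    hp_field and parse_header_page_line() are hypothetical names.

    ```c
    #include <assert.h>
    #include <stdio.h>

    /* One "field:" line of events/header_page. */
    struct hp_field {
        char type[32];
        char name[32];
        unsigned int offset;
        unsigned int size;
    };

    /* Returns 0 on success, -1 if the line does not match the layout. */
    static int parse_header_page_line(const char *line, struct hp_field *f)
    {
        int n;

        assert(line && f);
        n = sscanf(line, " field: %31s %31[^;]; offset:%u; size:%u;",
                   f->type, f->name, &f->offset, &f->size);
        return n == 4 ? 0 : -1;
    }
    ```

    An application would compare the parsed offsets and sizes against the
    values it was built for and refuse to decode pages when they differ.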

    [ Impact: allow userspace apps to know of ringbuffer format changes ]

    Signed-off-by: Steven Rostedt

    Steven Rostedt
     

14 Apr, 2009

1 commit

  • The ring_buffer_discard_commit is similar to ring_buffer_event_discard
    but it can only be done on an event that has yet to be committed.
    Unpredictable results can happen otherwise.

    The main difference between ring_buffer_discard_commit and
    ring_buffer_event_discard is that ring_buffer_discard_commit will try
    to free the data in the ring buffer if nothing has added data
    after the reserved event. If something did, then it acts almost the
    same as ring_buffer_event_discard followed by a
    ring_buffer_unlock_commit.

    Note, either ring_buffer_discard_commit or ring_buffer_unlock_commit
    can be called on an event, not both.
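
    The difference between reclaiming the space and merely marking the
    event as discarded can be sketched with a single-page model. All names
    here (struct page_model, struct ev, pm_reserve, pm_discard_commit) are
    hypothetical, and the model ignores concurrency entirely.

    ```c
    #include <assert.h>
    #include <stddef.h>

    enum { EV_DATA, EV_PADDING };

    /* A reserved event: where it starts in the page and how long it is. */
    struct ev { int type; size_t start, len; };

    /* 'write' is the offset of the next free byte in the page. */
    struct page_model { size_t write, size; };

    static int pm_reserve(struct page_model *p, size_t len, struct ev *e)
    {
        assert(p && e);
        if (p->write + len > p->size)
            return -1;
        e->type = EV_DATA;
        e->start = p->write;
        e->len = len;
        p->write += len;
        return 0;
    }

    /* Returns 1 if the space was reclaimed, 0 if the event was only
     * turned into padding because something was written after it. */
    static int pm_discard_commit(struct page_model *p, struct ev *e)
    {
        assert(p && e);
        if (e->start + e->len == p->write) {
            p->write = e->start;    /* nothing followed: give it back */
            return 1;
        }
        e->type = EV_PADDING;       /* readers will skip this event */
        return 0;
    }
    ```

    Discarding the most recent reservation moves the write offset back,
    while discarding an older one leaves the offset alone and only changes
    the event type.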

    This commit also exports both discard functions to be usable by
    GPL modules.

    Signed-off-by: Steven Rostedt
    Signed-off-by: Ingo Molnar

    Steven Rostedt
     

23 Mar, 2009

1 commit

  • This patch overloads RINGBUF_TYPE_PADDING to provide a way to discard
    events from the ring buffer, for the event-filtering mechanism
    introduced in a subsequent patch.

    I did the initial version but thanks to Steven Rostedt for adding
    the parts that actually made it work. ;-)

    Signed-off-by: Tom Zanussi
    Acked-by: Frederic Weisbecker
    Signed-off-by: Ingo Molnar

    Tom Zanussi
     

18 Mar, 2009

1 commit


05 Mar, 2009

1 commit

  • Impact: cleanup

    The functions tracing_start/tracing_stop have been moved to kernel.h.
    These are not the functions a developer most likely wants to use
    when they want to insert a place to stop tracing and restart it from
    user space.

    tracing_start/tracing_stop were created to work with things like
    suspend to ram, where even calling smp_processor_id() can crash the
    system. They were used to stop the tracer from doing anything. These
    are still lightweight functions, but they add a bit more overhead to
    be able to stop the tracers. They also have no interface back to
    userland. That is, if the kernel calls tracing_stop, userland
    cannot start tracing.

    What a developer most likely wants to use is tracing_on/tracing_off.
    These are very lightweight functions (they simply set or clear a bit).
    These functions just stop recording into the ring buffer. The tracers
    don't even know that this happens except that they would receive NULL
    from the ring_buffer_lock_reserve function.
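
    How a tracer experiences the switch can be shown with a small
    user-space model. The model_* names are hypothetical stand-ins for
    tracing_on(), tracing_off() and ring_buffer_lock_reserve(); the single
    static slot replaces a real buffer.

    ```c
    #include <assert.h>
    #include <stddef.h>

    static int recording_on = 1;        /* recording starts out enabled */
    static long long slot[8];           /* stand-in for buffer space */

    static void model_tracing_on(void)  { recording_on = 1; }
    static void model_tracing_off(void) { recording_on = 0; }

    /* Stand-in for the reserve call: NULL while recording is off. */
    static void *model_lock_reserve(size_t len)
    {
        if (!recording_on || len > sizeof(slot))
            return NULL;
        return slot;
    }

    /* The tracer needs no knowledge of the switch beyond the NULL check. */
    static int model_trace_event(int value)
    {
        int *entry = model_lock_reserve(sizeof(*entry));

        if (!entry)
            return 0;       /* nothing recorded */
        *entry = value;
        return 1;
    }
    ```

    Because the switch is a plain flag and not a counter, calling
    model_tracing_off() twice and model_tracing_on() once leaves
    recording enabled.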

    Also, there's a way for the user land to enable or disable this bit.
    In debugfs/tracing/tracing_on, a user may echo "0" (same as tracing_off())
    or echo "1" (same as tracing_on()) into this file. This becomes handy when
    a kernel developer is debugging and wants tracing to turn off when it
    hits an anomaly. Then the developer can examine the trace, and restart
    tracing if they want to try again (echo 1 > tracing_on).

    This patch moves the prototypes for tracing_on/tracing_off to kernel.h
    and comments their use, so that a kernel developer will know how
    to use them.

    Signed-off-by: Steven Rostedt

    Steven Rostedt
     

04 Mar, 2009

1 commit


17 Feb, 2009

1 commit


11 Feb, 2009

1 commit


09 Feb, 2009

1 commit


08 Feb, 2009

1 commit


06 Feb, 2009

1 commit

  • Impact: API change, cleanup

    From ring_buffer_{lock_reserve,unlock_commit}.

    $ codiff /tmp/vmlinux.before /tmp/vmlinux.after
    linux-2.6-tip/kernel/trace/trace.c:
    trace_vprintk | -14
    trace_graph_return | -14
    trace_graph_entry | -10
    trace_function | -8
    __ftrace_trace_stack | -8
    ftrace_trace_userstack | -8
    tracing_sched_switch_trace | -8
    ftrace_trace_special | -12
    tracing_sched_wakeup_trace | -8
    9 functions changed, 90 bytes removed, diff: -90

    linux-2.6-tip/block/blktrace.c:
    __blk_add_trace | -1
    1 function changed, 1 bytes removed, diff: -1

    /tmp/vmlinux.after:
    10 functions changed, 91 bytes removed, diff: -91

    Signed-off-by: Arnaldo Carvalho de Melo
    Acked-by: Frédéric Weisbecker
    Signed-off-by: Ingo Molnar

    Arnaldo Carvalho de Melo
     

31 Dec, 2008

1 commit

  • * 'oprofile-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
    oprofile: select RING_BUFFER
    ring_buffer: adding EXPORT_SYMBOLs
    oprofile: fix lost sample counter
    oprofile: remove nr_available_slots()
    oprofile: port to the new ring_buffer
    ring_buffer: add remaining cpu functions to ring_buffer.h
    oprofile: moving cpu_buffer_reset() to cpu_buffer.h
    oprofile: adding cpu_buffer_entries()
    oprofile: adding cpu_buffer_write_commit()
    oprofile: adding cpu buffer r/w access functions
    ftrace: remove unused function arg in trace_iterator_increment()
    ring_buffer: update description for ring_buffer_alloc()
    oprofile: set values to default when creating oprofilefs
    oprofile: implement switch/case in buffer_sync.c
    x86/oprofile: cleanup IBS init/exit functions in op_model_amd.c
    x86/oprofile: reordering IBS code in op_model_amd.c
    oprofile: fix typo
    oprofile: whitspace changes only
    oprofile: update comment for oprofile_add_sample()
    oprofile: comment cleanup

    Linus Torvalds
     

10 Dec, 2008

1 commit


08 Dec, 2008

1 commit


03 Dec, 2008

1 commit

  • Impact: new API to ring buffer

    This patch adds a new interface into the ring buffer that allows a
    page to be read from the ring buffer on a given CPU. For every page
    read, one must also be given to allow for a "swap" of the pages.

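    /* allocate the spare page that will be swapped into the buffer */
    rpage = ring_buffer_alloc_read_page(buffer);
    if (!rpage)
            goto err;
    /* returns 0 if there is no data, or "full" is set and the
     * writer is still on the current page */
    ret = ring_buffer_read_page(buffer, &rpage, cpu, full);
    if (!ret)
            goto empty;
    process_page(rpage);
    ring_buffer_free_read_page(rpage);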

    The caller of these functions must handle any waits that are
    needed to wait for new data. The ring_buffer_read_page will simply
    return 0 if there is no data, or if "full" is set and the writer
    is still on the current page.
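
    The "swap" can be pictured with a two-page user-space model. This is
    only a sketch of the idea: struct rpage, struct rbuf and
    model_read_page() are hypothetical, and the real function takes more
    arguments and handles locking.

    ```c
    #include <assert.h>
    #include <stddef.h>

    #define NPAGES 2

    struct rpage { size_t bytes; };     /* bytes of data held by the page */

    struct rbuf {
        struct rpage *pages[NPAGES];
        int head;                       /* oldest page, next to be read */
        int tail;                       /* page the writer is on */
    };

    /* Hands the oldest page to the caller and takes the caller's spare
     * page in exchange. Returns 0 if nothing was swapped. */
    static int model_read_page(struct rbuf *b, struct rpage **spare, int full)
    {
        struct rpage *out;

        assert(b && spare && *spare);
        out = b->pages[b->head];
        if (out->bytes == 0)
            return 0;                   /* no data */
        if (full && b->head == b->tail)
            return 0;                   /* writer still on this page */
        (*spare)->bytes = 0;
        b->pages[b->head] = *spare;     /* spare page joins the buffer */
        *spare = out;                   /* caller now owns the data page */
        if (b->head != b->tail)
            b->head = (b->head + 1) % NPAGES;
        return 1;
    }
    ```

    After a successful call the caller owns the page with the data, and
    the buffer keeps writing into the page the caller supplied.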

    Signed-off-by: Steven Rostedt
    Signed-off-by: Ingo Molnar

    Steven Rostedt
     

23 Nov, 2008

1 commit

  • Impact: feature to permanently disable ring buffer

    This patch adds an API to the ring buffer code that will permanently
    disable the ring buffer from ever recording. This should only be
    called when some serious anomaly is detected, and the system
    may be in an unstable state. When that happens, shutting down the
    recording to the ring buffers may be appropriate.

    Signed-off-by: Steven Rostedt
    Signed-off-by: Ingo Molnar

    Steven Rostedt
     

12 Nov, 2008

1 commit

  • Impact: enable/disable ring buffer recording API added

    Several kernel developers have requested that there be a way to stop
    recording into the ring buffers with a simple switch that can also
    be enabled from userspace. This patch adds a new kernel API to the
    ring buffers called:

    tracing_on()
    tracing_off()

    When tracing_off() is called, all ring buffers will not be able to record
    into their buffers.

    tracing_on() will enable the ring buffers again.

    These two act like an on/off switch. That is, there is no counting of the
    number of times tracing_off or tracing_on has been called.

    A new file is added to the debugfs/tracing directory called

    tracing_on

    This allows for userspace applications to also flip the switch.

    echo 0 > debugfs/tracing/tracing_on

    disables the tracing.

    echo 1 > debugfs/tracing/tracing_on

    enables it.
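
    The same flip can be done from a C program instead of the shell. A
    minimal sketch, assuming the caller passes the full path of the
    tracing_on file on its system; set_tracing_switch() is a hypothetical
    helper, not part of any library.

    ```c
    #include <assert.h>
    #include <stdio.h>

    /* Writes "1" or "0" to the switch file at 'path'.
     * Returns 0 on success, -1 on any error. */
    static int set_tracing_switch(const char *path, int on)
    {
        FILE *f;
        int ok;

        assert(path);
        f = fopen(path, "w");
        if (!f)
            return -1;
        ok = fputs(on ? "1" : "0", f) >= 0;
        if (fclose(f) != 0)
            ok = 0;
        return ok ? 0 : -1;
    }
    ```

    A monitoring tool could call set_tracing_switch(path, 0) when it sees
    the condition it is waiting for, then read the trace at leisure.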

    Note, this does not disable or enable any tracers. It only sets or clears
    a flag that needs to be set in order for the ring buffers to write to
    their buffers. It is a global flag, and affects all ring buffers.

    The buffers start out with tracing_on enabled.

    There are now three flags that control recording into the buffers:

    tracing_on: which affects all ring buffer tracers.

    buffer->record_disabled: which affects an allocated buffer, which may be set
    if an anomaly is detected, and tracing is disabled.

    cpu_buffer->record_disabled: which is set by tracing_stop() or if an
    anomaly is detected. tracing_start can not reenable this if
    an anomaly occurred.
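
    Put together, a write is recorded only when all three flags allow it.
    The check can be sketched as follows; the structures are simplified,
    hypothetical models with two CPUs, and only the flag names come from
    the list above.

    ```c
    #include <assert.h>
    #include <stdbool.h>

    #define MODEL_CPUS 2

    struct cpu_buf { int record_disabled; };

    struct buf {
        int record_disabled;
        struct cpu_buf cpu[MODEL_CPUS];
    };

    /* Global switch, shared by every ring buffer. */
    static bool global_tracing_on = true;

    /* True only if the global, per-buffer and per-cpu flags all permit
     * recording on the given cpu. */
    static bool may_record(const struct buf *b, int cpu)
    {
        assert(b && cpu >= 0 && cpu < MODEL_CPUS);
        return global_tracing_on &&
               !b->record_disabled &&
               !b->cpu[cpu].record_disabled;
    }
    ```

    Clearing the global flag stops every buffer at once, while the other
    two flags narrow the effect to one buffer or one CPU of one buffer.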

    The userspace debugfs/tracing/tracing_enabled is implemented with
    tracing_stop() but the user space code can not enable it if the kernel
    called tracing_stop().

    Userspace can enable the tracing_on even if the kernel disabled it.
    It is just a switch used to stop tracing if a condition was hit.
    tracing_on is not for protecting critical areas in the kernel nor is
    it for stopping tracing if an anomaly occurred. This is because userspace
    can reenable it at any time.

    Side effect: With this patch, I discovered a dead variable in ftrace.c
    called tracing_on. This patch removes it.

    Signed-off-by: Steven Rostedt

    Steven Rostedt
     

14 Oct, 2008

2 commits

  • The old "lock always" scheme had issues with lockdep, and was not very
    efficient anyways.

    This patch does a new design to be partially lockless on writes.
    Writes will add new entries to the per cpu pages by simply disabling
    interrupts. When a write needs to go to another page, then it will
    grab the lock.
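
    The split between the fast path and the page-crossing path can be
    sketched in user space. This model is an assumption-laden
    illustration: struct wmodel and wmodel_write() are hypothetical, the
    page size is tiny, and the lock is represented only by a counter.

    ```c
    #include <assert.h>
    #include <stddef.h>

    #define PAGE_BYTES 64
    #define NR_PAGES   4

    struct wmodel {
        size_t page;                /* index of the page being written */
        size_t offset;              /* next free byte in that page */
        unsigned long lock_taken;   /* how often the slow path ran */
    };

    /* Adds an entry of 'len' bytes and returns the page it landed on.
     * In the kernel, interrupts would be disabled around all of this. */
    static size_t wmodel_write(struct wmodel *w, size_t len)
    {
        assert(w && len > 0 && len <= PAGE_BYTES);
        if (w->offset + len > PAGE_BYTES) {
            /* slow path: the lock is needed only to change pages */
            w->lock_taken++;
            w->page = (w->page + 1) % NR_PAGES;
            w->offset = 0;
        }
        w->offset += len;           /* fast path: no lock */
        return w->page;
    }
    ```

    With 16-byte entries, four writes fill the first page without touching
    the lock, and only the fifth takes it once to move to the next page.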

    A new "read page" has been added so that the reader can pull out a page
    from the ring buffer to read without worrying about the writer writing over
    it. This allows us to not take the lock for all reads. The lock is
    now only taken when a read needs to go to a new page.

    This is far from lockless, and interrupts still need to be disabled,
    but it is a step towards a more lockless solution, and it also
    solves a lot of the issues that were noticed by the first conversion
    of ftrace to the ring buffers.

    Note: the ring_buffer_{un}lock API has been removed.

    Signed-off-by: Steven Rostedt
    Signed-off-by: Ingo Molnar

    Steven Rostedt
     
  • This is a unified tracing buffer that implements a ring buffer that
    hopefully everyone will eventually be able to use.

    The events recorded into the buffer have the following structure:

    struct ring_buffer_event {
            u32 type:2, len:3, time_delta:27;
            u32 array[];
    };

    The minimum size of an event is 8 bytes. All events are 4 byte
    aligned inside the buffer.
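
    The three header fields add up to exactly 32 bits. The sketch below
    packs and unpacks them with explicit shifts. Placing "type" in the
    lowest bits is an assumption made for the example, since bit-field
    layout is left to the compiler; the rb_* helper names are hypothetical.

    ```c
    #include <assert.h>
    #include <stdint.h>

    enum { TYPE_BITS = 2, LEN_BITS = 3, DELTA_BITS = 27 };

    /* Packs the three fields into one 32-bit header word. */
    static uint32_t rb_pack(uint32_t type, uint32_t len, uint32_t delta)
    {
        assert(type < (1u << TYPE_BITS));
        assert(len < (1u << LEN_BITS));
        assert(delta < (1u << DELTA_BITS));
        return type | (len << TYPE_BITS) | (delta << (TYPE_BITS + LEN_BITS));
    }

    static uint32_t rb_type(uint32_t h)  { return h & 0x3u; }
    static uint32_t rb_len(uint32_t h)   { return (h >> TYPE_BITS) & 0x7u; }
    static uint32_t rb_delta(uint32_t h) { return h >> (TYPE_BITS + LEN_BITS); }
    ```

    Setting every field to its maximum yields a word with all 32 bits set,
    which confirms that no header bit is left unused.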

    There are 4 types (all internal use for the ring buffer, only
    the data type is exported to the interface users).

    RINGBUF_TYPE_PADDING: this type is used to note extra space at the end
    of a buffer page.

    RINGBUF_TYPE_TIME_EXTEND: This type is used when the time between events
    is greater than the 27 bit delta can hold. We add another
    32 bits, and record that in its own event (8 byte size).

    RINGBUF_TYPE_TIME_STAMP: (Not implemented yet). This will hold data to
    help keep the buffer timestamps in sync.

    RINGBUF_TYPE_DATA: The event actually holds user data.
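
    For the time extend type, a delta that does not fit in 27 bits has to
    be split across the header field and the extra 32 bits. One way to do
    that is sketched below, assuming the low 27 bits stay in the header
    and the remaining bits go into the extra word; struct delta_split and
    both helpers are hypothetical.

    ```c
    #include <assert.h>
    #include <stdint.h>

    #define DELTA_BITS 27
    #define DELTA_MAX  ((UINT64_C(1) << DELTA_BITS) - 1)

    struct delta_split {
        int needs_extend;   /* 1 if a time extend event is required */
        uint32_t low;       /* bits kept in the 27-bit header field */
        uint32_t high;      /* bits stored in the extra 32-bit word */
    };

    static struct delta_split split_delta(uint64_t delta)
    {
        struct delta_split s;

        /* 27 + 32 bits is all this scheme can represent */
        assert(delta < (UINT64_C(1) << 59));
        s.needs_extend = delta > DELTA_MAX;
        s.low = (uint32_t)(delta & DELTA_MAX);
        s.high = (uint32_t)(delta >> DELTA_BITS);
        return s;
    }

    static uint64_t join_delta(struct delta_split s)
    {
        return ((uint64_t)s.high << DELTA_BITS) | s.low;
    }
    ```

    A reader reverses the split with join_delta() to recover the full
    delta before adding it to the running time stamp.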

    The "len" field is only three bits. Since the data must be
    4 byte aligned, this field is shifted left by 2, giving a
    max length of 28 bytes. If the data load is greater than 28
    bytes, the first array field holds the full length of the
    data load and the len field is set to zero.

    Example, data size of 7 bytes:

    type = RINGBUF_TYPE_DATA
    len = 2
    time_delta: (time stamp) - (previous event time stamp)
    array[0..1]: 7 bytes of data, 1 byte empty

    This event is saved in 12 bytes of the buffer.

    An event with 82 bytes of data:

    type = RINGBUF_TYPE_DATA
    len = 0
    time_delta: (time stamp) - (previous event time stamp)
    array[0]: 84 (Note the alignment)
    array[1..21]: 82 bytes of data, 2 bytes empty

    The above event is saved in 92 bytes (if my math is correct).
    82 bytes of data, 2 bytes empty, 4 byte header, 4 byte length.
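
    The arithmetic of both examples can be checked with a short helper.
    This is a sketch that follows the rules stated above (4 byte
    alignment, 28 byte limit for the "len" field); struct rb_layout and
    rb_layout_for() are hypothetical names.

    ```c
    #include <assert.h>
    #include <stddef.h>

    #define RB_ALIGN      4
    #define RB_MAX_SMALL 28

    struct rb_layout {
        unsigned int len_field; /* value of the 3-bit "len" field */
        size_t array0;          /* length kept in array[0], 0 if unused */
        size_t total;           /* bytes the event occupies in the buffer */
    };

    static struct rb_layout rb_layout_for(size_t data_len)
    {
        struct rb_layout l;
        size_t aligned;

        assert(data_len > 0);
        /* round the payload up to the next multiple of 4 */
        aligned = (data_len + RB_ALIGN - 1) & ~(size_t)(RB_ALIGN - 1);
        if (aligned <= RB_MAX_SMALL) {
            l.len_field = (unsigned int)(aligned / RB_ALIGN);
            l.array0 = 0;
            l.total = 4 + aligned;          /* header + data */
        } else {
            l.len_field = 0;
            l.array0 = aligned;
            l.total = 4 + 4 + aligned;      /* header + length + data */
        }
        return l;
    }
    ```

    For 7 bytes of data this gives len = 2 and 12 bytes in total, and for
    82 bytes it gives len = 0, array[0] = 84 and 92 bytes in total,
    matching the two examples.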

    Do not reference the above event struct directly. Use the following
    functions to gain access to the event table, since the
    ring_buffer_event structure may change in the future.

    ring_buffer_event_length(event): get the length of the event.
    This is the size of the memory used to record this
    event, and not the size of the data pay load.

    ring_buffer_time_delta(event): get the time delta of the event
    This returns the delta time stamp since the last event.
    Note: Even though this is in the header, there should
    be no reason to access this directly, except
    for debugging.

    ring_buffer_event_data(event): get the data from the event
    This is the function to use to get the actual data
    from the event. Note, it is only a pointer to the
    data inside the buffer. This data must be copied to
    another location otherwise you risk it being written
    over in the buffer.

    ring_buffer_lock: A way to lock the entire buffer.
    ring_buffer_unlock: unlock the buffer.

    ring_buffer_alloc: create a new ring buffer. Can choose between
    overwrite or consumer/producer mode. Overwrite will
    overwrite old data, whereas consumer/producer will
    throw away new data if the consumer catches up with the
    producer. The consumer/producer is the default.

    ring_buffer_free: free the ring buffer.

    ring_buffer_resize: resize the buffer. Changes the size of each cpu
    buffer. Note, it is up to the caller to ensure that
    the buffer is not being used while this is happening.
    This requirement may go away but do not count on it.

    ring_buffer_lock_reserve: locks the ring buffer and allocates an
    entry on the buffer to write to.
    ring_buffer_unlock_commit: unlocks the ring buffer and commits it to
    the buffer.

    ring_buffer_write: writes some data into the ring buffer.

    ring_buffer_peek: Look at the next item in the cpu buffer.
    ring_buffer_consume: get the next item in the cpu buffer and
    consume it. That is, this function increments the head
    pointer.

    ring_buffer_read_start: Start an iterator of a cpu buffer.
    For now, this disables the cpu buffer, until you issue
    a finish. This is just because we do not want the iterator
    to be overwritten. This restriction may change in the future.
    But note, this is used for static reading of a buffer which
    is usually done "after" a trace. Live readings would want
    to use the ring_buffer_consume above, which will not
    disable the ring buffer.

    ring_buffer_read_finish: Finishes the read iterator and reenables
    the ring buffer.

    ring_buffer_iter_peek: Look at the next item in the cpu iterator.
    ring_buffer_read: Read the iterator and increment it.
    ring_buffer_iter_reset: Reset the iterator to point to the beginning
    of the cpu buffer.
    ring_buffer_iter_empty: Returns true if the iterator is at the end
    of the cpu buffer.

    ring_buffer_size: returns the size in bytes of each cpu buffer.
    Note, the real size is this times the number of CPUs.

    ring_buffer_reset_cpu: Sets the cpu buffer to empty
    ring_buffer_reset: sets all cpu buffers to empty

    ring_buffer_swap_cpu: swaps a cpu buffer from one buffer with a
    cpu buffer of another buffer. This is handy when you
    want to take a snapshot of a running trace on just one
    cpu. Having a backup buffer to swap with facilitates this.
    Ftrace max latencies use this.

    ring_buffer_empty: Returns true if the ring buffer is empty.
    ring_buffer_empty_cpu: Returns true if the cpu buffer is empty.

    ring_buffer_record_disable: disable all cpu buffers (read only)
    ring_buffer_record_disable_cpu: disable a single cpu buffer (read only)
    ring_buffer_record_enable: enable all cpu buffers.
    ring_buffer_record_enable_cpu: enable a single cpu buffer.

    ring_buffer_entries: The number of entries in a ring buffer.
    ring_buffer_overruns: The number of entries removed due to writing wrap.

    ring_buffer_time_stamp: Get the time stamp used by the ring buffer
    ring_buffer_normalize_time_stamp: normalize the ring buffer time stamp
    into nanosecs.
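
    The two allocation modes described under ring_buffer_alloc behave
    differently once the buffer is full. A small fixed-size model shows
    the contrast; struct ring, ring_write() and ring_consume() are
    hypothetical and work on plain integers, not on events.

    ```c
    #include <assert.h>
    #include <stdbool.h>
    #include <stddef.h>

    #define SLOTS 4

    struct ring {
        int slot[SLOTS];
        size_t head;            /* next slot to read */
        size_t count;           /* unread entries */
        bool overwrite;         /* true: overwrite mode */
        unsigned long overruns; /* entries lost to wrapping */
    };

    static bool ring_write(struct ring *r, int v)
    {
        assert(r);
        if (r->count == SLOTS) {
            if (!r->overwrite)
                return false;   /* consumer/producer: drop the new data */
            r->head = (r->head + 1) % SLOTS;    /* overwrite: drop oldest */
            r->count--;
            r->overruns++;
        }
        r->slot[(r->head + r->count) % SLOTS] = v;
        r->count++;
        return true;
    }

    static bool ring_consume(struct ring *r, int *v)
    {
        assert(r && v);
        if (r->count == 0)
            return false;
        *v = r->slot[r->head];
        r->head = (r->head + 1) % SLOTS;
        r->count--;
        return true;
    }
    ```

    Writing five values into a four-slot ring loses the oldest value in
    overwrite mode and the newest one in consumer/producer mode.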

    I still need to implement the GTOD feature. But we need support from
    the cpu frequency infrastructure. But this can be done at a later
    time without affecting the ring buffer interface.

    Signed-off-by: Steven Rostedt
    Signed-off-by: Ingo Molnar

    Steven Rostedt