11 Nov, 2014

1 commit

  • On a !PREEMPT kernel, attempting to use trace-cmd results in a soft
    lockup:

    # trace-cmd record -e raw_syscalls:* -F false
    NMI watchdog: BUG: soft lockup - CPU#0 stuck for 22s! [trace-cmd:61]
    ...
    Call Trace:
    [] ? __wake_up_common+0x90/0x90
    [] wait_on_pipe+0x35/0x40
    [] tracing_buffers_splice_read+0x2e3/0x3c0
    [] ? tracing_stats_read+0x2a0/0x2a0
    [] ? _raw_spin_unlock+0x2b/0x40
    [] ? do_read_fault+0x21b/0x290
    [] ? handle_mm_fault+0x2ba/0xbd0
    [] ? trace_event_buffer_lock_reserve+0x40/0x80
    [] ? trace_buffer_lock_reserve+0x22/0x60
    [] ? trace_event_buffer_lock_reserve+0x40/0x80
    [] do_splice_to+0x6d/0x90
    [] SyS_splice+0x7c1/0x800
    [] tracesys_phase2+0xd3/0xd8

    The problem is this: tracing_buffers_splice_read() calls
    ring_buffer_wait() to wait for data in the ring buffers. The buffers
    are not empty, so ring_buffer_wait() returns immediately. But
    tracing_buffers_splice_read() calls ring_buffer_read_page() with full=1,
    meaning it only wants to read a full page. When a full page is not
    available, tracing_buffers_splice_read() tries to wait again with
    ring_buffer_wait(), which again returns immediately, and so on.

    Fix this by adding a "full" argument to ring_buffer_wait() which will
    make ring_buffer_wait() wait until the writer has left the reader's
    page, i.e. until full-page reads will succeed.

    Link: http://lkml.kernel.org/r/1415645194-25379-1-git-send-email-rabin@rab.in

    Cc: stable@vger.kernel.org # 3.16+
    Fixes: b1169cc69ba9 ("tracing: Remove mock up poll wait function")
    Signed-off-by: Rabin Vincent
    Signed-off-by: Steven Rostedt

    Rabin Vincent
     

10 Jun, 2014

1 commit

    The per_cpu buffers are created one per possible CPU. But that does
    not mean those CPUs are online, or that they even exist.

    With the addition of ring buffer polling, the code assumes that the
    caller polls on an existing buffer. That is not the case when the
    user reads trace_pipe for a CPU that does not exist, and this
    causes the kernel to crash.

    The simple fix is to check the cpu against the buffer's bitmask to see
    whether the buffer was allocated, and to return -ENODEV if it
    was not.

    More updates were done to pass the -ENODEV back up to userspace.
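
    The check itself is small. A sketch, assuming the allocated-CPU mask is
    kept in buffer->cpumask as in the ring buffer code (the real patch also
    handles the RING_BUFFER_ALL_CPUS case in its own branch):

    if (cpu != RING_BUFFER_ALL_CPUS &&
        !cpumask_test_cpu(cpu, buffer->cpumask))
            return -ENODEV;   /* no ring buffer was ever allocated for this CPU */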

    Link: http://lkml.kernel.org/r/5393DB61.6060707@oracle.com

    Reported-by: Sasha Levin
    Cc: stable@vger.kernel.org # 3.10+
    Signed-off-by: Steven Rostedt

    Steven Rostedt (Red Hat)
     

01 Nov, 2012

1 commit

    The existing 'overrun' counter is incremented when the ring
    buffer wraps around, with overflow on (the default). We wanted
    a way to also count requests lost because the buffer filled up with
    overflow off. I decided to add a new counter instead
    of retrofitting the existing one because it is conceptually a
    different statistic, and also because of how the code was structured.

    Link: http://lkml.kernel.org/r/1310765038-26399-1-git-send-email-slavapestov@google.com

    Signed-off-by: Slava Pestov
    Signed-off-by: Steven Rostedt

    Slava Pestov
     

24 Apr, 2012

1 commit

    Add a debugfs entry called buffer_size_kb under the per_cpu/ folder
    for each CPU to control that CPU's ring buffer size
    independently.

    If the global buffer_size_kb file is used to set the size, all the
    individual ring buffers are adjusted to the given size. The global file
    then reports the common size to maintain backward compatibility.

    If the buffer_size_kb file under the per_cpu/ directory is used to
    change buffer size for a specific CPU, only the size of the respective
    ring buffer is updated. When tracing/buffer_size_kb is read, it reports
    'X' to indicate that sizes of per_cpu ring buffers are not equivalent.
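
    Purely for illustration, a small userspace helper that prints the global
    size next to each CPU's individual size (it assumes debugfs is mounted at
    /sys/kernel/debug and only looks at the first few CPUs):

    #include <stdio.h>

    static void print_size(const char *path)
    {
            char line[64];
            FILE *f = fopen(path, "r");

            if (f && fgets(line, sizeof(line), f))
                    printf("%s: %s", path, line);
            else
                    printf("%s: <unavailable>\n", path);
            if (f)
                    fclose(f);
    }

    int main(void)
    {
            char path[128];
            int cpu;

            print_size("/sys/kernel/debug/tracing/buffer_size_kb");
            for (cpu = 0; cpu < 4; cpu++) {
                    snprintf(path, sizeof(path),
                             "/sys/kernel/debug/tracing/per_cpu/cpu%d/buffer_size_kb",
                             cpu);
                    print_size(path);
            }
            return 0;
    }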

    Link: http://lkml.kernel.org/r/1328212844-11889-1-git-send-email-vnagarnaik@google.com

    Cc: Frederic Weisbecker
    Cc: Michael Rubin
    Cc: David Sharp
    Cc: Justin Teravest
    Signed-off-by: Vaibhav Nagarnaik
    Signed-off-by: Steven Rostedt

    Vaibhav Nagarnaik
     

23 Feb, 2012

1 commit

    As the ring-buffer code is being used by other facilities in the
    kernel, having the tracing_on file disable *all* buffers is not a desired
    effect. It should only disable the ftrace buffers that are being used.

    Move the code into the trace.c file and use the buffer disabling
    for tracing_on() and tracing_off(). This way only the ftrace buffers
    will be affected by them, and other kernel utilities will not be
    confused as to why their output suddenly stopped.

    Signed-off-by: Steven Rostedt

    Steven Rostedt
     

31 Aug, 2011

1 commit

  • The stats file under per_cpu folder provides the number of entries,
    overruns and other statistics about the CPU ring buffer. However, the
    numbers do not provide any indication of how full the ring buffer is in
    bytes compared to the overall size in bytes. Also, it is helpful to know
    the rate at which the cpu buffer is filling up.

    This patch adds an entry "bytes: " in printed stats for per_cpu ring
    buffer which provides the actual bytes consumed in the ring buffer. This
    field includes the number of bytes used by recorded events and the
    padding bytes added when moving the tail pointer to next page.

    It also adds the following time stamps:
    "oldest event ts:" - the oldest timestamp in the ring buffer
    "now ts:" - the timestamp at the time of reading

    The field "now ts" provides a consistent time snapshot to the userspace
    when being read. This is read from the same trace clock used by tracing
    event timestamps.

    Together, these values provide the rate at which the buffer is filling
    up, from the formula:
    bytes / (now_ts - oldest_event_ts)
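
    As a worked example (hypothetical numbers, and assuming the timestamps in
    the stats file are read as seconds): with bytes = 1048576, now ts =
    105.250000 and oldest event ts = 100.250000, the buffer has been filling
    at roughly 1048576 / 5.0, i.e. about 210 kB/s. In C:

    /* bytes per second; ts values in seconds */
    double fill_rate(unsigned long long bytes, double now_ts, double oldest_ts)
    {
            return (double)bytes / (now_ts - oldest_ts);
    }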

    Signed-off-by: Vaibhav Nagarnaik
    Cc: Michael Rubin
    Cc: David Sharp
    Link: http://lkml.kernel.org/r/1313531179-9323-3-git-send-email-vnagarnaik@google.com
    Signed-off-by: Steven Rostedt

    Vaibhav Nagarnaik
     

15 Jun, 2011

1 commit

  • The tracing ring buffer is a group of per-cpu ring buffers where
    allocation and logging is done on a per-cpu basis. The events that are
    generated on a particular CPU are logged in the corresponding buffer.
    This is to provide wait-free writes between CPUs and good NUMA node
    locality while accessing the ring buffer.

    However, the allocation routines consider NUMA locality only for buffer
    page metadata and not for the actual buffer page. This causes the pages
    to be allocated on the NUMA node local to the CPU where the allocation
    routine is running at the time.

    This patch fixes the problem by using a NUMA node specific allocation
    routine so that the pages are allocated from a NUMA node local to the
    logging CPU.
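
    The change amounts to allocating the data page on the logging CPU's home
    node; roughly (a sketch, not the exact patch, with "bpage" standing in
    for the per-cpu buffer page descriptor):

    /* Before: the page comes from whatever node the allocating CPU is on. */
    addr = __get_free_page(GFP_KERNEL);

    /* After: explicitly ask for the node that backs the logging CPU. */
    struct page *page;

    page = alloc_pages_node(cpu_to_node(cpu), GFP_KERNEL, 0);
    if (!page)
            return -ENOMEM;
    bpage->page = page_address(page);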

    I tested with the getuid_microbench from autotest. It is a simple binary
    that calls getuid() in a loop and measures the average time for the
    syscall to complete. The following command was used to test:
    $ getuid_microbench 1000000

    Comparing the numbers on kernels with and without this patch shows that
    logging latency decreases by 30-50 ns per call:
    tracing with non-NUMA allocation - 569 ns/call
    tracing with NUMA allocation - 512 ns/call

    Signed-off-by: Vaibhav Nagarnaik
    Cc: Frederic Weisbecker
    Cc: Ingo Molnar
    Cc: Michael Rubin
    Cc: David Sharp
    Link: http://lkml.kernel.org/r/1304470602-20366-1-git-send-email-vnagarnaik@google.com
    Signed-off-by: Steven Rostedt

    Vaibhav Nagarnaik
     

10 Mar, 2011

1 commit

  • Add an "overwrite" trace_option for ftrace to control whether the buffer should
    be overwritten on overflow or not. The default remains to overwrite old events
    when the buffer is full. This patch adds the option to instead discard newest
    events when the buffer is full. This is useful to get a snapshot of traces just
    after enabling traces. Dropping the current event is also a simpler code path.

    Signed-off-by: David Sharp
    LKML-Reference:
    Signed-off-by: Steven Rostedt

    David Sharp
     

28 Apr, 2010

1 commit

  • When performing a non-consuming read, a synchronize_sched() is
    performed once for every cpu which is actively tracing.

    This is very expensive, and can make it take several seconds to open
    up the 'trace' file with lots of cpus.

    Only one synchronize_sched() call is actually necessary. What is
    desired is for all cpus to see the disabling state change. So we
    transform the existing sequence:

    for_each_cpu() {
            ring_buffer_read_start();
    }

    where each ring_buffer_read_start() call performs a synchronize_sched(),
    into the following:

    for_each_cpu() {
            ring_buffer_read_prepare();
    }
    ring_buffer_read_prepare_sync();
    for_each_cpu() {
            ring_buffer_read_start();
    }

    wherein only the single ring_buffer_read_prepare_sync() call needs to
    do the synchronize_sched().

    The first phase, via ring_buffer_read_prepare(), allocates the 'iter'
    memory and increments ->record_disabled.

    In the second phase, ring_buffer_read_prepare_sync() makes sure this
    ->record_disabled state is visible fully to all cpus.

    And in the final third phase, the ring_buffer_read_start() calls reset
    the 'iter' objects allocated in the first phase since we now know that
    none of the cpus are adding trace entries any more.

    This makes opening the 'trace' file nearly instantaneous on a
    sparc64 Niagara2 box with 128 cpus tracing.

    Signed-off-by: David S. Miller
    LKML-Reference:
    Signed-off-by: Steven Rostedt

    David Miller
     

01 Apr, 2010

1 commit

  • Currently, when the ring buffer drops events, it does not record
    the fact that it did so. It does inform the writer that the event
    was dropped by returning a NULL event, but it does not put in any
    place holder where the event was dropped.

    This is not a trivial thing to add because the ring buffer mostly
    runs in overwrite (flight recorder) mode. That is, when the ring
    buffer is full, new data will overwrite old data.

    In a producer/consumer mode, where new data is simply dropped when
    the ring buffer is full, it is trivial to add a placeholder
    for dropped events. When there is more room to write new data, a
    special event can be added to notify the reader about the dropped
    events.

    But in overwrite mode, any new write can overwrite events. A
    placeholder cannot be inserted into the ring buffer since there may
    never be room. A reader could also come in at any time and miss the
    placeholder.

    Luckily, the way the ring buffer works, the read side can find out
    if events were lost or not, and how many. Every time a write
    takes place, if it overwrites the header page (the next read) it
    updates an "overrun" variable that keeps track of the number of
    lost events. When a reader swaps out a page from the ring buffer,
    it can record this number, perform the swap, and then check to
    see if the number changed; the difference, if any, is the
    number of events dropped. This can be stored by the reader
    and returned to callers of the reader.

    Since the reader page swap will fail if the writer moved the head
    page after the reader set up the swap, there is room
    to record the overruns without worrying about races: if the reader
    sets up the pages, records the overrun, and then performs the swap,
    and the swap succeeds, then the overrun variable cannot have been
    updated since the setup before the swap.
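
    A sketch of that reader-side accounting (illustrative only: the function,
    swap_reader_page() and the last_overrun field are made-up names, not the
    kernel identifiers):

    static unsigned long read_lost_events(struct per_cpu_buffer *cpu_buffer)
    {
            unsigned long overrun, lost;

            do {
                    overrun = cpu_buffer->overrun;   /* sample before the swap */
            } while (!swap_reader_page(cpu_buffer)); /* retry if the writer moved the head */

            /*
             * The swap succeeded, so the writer has not pushed the head page
             * past us since the sample above: the counter could not have
             * changed under us.  The difference is what was lost since the
             * reader's previous visit.
             */
            lost = overrun - cpu_buffer->last_overrun;
            cpu_buffer->last_overrun = overrun;
            return lost;
    }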

    For binary readers of the ring buffer, a flag is set in the header
    of each sub page (sub buffer) of the ring buffer. This flag is embedded
    in the size field of the data on the sub buffer, in the 31st bit (the size
    field can be 32 or 64 bits depending on the architecture), but only 27
    bits are needed for the actual size (fewer, actually).

    We could add a new field in the sub buffer header to also record the
    number of events dropped since the last read, but this will change the
    format of the binary ring buffer a bit too much. Perhaps this change can
    be made if the information on the number of events dropped is considered
    important enough.

    Note, the notification of dropped events is only used by consuming reads
    or peeking at the ring buffer. Iterating over the ring buffer does not
    keep this information because the necessary data is only available when
    a page swap is made, and the iterator does not swap out pages.

    Cc: Robert Richter
    Cc: Andi Kleen
    Cc: Li Zefan
    Cc: Arnaldo Carvalho de Melo
    Cc: "Luis Claudio R. Goncalves"
    Cc: Frederic Weisbecker
    Signed-off-by: Steven Rostedt

    Steven Rostedt
     

05 Sep, 2009

1 commit

  • Since the ability to swap the cpu buffers adds a small overhead to
    the recording of a trace, we only want to add it when needed.

    Only the irqsoff and preemptoff tracers use this feature, and neither is
    recommended for production kernels. This patch disables its use
    when neither irqsoff nor preemptoff is configured.

    Signed-off-by: Steven Rostedt

    Steven Rostedt
     

04 Sep, 2009

1 commit

  • The function ring_buffer_event_discard can be used on any item in the
    ring buffer, even after the item was committed. This function provides
    no safety nets and is very race prone.

    An item may be safely removed from the ring buffer, before it is
    committed, with ring_buffer_discard_commit().

    Since there are currently no users of this function, and because this
    function is racy and error prone, this patch removes it altogether.

    Note, removing this function also allows the counters to ignore
    all discarded events (patches will follow).

    Signed-off-by: Steven Rostedt

    Steven Rostedt
     

08 Jul, 2009

1 commit

  • This patch converts the ring buffers into a completely lockless
    buffer recording system. The read side still takes locks since
    we still serialize readers. But the writers are the ones that
    must be lockless (those can happen in NMIs).

    The main change is to the "head_page" pointer. We write to the
    tail, and read from the head. The "head_page" pointer in the cpu
    buffer is now just a reference to where to look. The real head
    page is now kept in the head_page->list->prev->next pointer.
    That is, in the list head of the previous page we set flags.

    The list pages are allocated aligned such that the least
    significant bits of a pointer to a list page are always zero. This gives
    us room to store flags in those pointers.

    bit 0: set when the page is a head page
    bit 1: set when the writer is moving the page (for overwrite mode)

    cmpxchg is used to update the pointer.
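
    The trick of stashing flags in the low bits of an aligned pointer and
    updating pointer and flags together with a compare-and-swap can be shown
    with a small standalone C program (illustrative only, not kernel code;
    the flag names simply mirror the two bits described above):

    #include <stdatomic.h>
    #include <stdint.h>
    #include <stdio.h>

    #define RB_PAGE_HEAD   1UL     /* bit 0: this is the head page         */
    #define RB_PAGE_UPDATE 2UL     /* bit 1: the writer is moving the page */
    #define RB_FLAG_MASK   3UL

    int main(void)
    {
            static long page;      /* aligned, so its two low bits are zero */
            _Atomic uintptr_t next = (uintptr_t)&page | RB_PAGE_HEAD;

            /* Writer: claim the head page by flipping HEAD -> UPDATE atomically. */
            uintptr_t expected = (uintptr_t)&page | RB_PAGE_HEAD;
            uintptr_t desired  = (uintptr_t)&page | RB_PAGE_UPDATE;

            if (atomic_compare_exchange_strong(&next, &expected, desired))
                    printf("claimed page %p, flags now %lu\n",
                           (void *)(next & ~RB_FLAG_MASK),
                           (unsigned long)(next & RB_FLAG_MASK));
            else
                    printf("lost the race, saw flags %lu\n",
                           (unsigned long)(expected & RB_FLAG_MASK));
            return 0;
    }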

    When the writer wraps the buffer and the tail meets the head,
    in overwrite mode, the writer must move the head page forward.
    It first uses cmpxchg to change the pointer flag from 1 to 2.
    Once this is done, the reader on another CPU will not take the
    page from the buffer.

    The writers need to protect against interrupts (we don't bother with
    disabling interrupts because NMIs are allowed to write too).

    After the writer sets the pointer flag to 2, it takes care to
    manage interrupts coming in. This is described in detail within the
    comments of the code.

    Changes in version 2:
    - Let reader reset entries value of header page.
    - Fix tail page passing commit page on reader page test.
    - Always increment entries and write counter in rb_tail_page_update
    - Add safety check in rb_set_commit_to_write to break out of infinite loop
    - add mask in rb_is_reader_page

    [ Impact: lock free writing to the ring buffer ]

    Signed-off-by: Steven Rostedt

    Steven Rostedt
     

09 Jun, 2009

1 commit

  • On Sun, 7 Jun 2009, Ingo Molnar wrote:
    > Testing tracer sched_switch: Starting ring buffer hammer
    > PASSED
    > Testing tracer sysprof: PASSED
    > Testing tracer function: PASSED
    > Testing tracer irqsoff:
    > =============================================
    > PASSED
    > Testing tracer preemptoff: PASSED
    > Testing tracer preemptirqsoff: [ INFO: possible recursive locking detected ]
    > PASSED
    > Testing tracer branch: 2.6.30-rc8-tip-01972-ge5b9078-dirty #5760
    > ---------------------------------------------
    > rb_consumer/431 is trying to acquire lock:
    > (&cpu_buffer->reader_lock){......}, at: [] ring_buffer_reset_cpu+0x37/0x70
    >
    > but task is already holding lock:
    > (&cpu_buffer->reader_lock){......}, at: [] ring_buffer_consume+0x7e/0xc0
    >
    > other info that might help us debug this:
    > 1 lock held by rb_consumer/431:
    > #0: (&cpu_buffer->reader_lock){......}, at: [] ring_buffer_consume+0x7e/0xc0

    The ring buffer is a generic structure, and can be used outside of
    ftrace. If ftrace traces within the use of the ring buffer, it can produce
    false positives with lockdep.

    This patch passes in a static lock key into the allocation of the ring
    buffer, so that different ring buffers will have their own lock class.
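
    The usual lockdep pattern for this is to give each allocation site its
    own static lock class key; a sketch of the shape (the exact macro in the
    kernel header may differ in detail):

    #define ring_buffer_alloc(size, flags)                        \
    ({                                                            \
            static struct lock_class_key __key;                  \
            __ring_buffer_alloc((size), (flags), &__key);         \
    })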

    Reported-by: Ingo Molnar
    Signed-off-by: Peter Zijlstra
    LKML-Reference:

    [ store key in ring buffer descriptor ]

    Signed-off-by: Steven Rostedt

    Peter Zijlstra
     

06 May, 2009

1 commit

    The WARN_ON in the ring buffer when a commit is preempted and the
    buffer is filled by preceding writes can trigger in normal operation.
    The WARN_ON makes it look like a bug; worse, because it does not stop
    tracing and calls printk, which can itself recurse into the tracer, it
    is prone to deadlock (the WARN_ON is not in a position where recursing
    is safe).

    This patch removes the WARN_ON and replaces it with a counter that
    can be retrieved by a tracer. This counter is called commit_overrun.

    While at it, I added an nmi_dropped counter to count any time an NMI entry
    is dropped because the NMI could not take the spinlock.

    [ Impact: prevent deadlock by printing normal case warning ]

    Signed-off-by: Steven Rostedt

    Steven Rostedt
     

24 Apr, 2009

1 commit

    RB_MAX_SMALL_DATA = 28 bytes is too small for most tracers: it wastes
    a u32 to save the actual length for events whose data size is larger
    than 28 bytes.

    This fix uses a compressed event header and enlarges RB_MAX_SMALL_DATA.

    [ Impact: saves about 0%-12.5% (depending on the tracer) of ring_buffer memory ]

    Signed-off-by: Lai Jiangshan
    LKML-Reference:
    Signed-off-by: Steven Rostedt

    Lai Jiangshan
     

17 Apr, 2009

1 commit

    Currently, everything needed to read the binary output from the
    ring buffers is available, with the exception of how the ring
    buffer handles its data internally.

    This patch creates two special files in the debugfs/tracing/events
    directory:

    # cat /debug/tracing/events/header_page
    field: u64 timestamp; offset:0; size:8;
    field: local_t commit; offset:8; size:8;
    field: char data; offset:16; size:4080;

    # cat /debug/tracing/events/header_event
    type : 2 bits
    len : 3 bits
    time_delta : 27 bits
    array : 32 bits

    padding : type == 0
    time_extend : type == 1
    data : type == 3

    This is to allow a userspace app to see if the ring buffer format changes
    or not.

    [ Impact: allow userspace apps to know of ringbuffer format changes ]

    Signed-off-by: Steven Rostedt

    Steven Rostedt
     

14 Apr, 2009

1 commit

    The ring_buffer_discard_commit() function is similar to
    ring_buffer_event_discard(), but it can only be used on an event that
    has yet to be committed. Unpredictable results can happen otherwise.

    The main difference between ring_buffer_discard_commit and
    ring_buffer_event_discard is that ring_buffer_discard_commit will try
    to free the data in the ring buffer if nothing has added data
    after the reserved event. If something did, then it acts almost the
    same as ring_buffer_event_discard followed by a
    ring_buffer_unlock_commit.

    Note, either ring_buffer_discard_commit or ring_buffer_unlock_commit
    can be called on an event, but not both.

    This commit also exports both discard functions to be usable by
    GPL modules.
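
    A usage sketch (simplified; struct my_entry, fill_my_entry() and
    filter_says_drop() are illustrative stand-ins, and error handling is
    omitted):

    struct ring_buffer_event *event;
    struct my_entry *entry;

    event = ring_buffer_lock_reserve(buffer, sizeof(*entry));
    if (!event)
            return;

    entry = ring_buffer_event_data(event);
    fill_my_entry(entry);

    if (filter_says_drop(entry))
            /* Free the reserved space if nothing was added after us. */
            ring_buffer_discard_commit(buffer, event);
    else
            ring_buffer_unlock_commit(buffer, event);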

    Signed-off-by: Steven Rostedt
    Signed-off-by: Ingo Molnar

    Steven Rostedt
     

23 Mar, 2009

1 commit

  • This patch overloads RINGBUF_TYPE_PADDING to provide a way to discard
    events from the ring buffer, for the event-filtering mechanism
    introduced in a subsequent patch.

    I did the initial version but thanks to Steven Rostedt for adding
    the parts that actually made it work. ;-)

    Signed-off-by: Tom Zanussi
    Acked-by: Frederic Weisbecker
    Signed-off-by: Ingo Molnar

    Tom Zanussi
     

05 Mar, 2009

1 commit

  • Impact: cleanup

    The functions tracing_start/tracing_stop have been moved to kernel.h.
    These are not the functions a developer most likely wants to use
    when they want to insert a place to stop tracing and restart it from
    user space.

    tracing_start/tracing_stop were created to work with things like
    suspend to ram, where even calling smp_processor_id() can crash the
    system. They were meant to stop the tracer from
    doing anything. These are still lightweight functions, but they add a bit
    more overhead in order to stop the tracers. They also have no interface
    back to userland. That is, if the kernel calls tracing_stop, userland
    cannot restart tracing.

    What a developer most likely wants to use is tracing_on/tracing_off.
    These are very light weight functions (simply sets or clears a bit).
    These functions just stop recording into the ring buffer. The tracers
    don't even know that this happens except that they would receive NULL
    from the ring_buffer_lock_reserve function.

    Also, there is a way for userland to set or clear this bit.
    In debugfs/tracing/tracing_on, a user may echo "0" (same as tracing_off())
    or echo "1" (same as tracing_on()) into this file. This comes in handy when
    a kernel developer is debugging and wants tracing to turn off when it
    hits an anomaly. Then the developer can examine the trace, and restart
    tracing if they want to try again (echo 1 > tracing_on).
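
    For example, a developer chasing a rare condition might do something like
    this (illustrative; the condition and message are made up):

    if (unlikely(bad_state_detected)) {     /* made-up condition */
            tracing_off();                  /* stop recording; tracers keep running */
            printk(KERN_WARNING "anomaly hit, ring buffer recording stopped\n");
    }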

    This patch moves the prototypes for tracing_on/tracing_off to kernel.h
    and comments their use, so that a kernel developer will know how
    to use them.

    Signed-off-by: Steven Rostedt

    Steven Rostedt
     

06 Feb, 2009

1 commit

  • Impact: API change, cleanup

    >From ring_buffer_{lock_reserve,unlock_commit}.

    $ codiff /tmp/vmlinux.before /tmp/vmlinux.after
    linux-2.6-tip/kernel/trace/trace.c:
    trace_vprintk | -14
    trace_graph_return | -14
    trace_graph_entry | -10
    trace_function | -8
    __ftrace_trace_stack | -8
    ftrace_trace_userstack | -8
    tracing_sched_switch_trace | -8
    ftrace_trace_special | -12
    tracing_sched_wakeup_trace | -8
    9 functions changed, 90 bytes removed, diff: -90

    linux-2.6-tip/block/blktrace.c:
    __blk_add_trace | -1
    1 function changed, 1 bytes removed, diff: -1

    /tmp/vmlinux.after:
    10 functions changed, 91 bytes removed, diff: -91

    Signed-off-by: Arnaldo Carvalho de Melo
    Acked-by: Frédéric Weisbecker
    Signed-off-by: Ingo Molnar

    Arnaldo Carvalho de Melo
     

31 Dec, 2008

1 commit

  • * 'oprofile-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
    oprofile: select RING_BUFFER
    ring_buffer: adding EXPORT_SYMBOLs
    oprofile: fix lost sample counter
    oprofile: remove nr_available_slots()
    oprofile: port to the new ring_buffer
    ring_buffer: add remaining cpu functions to ring_buffer.h
    oprofile: moving cpu_buffer_reset() to cpu_buffer.h
    oprofile: adding cpu_buffer_entries()
    oprofile: adding cpu_buffer_write_commit()
    oprofile: adding cpu buffer r/w access functions
    ftrace: remove unused function arg in trace_iterator_increment()
    ring_buffer: update description for ring_buffer_alloc()
    oprofile: set values to default when creating oprofilefs
    oprofile: implement switch/case in buffer_sync.c
    x86/oprofile: cleanup IBS init/exit functions in op_model_amd.c
    x86/oprofile: reordering IBS code in op_model_amd.c
    oprofile: fix typo
    oprofile: whitspace changes only
    oprofile: update comment for oprofile_add_sample()
    oprofile: comment cleanup

    Linus Torvalds
     

03 Dec, 2008

1 commit

  • Impact: new API to ring buffer

    This patch adds a new interface into the ring buffer that allows a
    page to be read from the ring buffer on a given CPU. For every page
    read, one must also be given to allow for a "swap" of the pages.

    rpage = ring_buffer_alloc_read_page(buffer);
    if (!rpage)
            goto err;
    ret = ring_buffer_read_page(buffer, &rpage, cpu, full);
    if (!ret)
            goto empty;
    process_page(rpage);
    ring_buffer_free_read_page(rpage);

    The caller of these functions must handle any waits that are
    needed to wait for new data. The ring_buffer_read_page will simply
    return 0 if there is no data, or if "full" is set and the writer
    is still on the current page.

    Signed-off-by: Steven Rostedt
    Signed-off-by: Ingo Molnar

    Steven Rostedt
     

23 Nov, 2008

1 commit

  • Impact: feature to permanently disable ring buffer

    This patch adds an API to the ring buffer code that will permanently
    disable the ring buffer from ever recording. This should only be
    called when some serious anomaly is detected, and the system
    may be in an unstable state. When that happens, shutting down the
    recording to the ring buffers may be appropriate.

    Signed-off-by: Steven Rostedt
    Signed-off-by: Ingo Molnar

    Steven Rostedt
     

12 Nov, 2008

1 commit

  • Impact: enable/disable ring buffer recording API added

    Several kernel developers have requested that there be a way to stop
    recording into the ring buffers with a simple switch that can also
    be enabled from userspace. This patch adds a new kernel API to the
    ring buffers called:

    tracing_on()
    tracing_off()

    When tracing_off() is called, no ring buffer will be able to record
    into its buffer.

    tracing_on() will enable the ring buffers again.

    These two act like an on/off switch. That is, there is no counting of the
    number of times tracing_off or tracing_on has been called.

    A new file is added to the debugfs/tracing directory called

    tracing_on

    This allows for userspace applications to also flip the switch.

    echo 0 > /debugfs/tracing/tracing_on

    disables the tracing.

    echo 1 > /debugfs/tracing/tracing_on

    enables it.

    Note, this does not disable or enable any tracers. It only sets or clears
    a flag that needs to be set in order for the ring buffers to write to
    their buffers. It is a global flag, and affects all ring buffers.

    The buffers start out with tracing_on enabled.

    There are now three flags that control recording into the buffers:

    tracing_on: which affects all ring buffer tracers.

    buffer->record_disabled: which affects an allocated buffer, which may be set
    if an anomaly is detected, and tracing is disabled.

    cpu_buffer->record_disabled: which is set by tracing_stop() or if an
    anomaly is detected. tracing_start can not reenable this if
    an anomaly occurred.

    The userspace debugfs/tracing/tracing_enabled file is implemented with
    tracing_stop(), but userspace cannot re-enable tracing if the kernel
    has called tracing_stop().

    Userspace can enable tracing_on even if the kernel disabled it.
    It is just a switch used to stop tracing if a condition was hit.
    tracing_on is not for protecting critical areas in the kernel, nor is
    it for stopping tracing if an anomaly occurred. This is because userspace
    can re-enable it at any time.

    Side effect: With this patch, I discovered a dead variable in ftrace.c
    called tracing_on. This patch removes it.

    Signed-off-by: Steven Rostedt

    Steven Rostedt
     

14 Oct, 2008

1 commit

  • The old "lock always" scheme had issues with lockdep, and was not very
    efficient anyways.

    This patch does a new design to be partially lockless on writes.
    Writes will add new entries to the per cpu pages by simply disabling
    interrupts. When a write needs to go to another page than it will
    grab the lock.

    A new "read page" has been added so that the reader can pull out a page
    from the ring buffer to read without worrying about the writer writing over
    it. This allows us to not take the lock for all reads. The lock is
    now only taken when a read needs to go to a new page.

    This is far from lockless, and interrupts still need to be disabled,
    but it is a step towards a more lockless solution, and it also
    solves a lot of the issues that were noticed by the first conversion
    of ftrace to the ring buffers.

    Note: the ring_buffer_{un}lock API has been removed.

    Signed-off-by: Steven Rostedt
    Signed-off-by: Ingo Molnar

    Steven Rostedt