12 Oct, 2012

1 commit

  • On a system where num_present_cpus < num_possible_cpus, non-present
    CPUs do not have per-cpu buffers allocated, even if all present CPUs
    are online. If per_cpu/<cpu>/buffer_size_kb is modified for such a
    CPU, it can cause a panic due to a NULL dereference in
    ring_buffer_resize().

    To fix this, the resize operation is allowed only if the per-cpu
    buffer has been initialized.
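
    A minimal sketch of the guard, assuming the resize path takes a
    cpu_id argument (RING_BUFFER_ALL_CPUS meaning "resize all"):

        if (cpu_id != RING_BUFFER_ALL_CPUS) {
                /* only CPUs in the buffer's cpumask have buffers */
                if (!cpumask_test_cpu(cpu_id, buffer->cpumask))
                        return size;    /* nothing to resize */

                cpu_buffer = buffer->buffers[cpu_id];
                /* ... resize just this CPU's buffer ... */
        }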

    Link: http://lkml.kernel.org/r/1349912427-6486-1-git-send-email-vnagarnaik@google.com

    Cc: stable@vger.kernel.org # 3.5+
    Signed-off-by: Vaibhav Nagarnaik
    Signed-off-by: Steven Rostedt

    Vaibhav Nagarnaik
     

30 Jun, 2012

2 commits

  • When pages are removed from the ring buffer, its state is not reset,
    so the counters need to be updated to account for the removed pages.

    Update the overrun counter to reflect the events lost with the
    removed pages.
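
    A minimal sketch of the accounting, assuming a helper that counts
    the entries left on a page being removed:

        /* sketch: entries lost with a removed page count as overrun */
        page_entries = rb_page_entries(to_remove_page);
        if (page_entries)
                local_add(page_entries, &cpu_buffer->overrun);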

    Link: http://lkml.kernel.org/r/1340998301-1715-1-git-send-email-vnagarnaik@google.com

    Cc: Justin Teravest
    Cc: David Sharp
    Signed-off-by: Vaibhav Nagarnaik
    Signed-off-by: Steven Rostedt

    Vaibhav Nagarnaik
     
  • The new_pages list head in the cpu_buffer is not initialized. When
    adding pages to the ring buffer, if the memory allocation fails in
    ring_buffer_resize, the clean-up handler tries to free the allocated
    pages from all the cpu buffers. The panic is caused by referencing
    the uninitialized new_pages list head.

    Initializing the new_pages list head in rb_allocate_cpu_buffer fixes
    this.
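
    A minimal sketch of the fix in the per-cpu allocation path:

        /* in rb_allocate_cpu_buffer(): the resize clean-up path walks
         * new_pages even when an allocation fails, so it must always
         * start out as a valid empty list */
        INIT_LIST_HEAD(&cpu_buffer->new_pages);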

    Link: http://lkml.kernel.org/r/1340391005-10880-1-git-send-email-vnagarnaik@google.com

    Cc: Justin Teravest
    Cc: David Sharp
    Signed-off-by: Vaibhav Nagarnaik
    Signed-off-by: Steven Rostedt

    Vaibhav Nagarnaik
     

29 Jun, 2012

1 commit

  • The ring buffer reader page is used to swap a page from the writable
    ring buffer. If the writer happens to be on that page, it ends up on the
    reader page, but will simply move off of it, back into the writable ring
    buffer as writes are added.

    The time stamp passed back to the readers is stored in the cpu_buffer per
    CPU descriptor. This stamp is updated when a swap of the reader page takes
    place, and it reads the current stamp from the page taken from the writable
    ring buffer. Every time a writer goes to a new page, it updates the
    time stamp of that page.

    The problem happens if a reader reads a page from an empty per CPU ring buffer.
    If the buffer is empty, the swap still takes place, placing the writer at the
    start of the reader page. If at a later time, a write happens, it updates the
    page's time stamp and continues. But the problem is that the read_stamp does
    not get updated, because the page was already swapped.

    The solution is to not swap the reader page if the ring buffer is
    empty. This also removes a side effect: writes on the reader page
    would never be seen again, because the writer never gets back onto
    the reader page without a swap. That is, suppose a read happens on
    an empty buffer, but then no reads happen for a while. If a swap
    took place and the writer then started writing a lot of data
    (function tracer), it would start overflowing the ring buffer and
    overwriting the older data. But because the writer never goes back
    onto the reader page, the data left on the reader page never gets
    overwritten. This causes the reader to see really old data,
    followed by a jump to newer data.
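
    A minimal sketch of the check in the reader-page swap path
    (rb_num_of_entries() is assumed to be the buffer's entry-count
    helper):

        /* nothing to read: skip the swap, so read_stamp stays valid
         * and the writer never lands on the reader page */
        if (rb_num_of_entries(cpu_buffer) == 0)
                goto out;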

    Link: http://lkml.kernel.org/r/1340060577-9112-1-git-send-email-dhsharp@google.com
    Google-Bug-Id: 6410455
    Reported-by: David Sharp
    Tested-by: David Sharp
    Signed-off-by: Steven Rostedt

    Steven Rostedt
     

24 May, 2012

1 commit

  • On some machines the number of possible CPUs is not the same as the
    number of CPUs actually on the machine. Ftrace uses possible_cpus to
    update the tracing structures, but the ring buffer only allocates
    per-cpu buffers for online CPUs when they come up.

    When the wakeup tracer was enabled in such a case, the ftrace code
    enabled all possible cpu buffers, but the code in ring_buffer_resize()
    did not check whether the buffer in question was allocated. Since the
    boot-up CPUs did not match the possible CPUs, it caused the following
    crash:

    BUG: unable to handle kernel NULL pointer dereference at 00000020
    IP: [] ring_buffer_resize+0x16a/0x28d
    *pde = 00000000
    Oops: 0000 [#1] PREEMPT SMP
    Dumping ftrace buffer:
    (ftrace buffer empty)
    Modules linked in: [last unloaded: scsi_wait_scan]

    Pid: 1387, comm: bash Not tainted 3.4.0-test+ #13 /DG965MQ
    EIP: 0060:[] EFLAGS: 00010217 CPU: 0
    EIP is at ring_buffer_resize+0x16a/0x28d
    EAX: f5a14340 EBX: f6026b80 ECX: 00000ff4 EDX: 00000ff3
    ESI: 00000000 EDI: 00000002 EBP: f4275ecc ESP: f4275eb0
    DS: 007b ES: 007b FS: 00d8 GS: 00e0 SS: 0068
    CR0: 80050033 CR2: 00000020 CR3: 34396000 CR4: 000007d0
    DR0: 00000000 DR1: 00000000 DR2: 00000000 DR3: 00000000
    DR6: ffff0ff0 DR7: 00000400
    Process bash (pid: 1387, ti=f4274000 task=f4380cb0 task.ti=f4274000)
    Stack:
    c109cf9a f6026b98 00000162 00160f68 00000006 00160f68 00000002 f4275ef0
    c109d013 f4275ee8 c123b72a c1c0bf00 c1cc81dc 00000005 f4275f98 00000007
    f4275f70 c109d0c7 7700000e 75656b61 00000070 f5e90900 f5c4e198 00000301
    Call Trace:
    [] ? tracing_set_tracer+0x115/0x1e9
    [] tracing_set_tracer+0x18e/0x1e9
    [] ? _copy_from_user+0x30/0x46
    [] tracing_set_trace_write+0x59/0x7f
    [] ? fput+0x18/0x1c6
    [] ? security_file_permission+0x27/0x2b
    [] ? rw_verify_area+0xcf/0xf2
    [] ? fput+0x18/0x1c6
    [] ? tracing_set_tracer+0x1e9/0x1e9
    [] vfs_write+0x8b/0xe3
    [] ? fget_light+0x30/0x81
    [] sys_write+0x42/0x63
    [] sysenter_do_call+0x12/0x28

    This happens with the latency tracer as the ftrace code updates the
    saved max buffer via its cpumask and not with a global setting.

    Adding a check in ring_buffer_resize() to make sure the buffer being
    resized exists fixes the problem.
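
    A minimal sketch of the check, assuming the resize loop walks the
    buffer's CPUs with the ring buffer's own iterator:

        for_each_buffer_cpu(buffer, cpu) {
                cpu_buffer = buffer->buffers[cpu];
                /* possible but never-onlined CPUs have no buffer */
                if (!cpu_buffer)
                        continue;
                /* ... update nr_pages and resize cpu_buffer ... */
        }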

    Cc: Vaibhav Nagarnaik
    Signed-off-by: Steven Rostedt

    Steven Rostedt
     

19 May, 2012

1 commit

  • There are two separate loops to resize cpu buffers that are online
    and offline. Merge them to make the code cleaner.

    Also change the name from update_completion to update_done to allow
    shorter lines.

    Link: http://lkml.kernel.org/r/1337372991-14783-1-git-send-email-vnagarnaik@google.com

    Cc: Laurent Chavey
    Cc: Justin Teravest
    Cc: David Sharp
    Signed-off-by: Vaibhav Nagarnaik
    Signed-off-by: Steven Rostedt

    Vaibhav Nagarnaik
     

17 May, 2012

4 commits

  • When the ring buffer does its consistency test on itself, it
    removes the head page, runs the tests, and then adds it back
    to where the "head_page" pointer was. But because the head_page
    pointer may lag behind the real head page (held by the linked
    list pointer), the reset may be incorrect.

    Instead, if the head_page exists (it does not on first allocation)
    reset it back to the real head page before running the consistency
    tests. Then it will be put back to its original location after
    the tests are complete.

    Signed-off-by: Steven Rostedt

    Steven Rostedt
     
  • There used to be ring buffer integrity checks after updating the
    size of the ring buffer. But now that the ring buffer can modify
    the size while the system is running, the integrity checks were
    removed, as they require the ring buffer to be disabled to perform
    the check.

    Move the integrity check to the reading of the ring buffer via the
    iterator reads (the "trace" file). As reading via an iterator requires
    disabling the ring buffer, it is a perfect place to have it.

    If the ring buffer happens to be disabled when updating the size,
    we still perform the integrity check.

    Cc: Vaibhav Nagarnaik
    Signed-off-by: Steven Rostedt

    Steven Rostedt
     
  • This patch adds the capability to add new pages to a ring buffer
    atomically while write operations are going on. This makes it possible
    to expand the ring buffer size without reinitializing the ring buffer.

    The new pages are attached between the head page and its previous page.
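
    Conceptually, the splice looks like the sketch below;
    rb_set_head_page() is an assumed helper returning the current head
    buffer_page, and the cmpxchg is what makes the insertion safe
    against a concurrent writer (retry and error handling elided):

        struct list_head *first, *last, *head, *prev;
        bool success;

        first = cpu_buffer->new_pages.next;
        last  = cpu_buffer->new_pages.prev;
        head  = &rb_set_head_page(cpu_buffer)->list;
        prev  = head->prev;

        last->next  = head;
        first->prev = prev;
        /* atomically redirect the old neighbour's next pointer; if a
         * writer moved the head meanwhile, this fails and is retried */
        success = (cmpxchg(&prev->next, head, first) == head);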

    Link: http://lkml.kernel.org/r/1336096792-25373-2-git-send-email-vnagarnaik@google.com

    Cc: Frederic Weisbecker
    Cc: Ingo Molnar
    Cc: Laurent Chavey
    Cc: Justin Teravest
    Cc: David Sharp
    Signed-off-by: Vaibhav Nagarnaik
    Signed-off-by: Steven Rostedt

    Vaibhav Nagarnaik
     
  • This patch adds the capability to remove pages from a ring buffer
    without destroying any existing data in it.

    This is done by removing the pages after the tail page. This makes
    sure that all the empty pages in the ring buffer are removed first.
    If the head page is among the pages to be removed, then the page
    after the removed ones is made the head page. This removes the
    oldest data from the ring buffer and keeps the latest data around
    to be read.

    To do this in a race-free manner, tracing is stopped for a very
    short time while the pages to be removed are identified and
    unlinked from the ring buffer. The pages are freed after tracing is
    restarted, to minimize the time needed to stop tracing.

    The pages of a per-cpu ring buffer are removed in a context running
    on that CPU, so the only events that can go untraced during the
    window are those from NMI context.
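
    A schematic of that ordering; rb_unlink_pages_after_tail() and
    free_page_list() are hypothetical names for the unlink and free
    steps:

        /* runs in a context scheduled on the buffer's own CPU */
        atomic_inc(&cpu_buffer->record_disabled);  /* stop tracing briefly */
        to_free = rb_unlink_pages_after_tail(cpu_buffer, nr_pages);
        atomic_dec(&cpu_buffer->record_disabled);  /* tracing resumes */
        free_page_list(to_free);                   /* free outside the window */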

    Link: http://lkml.kernel.org/r/1336096792-25373-1-git-send-email-vnagarnaik@google.com

    Cc: Frederic Weisbecker
    Cc: Ingo Molnar
    Cc: Laurent Chavey
    Cc: Justin Teravest
    Cc: David Sharp
    Signed-off-by: Vaibhav Nagarnaik
    Signed-off-by: Steven Rostedt

    Vaibhav Nagarnaik
     

24 Apr, 2012

1 commit

  • Add a debugfs entry under per_cpu/ folder for each cpu called
    buffer_size_kb to control the ring buffer size for each CPU
    independently.

    If the global file buffer_size_kb is used to set the size, all the
    individual ring buffers will be adjusted to it. The buffer_size_kb
    file will then report the common size to maintain backward
    compatibility.

    If the buffer_size_kb file under the per_cpu/ directory is used to
    change buffer size for a specific CPU, only the size of the respective
    ring buffer is updated. When tracing/buffer_size_kb is read, it reports
    'X' to indicate that sizes of per_cpu ring buffers are not equivalent.
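
    A minimal sketch of the global file's read side under this scheme
    (ring_buffers_equal_size() is a hypothetical helper):

        /* report "X" once the per-cpu sizes have diverged */
        if (!ring_buffers_equal_size(buffer))
                r = sprintf(buf, "X\n");
        else
                r = sprintf(buf, "%lu\n", size_kb);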

    Link: http://lkml.kernel.org/r/1328212844-11889-1-git-send-email-vnagarnaik@google.com

    Cc: Frederic Weisbecker
    Cc: Michael Rubin
    Cc: David Sharp
    Cc: Justin Teravest
    Signed-off-by: Vaibhav Nagarnaik
    Signed-off-by: Steven Rostedt

    Vaibhav Nagarnaik
     

23 Feb, 2012

1 commit

  • As the ring-buffer code is being used by other facilities in the
    kernel, having the tracing_on file disable *all* buffers is not a
    desired effect. It should only disable the ftrace buffers that are
    being used.

    Move the code into the trace.c file and use the buffer disabling
    for tracing_on() and tracing_off(). This way only the ftrace buffers
    will be affected by them, and other kernel utilities will not be
    confused as to why their output suddenly stopped.

    Signed-off-by: Steven Rostedt

    Steven Rostedt
     

26 Oct, 2011

1 commit

  • * 'perf-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (121 commits)
    perf symbols: Increase symbol KSYM_NAME_LEN size
    perf hists browser: Refuse 'a' hotkey on non symbolic views
    perf ui browser: Use libslang to read keys
    perf tools: Fix tracing info recording
    perf hists browser: Elide DSO column when it is set to just one DSO, ditto for threads
    perf hists: Don't consider filtered entries when calculating column widths
    perf hists: Don't decay total_period for filtered entries
    perf hists browser: Honour symbol_conf.show_{nr_samples,total_period}
    perf hists browser: Do not exit on tab key with single event
    perf annotate browser: Don't change selection line when returning from callq
    perf tools: handle endianness of feature bitmap
    perf tools: Add prelink suggestion to dso update message
    perf script: Fix unknown feature comment
    perf hists browser: Apply the dso and thread filters when merging new batches
    perf hists: Move the dso and thread filters from hist_browser
    perf ui browser: Honour the xterm colors
    perf top tui: Give color hints just on the percentage, like on --stdio
    perf ui browser: Make the colors configurable and change the defaults
    perf tui: Remove unneeded call to newtCls on startup
    perf hists: Don't format the percentage on hist_entry__snprintf
    ...

    Fix up conflicts in arch/x86/kernel/kprobes.c manually.

    Ingo's tree did the insane "add volatile to const array", which just
    doesn't make sense ("volatile const"?). But we could remove the const
    *and* make the array volatile to make doubly sure that gcc doesn't
    optimize it away..

    Also fix up kernel/trace/ring_buffer.c non-data-conflicts manually: the
    reader_lock has been turned into a raw lock by the core locking merge,
    and there was a new user of it introduced in this perf core merge. Make
    sure that new use also uses the raw accessor functions.

    Linus Torvalds
     

13 Sep, 2011

1 commit

  • The tracing locks can be taken in atomic context and therefore
    cannot be preempted on -rt - annotate it.

    In mainline this change documents the low level nature of
    the lock - otherwise there's no functional difference. Lockdep
    and Sparse checking will work as usual.
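
    A minimal sketch of the annotation, taking the ring buffer's
    reader_lock as an example of a converted lock:

        /* raw_spinlock_t stays a spinning lock on -rt, so it can be
         * taken in atomic context */
        raw_spinlock_t reader_lock;

        raw_spin_lock_irqsave(&cpu_buffer->reader_lock, flags);
        /* ... critical section that may run in atomic context ... */
        raw_spin_unlock_irqrestore(&cpu_buffer->reader_lock, flags);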

    Signed-off-by: Thomas Gleixner
    Signed-off-by: Ingo Molnar

    Thomas Gleixner
     

31 Aug, 2011

1 commit

  • The stats file under per_cpu folder provides the number of entries,
    overruns and other statistics about the CPU ring buffer. However, the
    numbers do not provide any indication of how full the ring buffer is in
    bytes compared to the overall size in bytes. Also, it is helpful to know
    the rate at which the cpu buffer is filling up.

    This patch adds an entry "bytes: " in printed stats for per_cpu ring
    buffer which provides the actual bytes consumed in the ring buffer. This
    field includes the number of bytes used by recorded events and the
    padding bytes added when moving the tail pointer to next page.

    It also adds the following time stamps:
    "oldest event ts:" - the oldest timestamp in the ring buffer
    "now ts:" - the timestamp at the time of reading

    The field "now ts" provides a consistent time snapshot to the userspace
    when being read. This is read from the same trace clock used by tracing
    event timestamps.

    Together, these values provide the rate at which the buffer is filling
    up, from the formula:
    bytes / (now_ts - oldest_event_ts)
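
    For illustration, the same computation in kernel C, with the stats
    fields read into local u64 variables (div64_u64 avoids a 64-bit
    division on 32-bit hosts):

        u64 span_ns = now_ts - oldest_event_ts;
        u64 bytes_per_sec = span_ns ?
                div64_u64(bytes * NSEC_PER_SEC, span_ns) : 0;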

    Signed-off-by: Vaibhav Nagarnaik
    Cc: Michael Rubin
    Cc: David Sharp
    Link: http://lkml.kernel.org/r/1313531179-9323-3-git-send-email-vnagarnaik@google.com
    Signed-off-by: Steven Rostedt

    Vaibhav Nagarnaik
     

15 Jun, 2011

3 commits

  • The tracing ring buffer is allocated from kernel memory. While
    allocating a large chunk of memory, OOM might happen, which
    destabilizes the system, and random processes might get killed
    during the allocation.

    This patch adds __GFP_NORETRY flag to the ring buffer allocation calls
    to make it fail more gracefully if the system will not be able to
    complete the allocation request.
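
    A minimal sketch of the change on a buffer-page allocation
    (surrounding loop and unwinding elided):

        /* __GFP_NORETRY: give up instead of retrying until the
         * OOM killer has to pick victims */
        addr = __get_free_page(GFP_KERNEL | __GFP_NORETRY);
        if (!addr)
                goto free_pages;    /* unwind pages allocated so far */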

    Acked-by: David Rientjes
    Signed-off-by: Vaibhav Nagarnaik
    Cc: Ingo Molnar
    Cc: Frederic Weisbecker
    Cc: Michael Rubin
    Cc: David Sharp
    Link: http://lkml.kernel.org/r/1307491302-9236-1-git-send-email-vnagarnaik@google.com
    Signed-off-by: Steven Rostedt

    Vaibhav Nagarnaik
     
  • This patch replaces the code for getting an unsigned long from a
    userspace buffer with a simple call to kstrtoul_from_user.
    This makes it easier to read and less error prone.
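
    For illustration, a typical write handler after the conversion (the
    handler name is hypothetical):

        static ssize_t
        foo_write(struct file *filp, const char __user *ubuf,
                  size_t cnt, loff_t *ppos)
        {
                unsigned long val;
                int ret;

                /* one call replaces the copy_from_user()/strtoul dance */
                ret = kstrtoul_from_user(ubuf, cnt, 10, &val);
                if (ret)
                        return ret;

                /* ... use val ... */
                *ppos += cnt;
                return cnt;
        }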

    Signed-off-by: Peter Huewe
    Link: http://lkml.kernel.org/r/1307476707-14762-1-git-send-email-peterhuewe@gmx.de
    Signed-off-by: Steven Rostedt

    Peter Huewe
     
  • The tracing ring buffer is a group of per-cpu ring buffers where
    allocation and logging is done on a per-cpu basis. The events that are
    generated on a particular CPU are logged in the corresponding buffer.
    This is to provide wait-free writes between CPUs and good NUMA node
    locality while accessing the ring buffer.

    However, the allocation routines consider NUMA locality only for buffer
    page metadata and not for the actual buffer page. This causes the pages
    to be allocated on the NUMA node local to the CPU where the allocation
    routine is running at the time.

    This patch fixes the problem by using a NUMA node specific allocation
    routine so that the pages are allocated from a NUMA node local to the
    logging CPU.

    I tested with the getuid_microbench from autotest. It is a simple binary
    that calls getuid() in a loop and measures the average time for the
    syscall to complete. The following command was used to test:
    $ getuid_microbench 1000000

    Comparing the numbers on kernels with and without this patch shows
    that logging latency decreases by 30-50 ns per call:
    tracing with non-NUMA allocation - 569 ns/call
    tracing with NUMA allocation - 512 ns/call
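
    A minimal sketch of the node-aware allocation for one buffer page:

        struct page *page;

        /* allocate the data page on the node local to the logging CPU */
        page = alloc_pages_node(cpu_to_node(cpu), GFP_KERNEL, 0);
        if (!page)
                goto free_pages;
        bpage->page = page_address(page);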

    Signed-off-by: Vaibhav Nagarnaik
    Cc: Frederic Weisbecker
    Cc: Ingo Molnar
    Cc: Michael Rubin
    Cc: David Sharp
    Link: http://lkml.kernel.org/r/1304470602-20366-1-git-send-email-vnagarnaik@google.com
    Signed-off-by: Steven Rostedt

    Vaibhav Nagarnaik
     

26 May, 2011

1 commit

  • Witold reported a reboot caused by the selftests of the dynamic function
    tracer. He sent me a config and I used ktest to do a config_bisect on it
    (as my config did not cause the crash). It pointed out that the problem
    config was CONFIG_PROVE_RCU.

    What happened was that if multiple callbacks are attached to the
    function tracer, we iterate a list of callbacks. Because the list is
    managed by synchronize_sched() and preempt_disable, the access to the
    pointers uses rcu_dereference_raw().

    When PROVE_RCU is enabled, the rcu_dereference_raw() calls some
    debugging functions, which happen to be traced. The tracing of the debug
    function would then call rcu_dereference_raw() which would then call the
    debug function and then... well you get the idea.

    I first wrote two different patches to solve this bug.

    1) add a __rcu_dereference_raw() that would not do any checks.
    2) add notrace to the offending debug functions.

    Both of these patches worked.

    Talking with Paul McKenney on IRC, he suggested to add recursion
    detection instead. This seemed to be a better solution, so I decided to
    implement it. As the task_struct already has a trace_recursion field
    to detect recursion in the ring buffer, and the count it allows is
    very small, I decided to use that same variable to add flags that
    can detect recursion inside the infrastructure of the function
    tracer.

    I plan to change it so that the task struct bit can be checked in
    mcount, but as that requires changes to all archs, I will hold that off
    to the next merge window.
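
    A rough sketch of the recursion guard, with a hypothetical bit
    value for the new flag:

        #define TRACE_INTERNAL_BIT      (1 << 11)   /* hypothetical */

        /* in the function-tracer callback iterator: */
        if (current->trace_recursion & TRACE_INTERNAL_BIT)
                return;                 /* already inside: bail out */
        current->trace_recursion |= TRACE_INTERNAL_BIT;

        /* ... iterate the callback list; a traced debug function
         * re-entering here hits the check above and returns ... */

        current->trace_recursion &= ~TRACE_INTERNAL_BIT;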

    Cc: Ingo Molnar
    Cc: Peter Zijlstra
    Cc: Thomas Gleixner
    Cc: Frederic Weisbecker
    Cc: Paul E. McKenney
    Link: http://lkml.kernel.org/r/1306348063.1465.116.camel@gandalf.stny.rr.com
    Reported-by: Witold Baryluk
    Signed-off-by: Steven Rostedt

    Steven Rostedt
     

19 Mar, 2011

1 commit

  • * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial: (47 commits)
    doc: CONFIG_UNEVICTABLE_LRU doesn't exist anymore
    Update cpuset info & webiste for cgroups
    dcdbas: force SMI to happen when expected
    arch/arm/Kconfig: remove one to many l's in the word.
    asm-generic/user.h: Fix spelling in comment
    drm: fix printk typo 'sracth'
    Remove one to many n's in a word
    Documentation/filesystems/romfs.txt: fixing link to genromfs
    drivers:scsi Change printk typo initate -> initiate
    serial, pch uart: Remove duplicate inclusion of linux/pci.h header
    fs/eventpoll.c: fix spelling
    mm: Fix out-of-date comments which refers non-existent functions
    drm: Fix printk typo 'failled'
    coh901318.c: Change initate to initiate.
    mbox-db5500.c Change initate to initiate.
    edac: correct i82975x error-info reported
    edac: correct i82975x mci initialisation
    edac: correct commented info
    fs: update comments to point correct document
    target: remove duplicate include of target/target_core_device.h from drivers/target/target_core_hba.c
    ...

    Trivial conflict in fs/eventpoll.c (spelling vs addition)

    Linus Torvalds
     

10 Mar, 2011

3 commits

  • The "Delta way too big" warning might appear on a system with a
    unstable shed clock right after the system is resumed and tracing
    was enabled at time of suspend.

    Since it's not realy a bug, and the unstable sched clock is working
    fast and reliable otherwise, Steven suggested to keep using the
    sched clock in any case and just to make note in the warning itself.

    v2 changes:
    - added #ifdef CONFIG_HAVE_UNSTABLE_SCHED_CLOCK

    Signed-off-by: Jiri Olsa
    LKML-Reference:
    Signed-off-by: Steven Rostedt

    Jiri Olsa
     
  • Signed-off-by: David Sharp
    LKML-Reference:
    Signed-off-by: Steven Rostedt

    David Sharp
     
  • Add an "overwrite" trace_option for ftrace to control whether the buffer should
    be overwritten on overflow or not. The default remains to overwrite old events
    when the buffer is full. This patch adds the option to instead discard newest
    events when the buffer is full. This is useful to get a snapshot of traces just
    after enabling traces. Dropping the current event is also a simpler code path.

    Signed-off-by: David Sharp
    LKML-Reference:
    Signed-off-by: Steven Rostedt

    David Sharp
     

09 Feb, 2011

1 commit

  • The warning "Delta way too big" warning might appear on a system with
    unstable shed clock right after the system is resumed and tracing
    was enabled during the suspend.

    Since it's not realy bug, and the unstable sched clock is working
    fast and reliable otherwise, Steven suggested to keep using the
    sched clock in any case and just to make note in the warning itself.

    Signed-off-by: Jiri Olsa
    LKML-Reference:
    Signed-off-by: Steven Rostedt

    Jiri Olsa
     

19 Jan, 2011

1 commit

  • Fix a bunch of
    warning: ‘inline’ is not at beginning of declaration
    messages when building a 'make allyesconfig' kernel with -Wextra.

    These warnings are trivial to kill, yet rather annoying when building with
    -Wextra.
    The more we can cut down on pointless crap like this the better (IMHO).

    A previous patch to do this for a 'allnoconfig' build has already been
    merged. This just takes the cleanup a little further.
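
    For illustration (the function name is hypothetical), the warning
    and its fix:

        /* warns with -Wextra: 'inline' is not at beginning of declaration */
        static void inline rb_example(void) { }

        /* fixed: 'inline' moved before the return type */
        static inline void rb_example(void) { }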

    Signed-off-by: Jesper Juhl
    Signed-off-by: Jiri Kosina

    Jesper Juhl
     

24 Dec, 2010

1 commit

  • Fix two related problems in the event-copying loop of
    ring_buffer_read_page.

    The loop condition for copying events is off-by-one.
    "len" is the remaining space in the caller-supplied page.
    "size" is the size of the next event (or two events).
    If len == size, then there is just enough space for the next event.

    size was set to rb_event_ts_length, which may include the size of
    two events if the first event is a time extend, in order to ensure
    that time extends are kept together with the event after them.
    However, rb_advance_reader always advances by one event. This would
    result in the event after any time extend being duplicated. Instead,
    get the size of a single event for the memcpy, but use
    rb_event_ts_length for the loop condition.
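
    A rough sketch of the corrected loop shape, using the helper names
    mentioned above plus rb_reader_event() as an assumed accessor
    (buffer bookkeeping elided):

        for (;;) {
                /* ts-inclusive length keeps a time extend with the
                 * event that follows it */
                size = rb_event_ts_length(event);
                if (len < size)         /* len == size still fits */
                        break;
                /* but copy and advance a single event at a time */
                size = rb_event_length(event);
                memcpy(to, event, size);
                to  += size;
                len -= size;
                rb_advance_reader(cpu_buffer);
                event = rb_reader_event(cpu_buffer);
        }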

    Signed-off-by: David Sharp
    LKML-Reference:
    LKML-Reference:
    Signed-off-by: Steven Rostedt

    David Sharp
     

28 Oct, 2010

1 commit

  • …/git/tip/linux-2.6-tip

    * 'perf-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: (50 commits)
    perf python scripting: Add futex-contention script
    perf python scripting: Fixup cut'n'paste error in sctop script
    perf scripting: Shut up 'perf record' final status
    perf record: Remove newline character from perror() argument
    perf python scripting: Support fedora 11 (audit 1.7.17)
    perf python scripting: Improve the syscalls-by-pid script
    perf python scripting: print the syscall name on sctop
    perf python scripting: Improve the syscalls-counts script
    perf python scripting: Improve the failed-syscalls-by-pid script
    kprobes: Remove redundant text_mutex lock in optimize
    x86/oprofile: Fix uninitialized variable use in debug printk
    tracing: Fix 'faild' -> 'failed' typo
    perf probe: Fix format specified for Dwarf_Off parameter
    perf trace: Fix detection of script extension
    perf trace: Use $PERF_EXEC_PATH in canned report scripts
    perf tools: Document event modifiers
    perf tools: Remove direct slang.h include
    perf_events: Fix for transaction recovery in group_sched_in()
    perf_events: Revert: Fix transaction recovery in group_sched_in()
    perf, x86: Use NUMA aware allocations for PEBS/BTS/DS allocations
    ...

    Linus Torvalds
     

23 Oct, 2010

1 commit

  • * 'llseek' of git://git.kernel.org/pub/scm/linux/kernel/git/arnd/bkl:
    vfs: make no_llseek the default
    vfs: don't use BKL in default_llseek
    llseek: automatically add .llseek fop
    libfs: use generic_file_llseek for simple_attr
    mac80211: disallow seeks in minstrel debug code
    lirc: make chardev nonseekable
    viotape: use noop_llseek
    raw: use explicit llseek file operations
    ibmasmfs: use generic_file_llseek
    spufs: use llseek in all file operations
    arm/omap: use generic_file_llseek in iommu_debug
    lkdtm: use generic_file_llseek in debugfs
    net/wireless: use generic_file_llseek in debugfs
    drm: use noop_llseek

    Linus Torvalds
     

22 Oct, 2010

1 commit

  • …git/tip/linux-2.6-tip

    * 'perf-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: (163 commits)
    tracing: Fix compile issue for trace_sched_wakeup.c
    [S390] hardirq: remove pointless header file includes
    [IA64] Move local_softirq_pending() definition
    perf, powerpc: Fix power_pmu_event_init to not use event->ctx
    ftrace: Remove recursion between recordmcount and scripts/mod/empty
    jump_label: Add COND_STMT(), reducer wrappery
    perf: Optimize sw events
    perf: Use jump_labels to optimize the scheduler hooks
    jump_label: Add atomic_t interface
    jump_label: Use more consistent naming
    perf, hw_breakpoint: Fix crash in hw_breakpoint creation
    perf: Find task before event alloc
    perf: Fix task refcount bugs
    perf: Fix group moving
    irq_work: Add generic hardirq context callbacks
    perf_events: Fix transaction recovery in group_sched_in()
    perf_events: Fix bogus AMD64 generic TLB events
    perf_events: Fix bogus context time tracking
    tracing: Remove parent recording in latency tracer graph options
    tracing: Use one prologue for the preempt irqs off tracer function tracers
    ...

    Linus Torvalds
     

21 Oct, 2010

5 commits

  • With the binding of time extends to events we no longer need to use
    the macro RB_TIMESTAMPS_PER_PAGE. Remove it.

    Signed-off-by: Steven Rostedt

    Steven Rostedt
     
  • By using inline and noinline, we are able to make the fast path of
    recording an event 4% faster.

    Signed-off-by: Steven Rostedt

    Steven Rostedt
     
  • There's a condition to check if we should add a time extend or
    not in the fast path. But this condition is racy (in the sense
    that we can add an unnecessary time extend, but nothing that
    can break anything). We later check if the time or event time
    delta should be zero or have real data in it (not racy), making
    this first check redundant.

    This check may help save space once in a while, but really is
    not worth the hassle to try to save some space that is consumed
    at most once every ~134 ms.

    Signed-off-by: Steven Rostedt

    Steven Rostedt
     
  • When the time between two timestamps is greater than
    2^27 nanoseconds (~134 ms), a time extend event is added that
    extends the time difference to 59 bits (~18 years). This is because
    events only have a 27-bit field to store time.

    Currently this time extend is a separate event. We add it just before
    the event data that is being written to the buffer. But before
    the event data is committed, the event data can also be discarded (as
    with the case of filters). But because the time extend has already been
    committed, it will stay in the buffer.

    If lots of events are being filtered and no event is being
    written, then every 134ms a time extend can be added to the buffer
    without any data attached. To keep from filling the entire buffer
    with time extends, a time extend will never be the first event
    in a page because the page timestamp can be used. Time extends can
    only fill the rest of a page with some data at the beginning.

    This patch binds the time extend with the data. The difference here
    is that the time extend is not committed before the data is added.
    Instead, when a time extend is needed, the space reserved on
    the ring buffer is the time extend + the data event size. The
    time extend is added to the first part of the reserved block and
    the data is added to the second. The time extend event is passed
    back to the reserver, but since the reserver also uses a function
    to find the data portion of the reserved block, no changes to the
    ring buffer interface need to be made.

    When a commit is discarded, we now remove both the time extend and
    the event. With this approach no more than one time extend can
    be in the buffer in a row. Data must always follow a time extend.

    Thanks to Mathieu Desnoyers for suggesting this idea.
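
    A minimal sketch of the reservation arithmetic, assuming the ring
    buffer's RB_LEN_TIME_EXTEND constant and test_time_stamp() helper:

        u64 delta = ts - cpu_buffer->write_stamp;

        /* does the delta need more than the 27-bit field? */
        if (test_time_stamp(delta))
                length += RB_LEN_TIME_EXTEND;  /* extend rides with the data */

        /* one reservation now covers the extend and the event, so
         * discarding the commit removes both together */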

    Suggested-by: Mathieu Desnoyers
    Cc: Thomas Gleixner
    Signed-off-by: Steven Rostedt

    Steven Rostedt
     
  • The delta between events is passed to the timestamp code by
    reference, and the timestamp code will reset the value. But it can
    just as well be reset by the caller. There is no need to pass it in
    by reference.

    Changing the call to pass by value lets gcc optimize the code a bit
    more: it can keep the delta in a register and not worry about
    updating the reference.
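
    For illustration, the shape of such a change (the helper name is
    hypothetical):

        /* before: the delta is written back through the pointer */
        static void rb_handle_delta(struct ring_buffer_per_cpu *cpu_buffer,
                                    u64 *delta);

        /* after: the value arrives in a register and is simply consumed */
        static void rb_handle_delta(struct ring_buffer_per_cpu *cpu_buffer,
                                    u64 delta);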

    Signed-off-by: Steven Rostedt

    Steven Rostedt
     

20 Oct, 2010

1 commit

  • The original code for the ring buffer had locations that modified
    the timestamp, and that change was used by the callers. Now the
    timestamp is not reused by the callers, and there is no reason
    to pass it by reference.

    Changing the call to pass by value lets gcc optimize the code a bit
    more: it can keep the timestamp in a register and not worry about
    updating the reference.

    Signed-off-by: Steven Rostedt

    Steven Rostedt