17 Dec, 2014

1 commit


11 Nov, 2014

2 commits

  • If the read loop in tracing_buffers_splice_read() keeps failing due to
    memory allocation failures without reading even a single page, then this
    function will keep busy looping.

    Remove that risk by exiting the function if memory allocation
    failures are seen.
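
    A minimal sketch of the resulting loop shape (hypothetical local names,
    not the actual kernel diff):

        /* sketch: stop retrying the page read once an allocation fails */
        for (i = 0; i < pipe->buffers && len; i++, len -= PAGE_SIZE) {
                struct buffer_ref *ref;

                ref = kzalloc(sizeof(*ref), GFP_KERNEL);
                if (!ref) {
                        ret = -ENOMEM;
                        break;          /* instead of looping forever */
                }
                /* ... hand one ring buffer page to the pipe ... */
        }
        if (!spd.nr_pages)
                return ret;             /* report -ENOMEM to the caller */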

    Link: http://lkml.kernel.org/r/1415309167-2373-2-git-send-email-rabin@rab.in

    Signed-off-by: Rabin Vincent
    Signed-off-by: Steven Rostedt

    Rabin Vincent
     
  • On a !PREEMPT kernel, attempting to use trace-cmd results in a soft
    lockup:

    # trace-cmd record -e raw_syscalls:* -F false
    NMI watchdog: BUG: soft lockup - CPU#0 stuck for 22s! [trace-cmd:61]
    ...
    Call Trace:
    [] ? __wake_up_common+0x90/0x90
    [] wait_on_pipe+0x35/0x40
    [] tracing_buffers_splice_read+0x2e3/0x3c0
    [] ? tracing_stats_read+0x2a0/0x2a0
    [] ? _raw_spin_unlock+0x2b/0x40
    [] ? do_read_fault+0x21b/0x290
    [] ? handle_mm_fault+0x2ba/0xbd0
    [] ? trace_event_buffer_lock_reserve+0x40/0x80
    [] ? trace_buffer_lock_reserve+0x22/0x60
    [] ? trace_event_buffer_lock_reserve+0x40/0x80
    [] do_splice_to+0x6d/0x90
    [] SyS_splice+0x7c1/0x800
    [] tracesys_phase2+0xd3/0xd8

    The problem is this: tracing_buffers_splice_read() calls
    ring_buffer_wait() to wait for data in the ring buffers. The buffers
    are not empty so ring_buffer_wait() returns immediately. But
    tracing_buffers_splice_read() calls ring_buffer_read_page() with full=1,
    meaning it only wants to read a full page. When the full page is not
    available, tracing_buffers_splice_read() tries to wait again with
    ring_buffer_wait(), which again returns immediately, and so on.

    Fix this by adding a "full" argument to ring_buffer_wait() which will
    make ring_buffer_wait() wait until the writer has left the reader's
    page, i.e. until full-page reads will succeed.
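
    Roughly, the call-site change looks like this (a sketch; the exact
    signature may differ):

        /* before: returns as soon as the buffer is non-empty */
        ret = ring_buffer_wait(buffer, cpu);

        /* after: with "full" set, wait until the writer has left the
         * reader's page so that a full-page read can succeed */
        ret = ring_buffer_wait(buffer, cpu, full);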

    Link: http://lkml.kernel.org/r/1415645194-25379-1-git-send-email-rabin@rab.in

    Cc: stable@vger.kernel.org # 3.16+
    Fixes: b1169cc69ba9 ("tracing: Remove mock up poll wait function")
    Signed-off-by: Rabin Vincent
    Signed-off-by: Steven Rostedt

    Rabin Vincent
     

31 Oct, 2014

1 commit

  • ARM has some private syscalls (for example, set_tls(2)) which lie
    outside the range of NR_syscalls. If any of these are called while
    syscall tracing is being performed, out-of-bounds array access will
    occur in the ftrace and perf sys_{enter,exit} handlers.

    # trace-cmd record -e raw_syscalls:* true && trace-cmd report
    ...
    true-653 [000] 384.675777: sys_enter: NR 192 (0, 1000, 3, 4000022, ffffffff, 0)
    true-653 [000] 384.675812: sys_exit: NR 192 = 1995915264
    true-653 [000] 384.675971: sys_enter: NR 983045 (76f74480, 76f74000, 76f74b28, 76f74480, 76f76f74, 1)
    true-653 [000] 384.675988: sys_exit: NR 983045 = 0
    ...

    # trace-cmd record -e syscalls:* true
    [ 17.289329] Unable to handle kernel paging request at virtual address aaaaaace
    [ 17.289590] pgd = 9e71c000
    [ 17.289696] [aaaaaace] *pgd=00000000
    [ 17.289985] Internal error: Oops: 5 [#1] PREEMPT SMP ARM
    [ 17.290169] Modules linked in:
    [ 17.290391] CPU: 0 PID: 704 Comm: true Not tainted 3.18.0-rc2+ #21
    [ 17.290585] task: 9f4dab00 ti: 9e710000 task.ti: 9e710000
    [ 17.290747] PC is at ftrace_syscall_enter+0x48/0x1f8
    [ 17.290866] LR is at syscall_trace_enter+0x124/0x184

    Fix this by ignoring out-of-NR_syscalls-bounds syscall numbers.

    Commit cd0980fc8add "tracing: Check invalid syscall nr while tracing syscalls"
    added the check for less than zero, but it should have also checked
    for numbers greater than or equal to NR_syscalls.
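
    A minimal sketch of the bounds check in the sys_enter/sys_exit handlers
    (not the exact diff):

        int syscall_nr = syscall_get_nr(current, regs);

        /* ignore ARM private syscalls and anything else outside the table */
        if (syscall_nr < 0 || syscall_nr >= NR_syscalls)
                return;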

    Link: http://lkml.kernel.org/p/1414620418-29472-1-git-send-email-rabin@rab.in

    Fixes: cd0980fc8add "tracing: Check invalid syscall nr while tracing syscalls"
    Cc: stable@vger.kernel.org # 2.6.33+
    Signed-off-by: Rabin Vincent
    Signed-off-by: Steven Rostedt

    Rabin Vincent
     

25 Oct, 2014

2 commits

  • When modifying code, ftrace has several checks to make sure things
    are being done correctly. One of them is to make sure any code it
    modifies is exactly what it expects it to be before it modifies it.
    In order to do so with the new trampoline logic, it must be able
    to find out what trampoline a function is hooked to in order to
    see if the code that hooks to it is what's expected.

    The logic to find the trampoline from a record (the accounting descriptor
    for a function that is hooked) needs to only look at the "old_hash"
    of an ops that is being modified. The old_hash is the list of functions
    an ops was hooked to before its update, since a record would only be
    pointing to an ops that is being modified if it was already hooked
    to it before.

    Currently, the code can pick a modified ops based on the new functions it
    will be hooked to, and this picks the wrong trampoline and causes
    the check to fail, disabling ftrace.

    Signed-off-by: Steven Rostedt

    ftrace: squash into ordering of ops for modification

    Steven Rostedt (Red Hat)
     
  • The code that checks for trampolines when modifying function hooks
    tests against a modified ops' "old_hash". But the ops' old_hash pointer
    is not being updated before the changes are made, making it possible
    to fail to find the right hash for the callback, which can break
    ftrace's accounting and cause it to disable itself.

    Have the ops set its old_hash before the modification takes place.

    Signed-off-by: Steven Rostedt

    Steven Rostedt (Red Hat)
     

13 Oct, 2014

1 commit

  • Pull scheduler updates from Ingo Molnar:
    "The main changes in this cycle were:

    - Optimized support for Intel "Cluster-on-Die" (CoD) topologies (Dave
    Hansen)

    - Various sched/idle refinements for better idle handling (Nicolas
    Pitre, Daniel Lezcano, Chuansheng Liu, Vincent Guittot)

    - sched/numa updates and optimizations (Rik van Riel)

    - sysbench speedup (Vincent Guittot)

    - capacity calculation cleanups/refactoring (Vincent Guittot)

    - Various cleanups to thread group iteration (Oleg Nesterov)

    - Double-rq-lock removal optimization and various refactorings
    (Kirill Tkhai)

    - various sched/deadline fixes

    ... and lots of other changes"

    * 'sched-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (72 commits)
    sched/dl: Use dl_bw_of() under rcu_read_lock_sched()
    sched/fair: Delete resched_cpu() from idle_balance()
    sched, time: Fix build error with 64 bit cputime_t on 32 bit systems
    sched: Improve sysbench performance by fixing spurious active migration
    sched/x86: Fix up typo in topology detection
    x86, sched: Add new topology for multi-NUMA-node CPUs
    sched/rt: Use resched_curr() in task_tick_rt()
    sched: Use rq->rd in sched_setaffinity() under RCU read lock
    sched: cleanup: Rename 'out_unlock' to 'out_free_new_mask'
    sched: Use dl_bw_of() under RCU read lock
    sched/fair: Remove duplicate code from can_migrate_task()
    sched, mips, ia64: Remove __ARCH_WANT_UNLOCKED_CTXSW
    sched: print_rq(): Don't use tasklist_lock
    sched: normalize_rt_tasks(): Don't use _irqsave for tasklist_lock, use task_rq_lock()
    sched: Fix the task-group check in tg_has_rt_tasks()
    sched/fair: Leverage the idle state info when choosing the "idlest" cpu
    sched: Let the scheduler see CPU idle states
    sched/deadline: Fix inter- exclusive cpusets migrations
    sched/deadline: Clear dl_entity params when setscheduling to different class
    sched/numa: Kill the wrong/dead TASK_DEAD check in task_numa_fault()
    ...

    Linus Torvalds
     

12 Oct, 2014

2 commits

  • Pull tracing fixes from Steven Rostedt:
    "Seems that Peter Zijlstra added a new check that is making old code
    scream nasty warnings:

    WARNING: CPU: 0 PID: 91 at kernel/sched/core.c:7253 __might_sleep+0x9a/0x378()
    do not call blocking ops when !TASK_RUNNING; state=1 set at [] event_test_thread+0x48/0x93
    Call Trace:
    __might_sleep+0x9a/0x378
    down_read+0x26/0x98
    exit_signals+0x27/0x1c2
    do_exit+0x193/0x10bd
    kthread+0x156/0x156
    ret_from_fork+0x7a/0xb0

    These are triggered by some self tests that run at start up when
    configured in. Although the code is technically correct, the tests are a
    little sloppy and not very robust. They work now because they run at
    boot up and do not call anything that might trigger a
    spurious wake up. But that doesn't mean those tests won't change in
    the future.

    It's best to clean them up now to make sure the tests used to exercise the
    internal workings of the system don't cause breakage themselves.

    This also quiets the warnings made by the new checks"

    * tag 'trace-3.18-2' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace:
    tracing: Clean up scheduling in trace_wakeup_test_thread()
    tracing: Robustify wait loop

    Linus Torvalds
     
  • Pull tracing updates from Steven Rostedt:
    "This set has a few minor updates, but the big change is the redesign
    of the trampoline logic.

    The trampoline logic of 3.17 required a descriptor for every function
    that is registered to be traced and uses a trampoline. Currently,
    only the function graph tracer uses a trampoline, but if you were to
    trace all 32,000 (give or take a few thousand) functions with the
    function graph tracer, it would create 32,000 descriptors to let us
    know that there's a trampoline associated with it. This takes up a
    bit of memory when there's a better way to do it.

    The redesign now reuses the ftrace_ops' (what registers the function
    graph tracer) hash tables. The hash tables tell ftrace what the
    tracer wants to trace or doesn't want to trace. There are two of them:
    one that tells us what to trace, the other what not to trace.
    If the first one is empty, all functions should be traced,
    otherwise only the ones that are listed should be. If the second one
    is empty, no functions are excluded, and if any are listed, those
    should not be traced even if they exist in the first hash table.

    It took a bit of massaging, but now these hashes can be used to keep
    track of what has a trampoline and what does not, and allows the
    ftrace accounting to work. Now we can trace all functions when using
    the function graph trampoline, and avoid needing to create any special
    descriptors to hold all the functions that are being traced"

    * tag 'trace-3.18' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace:
    ftrace: Only disable ftrace_enabled to test buffer in selftest
    ftrace: Add sanity check when unregistering last ftrace_ops
    kernel: trace_syscalls: Replace rcu_assign_pointer() with RCU_INIT_POINTER()
    tracing: generate RCU warnings even when tracepoints are disabled
    ftrace: Replace tramp_hash with old_*_hash to save space
    ftrace: Annotate the ops operation on update
    ftrace: Grab any ops for a rec for enabled_functions output
    ftrace: Remove freeing of old_hash from ftrace_hash_move()
    ftrace: Set callback to ftrace_stub when no ops are registered
    ftrace: Add helper function ftrace_ops_get_func()
    ftrace: Add separate function for non recursive callbacks

    Linus Torvalds
     

09 Oct, 2014

2 commits

  • Peter's new debugging tool triggers when tasks exit with !TASK_RUNNING.
    The code in trace_wakeup_test_thread() also has a single schedule() call
    that should be encompassed by a loop.

    This cleans up the code a little to make it a bit more robust and
    also makes the thread exit properly with TASK_RUNNING set.

    Link: http://lkml.kernel.org/p/20141008135216.76142204@gandalf.local.home

    Reported-by: Peter Zijlstra
    Acked-by: Peter Zijlstra
    Signed-off-by: Steven Rostedt

    Steven Rostedt
     
  • The pending nested sleep debugging triggered on the potential stale
    TASK_INTERRUPTIBLE in this code.

    While there, fix the loop such that we won't revert to a while(1)
    yield() 'spin' loop if we ever get a spurious wakeup.

    And fix the actual issue by properly terminating the 'wait' loop by
    setting TASK_RUNNING.
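
    The pattern this converges on is the classic wait loop (a sketch, with a
    hypothetical wakeup condition):

        for (;;) {
                set_current_state(TASK_INTERRUPTIBLE);
                if (wakeup_condition)           /* hypothetical predicate */
                        break;
                schedule();                     /* spurious wakeups simply loop */
        }
        __set_current_state(TASK_RUNNING);      /* properly terminate the wait */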

    Link: http://lkml.kernel.org/p/20141008165110.GA14547@worktop.programming.kicks-ass.net

    Reported-by: Fengguang Wu
    Signed-off-by: Peter Zijlstra (Intel)
    Signed-off-by: Steven Rostedt

    Peter Zijlstra
     

03 Oct, 2014

1 commit

  • Commit 651e22f2701b "ring-buffer: Always reset iterator to reader page"
    fixed one bug but in the process caused another one. The reset is to
    update the header page, but that fix also changed the way the cached
    reads were updated. The cache reads are used to test if an iterator
    needs to be updated or not.

    A ring buffer iterator, when created, disables writes to the ring buffer
    but does not stop other readers or consuming reads from happening.
    Although all readers are synchronized via a lock, they are only
    synchronized when in the ring buffer functions. Those functions may
    be called by any number of readers. The iterator continues on as long as
    it is not interrupted by a consuming reader. If a consuming read
    occurs, the iterator starts over from the beginning of the buffer.

    The way the iterator sees that a consuming read has happened since
    its last read is by checking the reader "cache". The cache holds the
    last counts of the read and the reader page itself.

    Commit 651e22f2701b changed what was saved by the cache_read when
    rb_iter_reset() occurred, making the iterator never match the cache.
    Then if the iterator calls rb_iter_reset(), it will go into an
    infinite loop: it checks that the cache doesn't match, does the reset
    and retries, just to see that the cache still doesn't match! This
    should never happen, as the reset is supposed to set the cache to the
    current value, and there are locks that keep a consuming reader from
    having access to the data.

    Fixes: 651e22f2701b "ring-buffer: Always reset iterator to reader page"
    Signed-off-by: Steven Rostedt

    Steven Rostedt (Red Hat)
     

19 Sep, 2014

3 commits

  • This facility is used in a few places so let's introduce
    a helper function to improve code readability.

    Signed-off-by: Aaron Tomlin
    Signed-off-by: Peter Zijlstra (Intel)
    Cc: aneesh.kumar@linux.vnet.ibm.com
    Cc: dzickus@redhat.com
    Cc: bmr@redhat.com
    Cc: jcastillo@redhat.com
    Cc: oleg@redhat.com
    Cc: riel@redhat.com
    Cc: prarit@redhat.com
    Cc: jgh@redhat.com
    Cc: minchan@kernel.org
    Cc: mpe@ellerman.id.au
    Cc: tglx@linutronix.de
    Cc: hannes@cmpxchg.org
    Cc: Andrew Morton
    Cc: Benjamin Herrenschmidt
    Cc: Jiri Olsa
    Cc: Linus Torvalds
    Cc: Masami Hiramatsu
    Cc: Michael Ellerman
    Cc: Paul Mackerras
    Cc: Seiji Aguchi
    Cc: Steven Rostedt
    Cc: Yasuaki Ishimatsu
    Cc: linuxppc-dev@lists.ozlabs.org
    Link: http://lkml.kernel.org/r/1410527779-8133-3-git-send-email-atomlin@redhat.com
    Signed-off-by: Ingo Molnar

    Aaron Tomlin
     
  • Tasks get their end of stack set to STACK_END_MAGIC with the
    aim to catch stack overruns. Currently this feature does not
    apply to init_task. This patch removes this restriction.
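
    Conceptually the change amounts to something like the following (helper
    names here are illustrative, not necessarily the ones used in the patch):

        /* write the canary at the end of a task's stack */
        static void set_task_stack_end_magic(struct task_struct *tsk)
        {
                unsigned long *stackend = end_of_stack(tsk);

                *stackend = STACK_END_MAGIC;    /* for stack overrun detection */
        }

        /* early in start_kernel(): give init_task the same protection */
        set_task_stack_end_magic(&init_task);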

    Note that a similar patch was posted by Prarit Bhargava
    some time ago but was never merged:

    http://marc.info/?l=linux-kernel&m=127144305403241&w=2

    Signed-off-by: Aaron Tomlin
    Signed-off-by: Peter Zijlstra (Intel)
    Acked-by: Oleg Nesterov
    Acked-by: Michael Ellerman
    Cc: aneesh.kumar@linux.vnet.ibm.com
    Cc: dzickus@redhat.com
    Cc: bmr@redhat.com
    Cc: jcastillo@redhat.com
    Cc: jgh@redhat.com
    Cc: minchan@kernel.org
    Cc: tglx@linutronix.de
    Cc: hannes@cmpxchg.org
    Cc: Alex Thorlton
    Cc: Andrew Morton
    Cc: Benjamin Herrenschmidt
    Cc: Daeseok Youn
    Cc: David Rientjes
    Cc: Fabian Frederick
    Cc: Geert Uytterhoeven
    Cc: Jiri Olsa
    Cc: Kees Cook
    Cc: Kirill A. Shutemov
    Cc: Linus Torvalds
    Cc: Masami Hiramatsu
    Cc: Michael Opdenacker
    Cc: Paul Mackerras
    Cc: Prarit Bhargava
    Cc: Rik van Riel
    Cc: Rusty Russell
    Cc: Seiji Aguchi
    Cc: Steven Rostedt
    Cc: Vladimir Davydov
    Cc: Yasuaki Ishimatsu
    Cc: linuxppc-dev@lists.ozlabs.org
    Link: http://lkml.kernel.org/r/1410527779-8133-2-git-send-email-atomlin@redhat.com
    Signed-off-by: Ingo Molnar

    Aaron Tomlin
     
  • schedule(), io_schedule() and schedule_timeout() always return
    with the TASK_RUNNING state set, so setting it again afterwards is
    unnecessary.

    (All of the places changed in this patch are clearly visible; the only
    exception is kiblnd_scheduler() from:

    drivers/staging/lustre/lnet/klnds/o2iblnd/o2iblnd_cb.c

    whose schedule() call is one line above the standard 3 lines of unified
    diff context.)

    There are no places where set_current_state() is used as a memory
    barrier (mb()).
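
    The typical hunk this removes looks like (sketch):

        set_current_state(TASK_INTERRUPTIBLE);
        schedule_timeout(timeout);
        /* the following line is what gets deleted: schedule_timeout()
         * already returns with current->state == TASK_RUNNING */
        __set_current_state(TASK_RUNNING);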

    Signed-off-by: Kirill Tkhai
    Signed-off-by: Peter Zijlstra (Intel)
    Link: http://lkml.kernel.org/r/1410529254.3569.23.camel@tkhai
    Cc: Alasdair Kergon
    Cc: Anil Belur
    Cc: Arnd Bergmann
    Cc: Dave Kleikamp
    Cc: David Airlie
    Cc: David Howells
    Cc: Dmitry Eremin
    Cc: Frank Blaschka
    Cc: Greg Kroah-Hartman
    Cc: Heiko Carstens
    Cc: Helge Deller
    Cc: Isaac Huang
    Cc: James E.J. Bottomley
    Cc: James E.J. Bottomley
    Cc: J. Bruce Fields
    Cc: Jeff Dike
    Cc: Jesper Nilsson
    Cc: Jiri Slaby
    Cc: Laura Abbott
    Cc: Liang Zhen
    Cc: Linus Torvalds
    Cc: Martin Schwidefsky
    Cc: Masaru Nomura
    Cc: Michael Opdenacker
    Cc: Mikael Starvik
    Cc: Mike Snitzer
    Cc: Neil Brown
    Cc: Oleg Drokin
    Cc: Peng Tao
    Cc: Richard Weinberger
    Cc: Robert Love
    Cc: Steven Rostedt
    Cc: Trond Myklebust
    Cc: Ursula Braun
    Cc: Zi Shen Lim
    Cc: devel@driverdev.osuosl.org
    Cc: dm-devel@redhat.com
    Cc: dri-devel@lists.freedesktop.org
    Cc: fcoe-devel@open-fcoe.org
    Cc: jfs-discussion@lists.sourceforge.net
    Cc: linux390@de.ibm.com
    Cc: linux-afs@lists.infradead.org
    Cc: linux-cris-kernel@axis.com
    Cc: linux-kernel@vger.kernel.org
    Cc: linux-nfs@vger.kernel.org
    Cc: linux-parisc@vger.kernel.org
    Cc: linux-raid@vger.kernel.org
    Cc: linux-s390@vger.kernel.org
    Cc: linux-scsi@vger.kernel.org
    Cc: qla2xxx-upstream@qlogic.com
    Cc: user-mode-linux-devel@lists.sourceforge.net
    Cc: user-mode-linux-user@lists.sourceforge.net
    Signed-off-by: Ingo Molnar

    Kirill Tkhai
     

10 Sep, 2014

7 commits

  • The uses of "rcu_assign_pointer()" here are NULLing out the pointers.
    According to RCU_INIT_POINTER()'s block comment:
    "1. This use of RCU_INIT_POINTER() is NULLing out the pointer"
    it is better to use RCU_INIT_POINTER() instead of rcu_assign_pointer()
    because it has a lower overhead.

    The following Coccinelle semantic patch was used:
    @@
    @@

    - rcu_assign_pointer
    + RCU_INIT_POINTER
    (..., NULL)
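
    Applied to a hypothetical RCU-protected pointer, the transformation is:

        struct foo __rcu *gp;          /* hypothetical RCU-protected pointer */

        rcu_assign_pointer(gp, NULL);  /* before: includes a publish barrier */
        RCU_INIT_POINTER(gp, NULL);    /* after: no barrier needed for NULL */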

    Link: http://lkml.kernel.org/p/20140822142822.GA32391@ada

    Signed-off-by: Andreea-Cristina Bernat
    Signed-off-by: Steven Rostedt

    Andreea-Cristina Bernat
     
  • Allowing function callbacks to declare their own trampolines requires
    that each ftrace_ops that has a trampoline must have some sort of
    accounting that keeps track of which ops has a trampoline attached
    to a record.

    The easy way to solve this was to add a "tramp_hash" that created a
    hash entry for every function that a ops uses with a trampoline.
    But since we can have literally tens of thousands of functions being
    traced, that means we need tens of thousands of descriptors to map
    the ops to the function in the hash. This is quite expensive and
    can cause enabling and disabling the function graph tracer to take
    some time to start and stop. It can take up to several seconds to
    disable or enable all functions in the function graph tracer for this
    reason.

    The better approach, albeit more complex, is to keep track of how ops
    are being enabled and disabled, and use that, along with the counting
    of the number of ops attached to records, to determine which ops has
    a trampoline attached to a record when tracing is enabled or
    disabled.

    To do this, the tramp_hash has been replaced with an old_filter_hash
    and an old_notrace_hash, which get a copy of the ops filter_hash and
    notrace_hash respectively. The old hashes are kept until the ops has
    been modified or removed, and they are used by the accounting logic
    to determine which ops has the trampoline of a record. The reason this
    has less of a footprint is the trick that an "empty" filter_hash means
    "all functions" and an empty notrace_hash means "no functions" are in
    the hash.

    This is much more efficient, doesn't have the delay, and takes up
    much less memory, as we do not need to map all the functions but
    just figure out which functions are mapped at the time the ops is
    enabled or disabled.
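
    The hash semantics the accounting relies on can be summarized as follows
    (a sketch, not the kernel's exact helper):

        /* does this ops trace the function at ip? */
        static bool ops_traces_ip(struct ftrace_ops *ops, unsigned long ip)
        {
                /* an empty filter_hash means "trace all functions" */
                if (!ftrace_hash_empty(ops->func_hash->filter_hash) &&
                    !ftrace_lookup_ip(ops->func_hash->filter_hash, ip))
                        return false;

                /* an empty notrace_hash means "exclude nothing" */
                if (ftrace_lookup_ip(ops->func_hash->notrace_hash, ip))
                        return false;

                return true;
        }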

    Signed-off-by: Steven Rostedt

    Steven Rostedt (Red Hat)
     
  • Add three new flags for ftrace_ops:

    FTRACE_OPS_FL_ADDING
    FTRACE_OPS_FL_REMOVING
    FTRACE_OPS_FL_MODIFYING

    These will be set on an ftrace_ops when it is first added to
    function tracing, when it is being removed from function tracing,
    or when the set of functions it traces is being changed,
    respectively.

    This will be needed to remove the tramp_hash, which can grow quite
    big. The tramp_hash is used to note what functions a ftrace_ops
    is using a trampoline for. Denoting which ftrace_ops is being
    modified will allow us to use the ftrace_ops hashes themselves,
    which are much smaller, as they have a global flag to denote if
    an ftrace_ops is tracing all functions, as well as a notrace hash
    for when the ftrace_ops is tracing all but a few. The tramp_hash just
    creates a hash item for every function, which can go into the tens
    of thousands if all functions are using the ftrace_ops trampoline.

    Signed-off-by: Steven Rostedt

    Steven Rostedt (Red Hat)
     
  • When dumping the enabled_functions, use the first op found with a
    trampoline attached to the record; there should only be one, since
    only one ops can be registered to a function that has
    a trampoline.

    Signed-off-by: Steven Rostedt

    Steven Rostedt (Red Hat)
     
  • ftrace_hash_move() currently frees the old hash that is passed to it
    after replacing the pointer with the new hash. Instead of having the
    function do that chore, have the caller perform the free.

    This lets the ftrace_hash_move() be used a bit more freely, which
    is needed for changing the way the trampoline logic is done.

    Signed-off-by: Steven Rostedt

    Steven Rostedt (Red Hat)
     
  • The clean up that adds the helper function ftrace_ops_get_func()
    caused the default function to not change when DYNAMIC_FTRACE was not
    set and no ftrace_ops were registered. Although static tracing is
    not very useful (not having DYNAMIC_FTRACE set), it is still supported
    and we don't want to break it.

    Clean up the if statement even more to specifically have the default
    function call ftrace_stub when no ftrace_ops are registered. This
    fixes the small bug for static tracing as well as makes the code a
    bit more understandable.
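
    Roughly, the selection of the default function becomes (a simplified
    sketch of the logic described above):

        if (ftrace_ops_list == &ftrace_list_end) {
                /* nothing registered: call the empty stub */
                func = ftrace_stub;
        } else if (ftrace_ops_list->next == &ftrace_list_end) {
                /* a single ops: it may be called directly */
                func = ftrace_ops_get_func(ftrace_ops_list);
        } else {
                /* several ops: iterate over the list */
                func = ftrace_ops_list_func;
        }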

    Signed-off-by: Steven Rostedt

    Steven Rostedt (Red Hat)
     
  • Add a helper function that determines what the mcount trampoline should
    call for a given ftrace_ops. This helper will be used by arch code
    in the future to set up dynamic trampolines. But as it performs the
    same tests that are used when choosing what function to call for
    the default mcount trampoline, we might as well use it to clean up
    the existing code.

    Signed-off-by: Steven Rostedt

    Steven Rostedt (Red Hat)
     

09 Sep, 2014

1 commit

  • Instead of using the generic list function for callbacks that
    are not recursive, have the mcount trampoline call a new helper
    function, ftrace_ops_recur_func(), that does the recursion
    checking for the callback.

    This eliminates an indirection and will also help future code
    that will use dynamically allocated trampolines.
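
    In essence the helper is a thin recursion guard around the callback
    (a simplified sketch):

        static void ftrace_ops_recur_func(unsigned long ip, unsigned long parent_ip,
                                          struct ftrace_ops *op, struct pt_regs *regs)
        {
                int bit;

                /* bail out if we are already tracing in this context */
                bit = trace_test_and_set_recursion(TRACE_LIST_START, TRACE_LIST_MAX);
                if (bit < 0)
                        return;

                op->func(ip, parent_ip, op, regs);

                trace_clear_recursion(bit);
        }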

    Signed-off-by: Steven Rostedt

    Steven Rostedt (Red Hat)
     

26 Aug, 2014

1 commit

  • Epoll on trace_pipe can sometimes hang in a weird case. If the ring buffer is
    empty when we set waiters_pending, but an event shows up exactly at that moment,
    we can miss being woken up by the ring buffer's irq work. Since
    ring_buffer_empty() is inherently racy, we will sometimes think that the buffer
    is not empty. So we don't get woken up and we don't think there are any events,
    even though there were some ready when we added the watch, which makes us hang.
    This patch fixes this by making sure that we are actually on the wait list
    before we set waiters_pending, and adds a memory barrier to make sure
    ring_buffer_empty() is going to be correct.
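
    The ordering the fix establishes on the poll side is roughly (sketch):

        poll_wait(filp, &work->waiters, poll_table);  /* get on the wait list first */
        work->waiters_pending = true;
        /*
         * Pair with the check in the writer's irq_work: the waiters_pending
         * store must be visible before we look at the buffer again, otherwise
         * a wakeup for an event arriving right now could be lost.
         */
        smp_mb();
        if (!ring_buffer_empty(buffer))
                return POLLIN | POLLRDNORM;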

    Link: http://lkml.kernel.org/p/1408989581-23727-1-git-send-email-jbacik@fb.com

    Cc: stable@vger.kernel.org # 3.10+
    Cc: Martin Lau
    Signed-off-by: Josef Bacik
    Signed-off-by: Steven Rostedt

    Josef Bacik
     

23 Aug, 2014

5 commits

  • In __ftrace_replace_code(), when converting the call to a nop in a function,
    it needs to compare against the "curr" (current) value of the ftrace ops, and
    not the "new" one. This currently does not affect x86, which is the only arch
    to do the trampolines with the function graph tracer, but when other archs that
    depend on this code implement the function graph trampoline, it can crash.

    Here's an example when ARM uses the trampolines (in the future):

    ------------[ cut here ]------------
    WARNING: CPU: 0 PID: 9 at kernel/trace/ftrace.c:1716 ftrace_bug+0x17c/0x1f4()
    Modules linked in: omap_rng rng_core ipv6
    CPU: 0 PID: 9 Comm: migration/0 Not tainted 3.16.0-test-10959-gf0094b28f303-dirty #52
    [] (unwind_backtrace) from [] (show_stack+0x20/0x24)
    [] (show_stack) from [] (dump_stack+0x78/0x94)
    [] (dump_stack) from [] (warn_slowpath_common+0x7c/0x9c)
    [] (warn_slowpath_common) from [] (warn_slowpath_null+0x2c/0x34)
    [] (warn_slowpath_null) from [] (ftrace_bug+0x17c/0x1f4)
    [] (ftrace_bug) from [] (ftrace_replace_code+0x80/0x9c)
    [] (ftrace_replace_code) from [] (ftrace_modify_all_code+0xb8/0x164)
    [] (ftrace_modify_all_code) from [] (__ftrace_modify_code+0x14/0x1c)
    [] (__ftrace_modify_code) from [] (multi_cpu_stop+0xf4/0x134)
    [] (multi_cpu_stop) from [] (cpu_stopper_thread+0x54/0x130)
    [] (cpu_stopper_thread) from [] (smpboot_thread_fn+0x1ac/0x1bc)
    [] (smpboot_thread_fn) from [] (kthread+0xe0/0xfc)
    [] (kthread) from [] (ret_from_fork+0x14/0x20)
    ---[ end trace dc9ce72c5b617d8f ]---
    [ 65.047264] ftrace failed to modify [] asm_do_IRQ+0x10/0x1c
    [ 65.054070] actual: 85:1b:00:eb

    Fixes: 7413af1fb70e7 "ftrace: Make get_ftrace_addr() and get_ftrace_addr_old() global"
    Signed-off-by: Steven Rostedt

    Steven Rostedt (Red Hat)
     
  • The latest rewrite of ftrace removed the separate ftrace_ops of
    the function tracer and the function graph tracer and had them
    share the same ftrace_ops. This simplified the accounting by removing
    the multiple layers of functions called, where the global_ops func
    would call a special list that would iterate over the other ops that
    were registered within it (like function and function graph), which
    itself was registered to the ftrace ops list of all functions
    currently active. If that sounds confusing, the code that implemented
    it was also confusing and its removal is a good thing.

    The problem with this change was that it assumed that the function
    and function graph tracer can never be used at the same time.
    This is mostly true, but there is an exception. That is when the
    function profiler uses the function graph tracer to profile.
    The function profiler can be activated the same time as the function
    tracer, and this breaks the assumption and the result is that ftrace
    will crash (it detects the error and shuts itself down, it does not
    cause a kernel oops).

    To solve this issue, a previous change allowed the hash tables
    for the functions traced by a ftrace_ops to be a pointer and let
    multiple ftrace_ops share the same hash. This allows the function
    and function_graph tracer to have separate ftrace_ops, but still
    share the hash, which is what is done.

    Now the function and function graph tracers have separate ftrace_ops
    again, and the function tracer can be run while the function_profile
    is active.

    Cc: stable@vger.kernel.org # 3.16 (apply after 3.17-rc4 is out)
    Signed-off-by: Steven Rostedt

    Steven Rostedt (Red Hat)
     
  • Now that a ftrace_hash can be shared by multiple ftrace_ops, they can decrement
    the rec->flags more than once (once for each ops that shares the ftrace_hash).
    This means that the tramp_hash may not have a hash item when it was added.

    For example, if two ftrace_ops share a hash for a ftrace record, and the
    first ops has a trampoline, when it adds itself it will set the rec->flags
    TRAMP flag and increment its nr_trampolines counter. When the second ops
    is added, it must clear that tramp flag but also decrement the other ops
    that shares its hash. As the update to the function callbacks has not yet
    been performed, the other ops will not have the tramp hash set yet and it
    cannot be used to know to decrement its nr_trampolines.

    Luckily, the tramp_hash does not need to be used. As the ftrace_mutex is
    held, an ops with a trampoline to a record during an update of another ops
    that shares the record will have its func_hash pointing to it. Since a
    trampoline can only be set for a record if only one ops is attached to it,
    we can just check if the record has a trampoline (the FTRACE_FL_TRAMP flag
    is set) and then find the ops that has this record in its hashes.

    Also added some output to help debug when things go wrong.

    Cc: stable@vger.kernel.org # 3.16+ (apply after 3.17-rc4 is out)
    Signed-off-by: Steven Rostedt

    Steven Rostedt (Red Hat)
     
  • When updating what an ftrace_ops traces, if it is registered (that is,
    actively tracing), and that ftrace_ops uses the shared global_ops
    local_hash, then we need to update all tracers that are active and
    also share the global_ops' ftrace_hash_ops.

    Cc: stable@vger.kernel.org # 3.16 (apply after 3.17-rc4 is out)
    Signed-off-by: Steven Rostedt

    Steven Rostedt (Red Hat)
     
  • Currently the top level debug file system function tracer shares its
    ftrace_ops with the function graph tracer. This was thought to be fine
    because the tracers are not used together, as one can only enable
    function or function_graph tracer in the current_tracer file.

    But that assumption proved to be incorrect. The function profiler
    can use the function graph tracer when function tracing is enabled.
    Since all function graph users use the function tracing ftrace_ops,
    this causes a conflict, and when a user enables both function profiling
    and the function tracer, it will crash ftrace and disable it.

    The quick solution so far is to move them as separate ftrace_ops like
    it was earlier. The problem though is to synchronize the functions that
    are traced because both function and function_graph tracer are limited
    by the selections made in the set_ftrace_filter and set_ftrace_notrace
    files.

    To handle this, a new structure is made called ftrace_ops_hash. This
    structure will now hold the filter_hash and notrace_hash, and the
    ftrace_ops will point to this structure. That will allow two ftrace_ops
    to share the same hashes.

    Since most ftrace_ops do not share the hashes, and to keep allocation
    simple, the ftrace_ops structure will include both a pointer to the
    ftrace_ops_hash called func_hash, as well as the structure itself,
    called local_hash. When the ops are registered, the func_hash pointer
    will be initialized to point to the local_hash within the ftrace_ops
    structure. Some of the ftrace internal ftrace_ops will be initialized
    statically. This will allow for the function and function_graph tracer
    to have separate ops but still share the same hash tables that determine
    what functions they trace.
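
    Schematically, the layout described above is (simplified; unrelated
    members omitted):

        struct ftrace_ops_hash {
                struct ftrace_hash      *notrace_hash;
                struct ftrace_hash      *filter_hash;
        };

        struct ftrace_ops {
                /* ... other members ... */
                struct ftrace_ops_hash  local_hash;     /* this ops' own hashes */
                struct ftrace_ops_hash  *func_hash;     /* normally &local_hash, but
                                                         * may point at another ops'
                                                         * hashes to share them */
        };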

    Cc: stable@vger.kernel.org # 3.16 (apply after 3.17-rc4 is out)
    Signed-off-by: Steven Rostedt

    Steven Rostedt (Red Hat)
     

10 Aug, 2014

1 commit

  • Pull trace file read iterator fixes from Steven Rostedt:
    "This contains a fix for two long standing bugs. Both of which are
    rarely ever hit, and requires the user to do something that users
    rarely do. It took a few special test cases to even trigger this bug,
    and one of them was just one test in the process of finishing up as
    another one started.

    Both bugs have to do with the ring buffer iterator rb_iter_peek(), but
    one is more indirect than the other.

    The first bug fix is simply an increase in the safety net loop counter.
    The counter makes sure that rb_iter_peek() only iterates the
    number of times we expect it can, and no more. Well, there was one
    way it could iterate one more time than we expected, and that caused the
    ring buffer to shut down with a nasty warning. The fix was simply to
    up that counter by one.

    The other bug has to do with rb_iter_reset() (called by
    rb_iter_peek()). This happens when a user reads both the trace_pipe
    and trace files. The trace_pipe is a consuming read and does not use
    the ring buffer iterator, but the trace file is not a consuming read
    and does use the ring buffer iterator. When the trace file is being
    read, if it detects that a consuming read occurred, it resets the
    iterator and starts over. But the reset code that does this
    (rb_iter_reset()), checks if the reader_page is linked to the ring
    buffer or not, and will look into the ring buffer itself if it is not.
    This is wrong, as it should always try to read the reader page first.
    Not to mention, the code that looked into the ring buffer did it
    wrong, and used the header_page "read" offset to start reading on that
    page. That offset is bogus for pages in the writable ring buffer, and
    was corrupting the iterator, and it would start returning bogus
    events"

    * tag 'trace-fixes-3.16' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace:
    ring-buffer: Always reset iterator to reader page
    ring-buffer: Up rb_iter_peek() loop count to 3

    Linus Torvalds
     

07 Aug, 2014

2 commits

  • When performing a consuming read, the ring buffer swaps out a
    page from the ring buffer with an empty page, and the page that
    was swapped out becomes the new reader page. The reader page
    is owned by the reader, and since it was swapped out of the ring
    buffer, writers do not have access to it (there's an exception
    to that rule, but it's out of scope for this commit).

    When reading the "trace" file, it is a non consuming read, which
    means that the data in the ring buffer will not be modified.
    When the trace file is opened, a ring buffer iterator is allocated
    and writes to the ring buffer are disabled, such that the iterator
    will not have issues iterating over the data.

    Although writes to the ring buffer are disabled, other reads, and even
    consuming reads, are not. If a consuming read happens, then
    the iterator is reset and starts reading from the beginning again.

    My tests would sometimes trigger this bug on my i386 box:

    WARNING: CPU: 0 PID: 5175 at kernel/trace/trace.c:1527 __trace_find_cmdline+0x66/0xaa()
    Modules linked in:
    CPU: 0 PID: 5175 Comm: grep Not tainted 3.16.0-rc3-test+ #8
    Hardware name: /DG965MQ, BIOS MQ96510J.86A.0372.2006.0605.1717 06/05/2006
    00000000 00000000 f09c9e1c c18796b3 c1b5d74c f09c9e4c c103a0e3 c1b5154b
    f09c9e78 00001437 c1b5d74c 000005f7 c10bd85a c10bd85a c1cac57c f09c9eb0
    ed0e0000 f09c9e64 c103a185 00000009 f09c9e5c c1b5154b f09c9e78 f09c9e80^M
    Call Trace:
    [] dump_stack+0x4b/0x75
    [] warn_slowpath_common+0x7e/0x95
    [] ? __trace_find_cmdline+0x66/0xaa
    [] ? __trace_find_cmdline+0x66/0xaa
    [] warn_slowpath_fmt+0x33/0x35
    [] __trace_find_cmdline+0x66/0xaa^M
    [] trace_find_cmdline+0x40/0x64
    [] trace_print_context+0x27/0xec
    [] ? trace_seq_printf+0x37/0x5b
    [] print_trace_line+0x319/0x39b
    [] ? ring_buffer_read+0x47/0x50
    [] s_show+0x192/0x1ab
    [] ? s_next+0x5a/0x7c
    [] seq_read+0x267/0x34c
    [] vfs_read+0x8c/0xef
    [] ? seq_lseek+0x154/0x154
    [] SyS_read+0x54/0x7f
    [] syscall_call+0x7/0xb
    ---[ end trace 3f507febd6b4cc83 ]---
    >>>> ##### CPU 1 buffer started ####

    Which was the __trace_find_cmdline() function complaining about the pid
    in the event record being negative.

    After adding more test cases, this would trigger more often. Strangely
    enough, it would never trigger on a single test, but instead would trigger
    only when running all the tests. I believe that was the case because it
    required one of the tests to be shutting down via delayed instances while
    a new test started up.

    After spending several days debugging this, I found that it was caused by
    the iterator becoming corrupted. Debugging further, I found out why
    the iterator became corrupted. It happened with the rb_iter_reset().

    As consuming reads may not read the full reader page, but only part
    of it, there's a "read" field to know where the last read took place.
    The iterator must also start at the read position. In the rb_iter_reset()
    code, if the reader page was disconnected from the ring buffer, the iterator
    would start at the head page within the ring buffer (where writes still
    happen). But the mistake there was that it still used the "read" field
    to start the iterator on the head page, where it should always start
    at zero, because readers never read from within the ring buffer where
    writes occur.

    I originally wrote a patch to have it set the iter->head to 0 instead
    of iter->head_page->read, but then I questioned why it wasn't always
    setting the iter to point to the reader page, as the reader page is
    still valid. The list_empty(reader_page->list) just means that it was
    successful in swapping out. But the reader_page may still have data.

    There was a bug report a long time ago that was not reproducible that
    had something about trace_pipe (consuming read) not matching trace
    (iterator read). This may explain why that happened.

    Anyway, the correct answer to this bug is to always use the reader page
    and not reset the iterator to inside the writable ring buffer.
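
    With the fix, the reset boils down to (sketch):

        /* always start the iterator on the reader page, at the spot where
         * the last consuming read left off */
        iter->head_page = cpu_buffer->reader_page;
        iter->head      = cpu_buffer->reader_page->read;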

    Cc: stable@vger.kernel.org # 2.6.28+
    Fixes: d769041f8653 "ring_buffer: implement new locking"
    Signed-off-by: Steven Rostedt

    Steven Rostedt (Red Hat)
     
  • After writing a test to try to trigger the bug that caused the
    ring buffer iterator to become corrupted, I hit another bug:

    WARNING: CPU: 1 PID: 5281 at kernel/trace/ring_buffer.c:3766 rb_iter_peek+0x113/0x238()
    Modules linked in: ipt_MASQUERADE sunrpc [...]
    CPU: 1 PID: 5281 Comm: grep Tainted: G W 3.16.0-rc3-test+ #143
    Hardware name: To Be Filled By O.E.M. To Be Filled By O.E.M./To be filled by O.E.M., BIOS SDBLI944.86P 05/08/2007
    0000000000000000 ffffffff81809a80 ffffffff81503fb0 0000000000000000
    ffffffff81040ca1 ffff8800796d6010 ffffffff810c138d ffff8800796d6010
    ffff880077438c80 ffff8800796d6010 ffff88007abbe600 0000000000000003
    Call Trace:
    [] ? dump_stack+0x4a/0x75
    [] ? warn_slowpath_common+0x7e/0x97
    [] ? rb_iter_peek+0x113/0x238
    [] ? rb_iter_peek+0x113/0x238
    [] ? ring_buffer_iter_peek+0x2d/0x5c
    [] ? tracing_iter_reset+0x6e/0x96
    [] ? s_start+0xd7/0x17b
    [] ? kmem_cache_alloc_trace+0xda/0xea
    [] ? seq_read+0x148/0x361
    [] ? vfs_read+0x93/0xf1
    [] ? SyS_read+0x60/0x8e
    [] ? tracesys+0xdd/0xe2

    Debugging this bug, which triggers when the rb_iter_peek() loops too
    many times (more than 2 times), I discovered there's a case that can
    cause that function to legitimately loop 3 times!

    rb_iter_peek() is different from rb_buffer_peek(), as rb_buffer_peek()
    only deals with the reader page (it's for consuming reads).
    rb_iter_peek() is for traversing the buffer without consuming it, and as
    such, it can loop for one more reason. That is, if we hit the end of
    the reader page, or any page, it will go to the next page and try again.

    That is, we have this:

    1. iter->head > iter->head_page->page->commit
    (rb_inc_iter() which moves the iter to the next page)
    try again

    2. event = rb_iter_head_event()
    event->type_len == RINGBUF_TYPE_TIME_EXTEND
    rb_advance_iter()
    try again

    3. read the event.

    But we never get to 3, because the count is greater than 2 and we
    cause the WARNING and return NULL.

    Up the counter to 3.
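
    So the safety net in rb_iter_peek() becomes roughly (sketch):

        /*
         * A legitimate pass may take up to three iterations:
         *   1) step to the next page when past the page's commit,
         *   2) skip a RINGBUF_TYPE_TIME_EXTEND event,
         *   3) read the actual event.
         * Anything beyond that is a bug.
         */
        if (RB_WARN_ON(cpu_buffer, ++nr_loops > 3))
                return NULL;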

    Cc: stable@vger.kernel.org # 2.6.37+
    Fixes: 69d1b839f7ee "ring-buffer: Bind time extend and data events together"
    Signed-off-by: Steven Rostedt

    Steven Rostedt (Red Hat)
     

06 Aug, 2014

1 commit

  • Pull timer and time updates from Thomas Gleixner:
    "A rather large update of timers, timekeeping & co

    - Core timekeeping code is now year-2038 safe for 32-bit machines.
    Now we just need to fix all the in-kernel users and the gazillion
    user space interfaces which rely on timespec/timeval :)

    - Better cache layout for the timekeeping internal data structures.

    - Proper nanosecond based interfaces for in kernel users.

    - Tree wide cleanup of code which wants nanoseconds but does hoops
    and loops to convert back and forth from timespecs. Some of it
    definitely belongs into the ugly code museum.

    - Consolidation of the timekeeping interface zoo.

    - A fast NMI-safe accessor to clock monotonic for tracing. This is a
    long standing request to support correlated user/kernel space
    traces. With proper NTP frequency correction it's also suitable
    for correlation of traces across separate machines.

    - Checkpoint/restart support for timerfd.

    - A few NOHZ[_FULL] improvements in the [hr]timer code.

    - Code move from kernel to kernel/time of all time* related code.

    - New clocksource/event drivers from the ARM universe. I'm really
    impressed that despite an architected timer in the newer chips SoC
    manufacturers insist on inventing new and differently broken SoC
    specific timers.

    [ Ed. "Impressed"? I don't think that word means what you think it means ]

    - Another round of code move from arch to drivers. Looks like most
    of the legacy mess in ARM regarding timers is sorted out except for
    a few obnoxious strongholds.

    - The usual updates and fixlets all over the place"

    * 'timers-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (114 commits)
    timekeeping: Fixup typo in update_vsyscall_old definition
    clocksource: document some basic timekeeping concepts
    timekeeping: Use cached ntp_tick_length when accumulating error
    timekeeping: Rework frequency adjustments to work better w/ nohz
    timekeeping: Minor fixup for timespec64->timespec assignment
    ftrace: Provide trace clocks monotonic
    timekeeping: Provide fast and NMI safe access to CLOCK_MONOTONIC
    seqcount: Add raw_write_seqcount_latch()
    seqcount: Provide raw_read_seqcount()
    timekeeping: Use tk_read_base as argument for timekeeping_get_ns()
    timekeeping: Create struct tk_read_base and use it in struct timekeeper
    timekeeping: Restructure the timekeeper some more
    clocksource: Get rid of cycle_last
    clocksource: Move cycle_last validation to core code
    clocksource: Make delta calculation a function
    wireless: ath9k: Get rid of timespec conversions
    drm: vmwgfx: Use nsec based interfaces
    drm: i915: Use nsec based interfaces
    timekeeping: Provide ktime_get_raw()
    hangcheck-timer: Use ktime_get_ns()
    ...

    Linus Torvalds
     

05 Aug, 2014

3 commits

  • Pull perf changes from Ingo Molnar:
    "Kernel side changes:

    - Consolidate the PMU interrupt-disabled code amongst architectures
    (Vince Weaver)

    - misc fixes

    Tooling changes (new features, user visible changes):

    - Add support for pagefault tracing in 'trace', please see multiple
    examples in the changeset messages (Stanislav Fomichev).

    - Add pagefault statistics in 'trace' (Stanislav Fomichev)

    - Add header for columns in 'top' and 'report' TUI browsers (Jiri
    Olsa)

    - Add IO mode into timechart command (Stanislav Fomichev)

    - Fallback to syscalls:* when raw_syscalls:* is not available in the
    perl and python perf scripts. (Daniel Bristot de Oliveira)

    - Add --repeat global option to 'perf bench' to be used in benchmarks
    such as the existing 'futex' one, that was modified to use it
    instead of a local option. (Davidlohr Bueso)

    - Fix fd -> pathname resolution in 'trace', be it using /proc or a
    vfs_getname probe point. (Arnaldo Carvalho de Melo)

    - Add suggestion of how to set perf_event_paranoid sysctl, to help
    non-root users trying tools like 'trace' to get a working
    environment. (Arnaldo Carvalho de Melo)

    - Updates from trace-cmd for traceevent plugin_kvm plus args cleanup
    (Steven Rostedt, Jan Kiszka)

    - Support S/390 in 'perf kvm stat' (Alexander Yarygin)

    Tooling infrastructure changes:

    - Allow reserving a row for header purposes in the hists browser
    (Arnaldo Carvalho de Melo)

    - Various fixes and prep work related to supporting Intel PT (Adrian
    Hunter)

    - Introduce multiple debug variables control (Jiri Olsa)

    - Add callchain and additional sample information for python scripts
    (Joseph Schuchart)

    - More prep work to support Intel PT: (Adrian Hunter)
    - Polishing 'script' BTS output
    - 'inject' can specify --kallsyms
    - VDSO is per machine, not a global var
    - Expose data addr lookup functions previously private to 'script'
    - Large mmap fixes in events processing

    - Include standard stringify macros in power pc code (Sukadev
    Bhattiprolu)

    Tooling cleanups:

    - Convert open coded equivalents to asprintf() (Andy Shevchenko)

    - Remove needless reassignments in 'trace' (Arnaldo Carvalho de Melo)

    - Cache the is_exit syscall test in 'trace' (Arnaldo Carvalho de
    Melo)

    - No need to reimplement err() in 'perf bench sched-messaging', drop
    barf(). (Davidlohr Bueso).

    - Remove ev_name argument from perf_evsel__hists_browse, can be
    obtained from the other parameters. (Jiri Olsa)

    Tooling fixes:

    - Fix memory leak in the 'sched-messaging' perf bench test.
    (Davidlohr Bueso)

    - The -o and -n 'perf bench mem' options are mutually exclusive, emit
    error when both are specified. (Davidlohr Bueso)

    - Fix scrollbar refresh row index in the ui browser, problem exposed
    now that headers will be added and will be allowed to be switched
    on/off. (Jiri Olsa)

    - Handle the num array type in python properly (Sebastian Andrzej
    Siewior)

    - Fix wrong condition for allocation failure (Jiri Olsa)

    - Adjust callchain based on DWARF debug info on powerpc (Sukadev
    Bhattiprolu)

    - Fix a risk of doing free on an uninitialized pointer in the
    traceevent lib (Rickard Strandqvist)

    - Update attr test with PERF_FLAG_FD_CLOEXEC flag (Jiri Olsa)

    - Enable close-on-exec flag on perf file descriptor (Yann Droneaud)

    - Fix build on gcc 4.4.7 (Arnaldo Carvalho de Melo)

    - Event ordering fixes (Jiri Olsa)"

    * 'perf-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (123 commits)
    Revert "perf tools: Fix jump label always changing during tracing"
    perf tools: Fix perf usage string leftover
    perf: Check permission only for parent tracepoint event
    perf record: Store PERF_RECORD_FINISHED_ROUND only for nonempty rounds
    perf record: Always force PERF_RECORD_FINISHED_ROUND event
    perf inject: Add --kallsyms parameter
    perf tools: Expose 'addr' functions so they can be reused
    perf session: Fix accounting of ordered samples queue
    perf powerpc: Include util/util.h and remove stringify macros
    perf tools: Fix build on gcc 4.4.7
    perf tools: Add thread parameter to vdso__dso_findnew()
    perf tools: Add dso__type()
    perf tools: Separate the VDSO map name from the VDSO dso name
    perf tools: Add vdso__new()
    perf machine: Fix the lifetime of the VDSO temporary file
    perf tools: Group VDSO global variables into a structure
    perf session: Add ability to skip 4GiB or more
    perf session: Add ability to 'skip' a non-piped event stream
    perf tools: Pass machine to vdso__dso_findnew()
    perf tools: Add dso__data_size()
    ...

    Linus Torvalds
     
  • Pull tracing filter cleanups from Steven Rostedt:
    "Oleg Nesterov did several clean ups with the tracing filter code. As
    he found some small bugs that went into 3.16, and these changes were
    based on that, I had to apply his changes to a separate branch than my
    main development branch.

    This was based on work that was already pulled into 3.16, and is a
    separate pull request to keep from having local merges in my pull
    request"

    * tag 'trace-3.17-2' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace:
    tracing: Kill "filter_string" arg of replace_preds()
    tracing: Change apply_subsystem_event_filter() paths to check file->system == dir
    tracing: Kill ftrace_event_call->files
    tracing/uprobes: Kill the dead TRACE_EVENT_FL_USE_CALL_FILTER logic
    tracing: Kill call_filter_disable()
    tracing: Kill destroy_call_preds()
    tracing: Kill destroy_preds() and destroy_file_preds()

    Linus Torvalds
     
  • Pull tracing updates from Steven Rostedt:
    "This pull request has a lot of work done. The main thing is the
    changes to the ftrace function callback infrastructure. It's
    introducing a way to allow different functions to call directly
    different trampolines instead of all calling the same "mcount" one.

    The only user of this for now is the function graph tracer, which
    always had a different trampoline, but the function tracer trampoline
    was called and did basically nothing, and then the function graph
    tracer trampoline was called. The difference now, is that the
    function graph tracer trampoline can be called directly if a function
    is only being traced by the function graph trampoline. If function
    tracing is also happening on the same function, the old way is still
    done.

    The accounting for this takes up more memory when function graph
    tracing is activated, as it needs to keep track of which functions it
    uses. I have a new way that won't take as much memory, but it's not
    ready yet for this merge window, and will have to wait for the next
    one.

    Another big change was the removal of the ftrace_start/stop() calls
    that were used by the suspend/resume code that stopped function
    tracing when entering into suspend and resume paths. The stop of
    ftrace was done because there was some function that would crash the
    system if one called smp_processor_id()! The stop/start was a big
    hammer to solve the issue at the time, which was when ftrace was first
    introduced into Linux. Now ftrace has better infrastructure to debug
    such issues, and I found the problem function and labeled it with
    "notrace" and function tracing can now safely be activated all the way
    down into the guts of suspend and resume

    Other changes include clean ups of uprobe code, clean up of the
    trace_seq() code, and other various small fixes and clean ups to
    ftrace and tracing"

    * tag 'trace-3.17' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace: (57 commits)
    ftrace: Add warning if tramp hash does not match nr_trampolines
    ftrace: Fix trampoline hash update check on rec->flags
    ring-buffer: Use rb_page_size() instead of open coded head_page size
    ftrace: Rename ftrace_ops field from trampolines to nr_trampolines
    tracing: Convert local function_graph functions to static
    ftrace: Do not copy old hash when resetting
    tracing: let user specify tracing_thresh after selecting function_graph
    ring-buffer: Always run per-cpu ring buffer resize with schedule_work_on()
    tracing: Remove function_trace_stop and HAVE_FUNCTION_TRACE_MCOUNT_TEST
    s390/ftrace: remove check of obsolete variable function_trace_stop
    arm64, ftrace: Remove check of obsolete variable function_trace_stop
    Blackfin: ftrace: Remove check of obsolete variable function_trace_stop
    metag: ftrace: Remove check of obsolete variable function_trace_stop
    microblaze: ftrace: Remove check of obsolete variable function_trace_stop
    MIPS: ftrace: Remove check of obsolete variable function_trace_stop
    parisc: ftrace: Remove check of obsolete variable function_trace_stop
    sh: ftrace: Remove check of obsolete variable function_trace_stop
    sparc64,ftrace: Remove check of obsolete variable function_trace_stop
    tile: ftrace: Remove check of obsolete variable function_trace_stop
    ftrace: x86: Remove check of obsolete variable function_trace_stop
    ...

    Linus Torvalds
     

28 Jul, 2014

1 commit

  • There's no need to check a cloned event's permissions once the
    parent has already been checked.

    Also, the code is checking the 'current' process's permissions, which
    is not the owner process for cloned events, and thus could end up with
    the wrong permission check result.

    Reported-by: Alexander Yarygin
    Tested-by: Alexander Yarygin
    Signed-off-by: Jiri Olsa
    Signed-off-by: Peter Zijlstra
    Cc: Arnaldo Carvalho de Melo
    Cc: Corey Ashford
    Cc: Frederic Weisbecker
    Cc: Paul Mackerras
    Cc: Linus Torvalds
    Cc: Steven Rostedt
    Link: http://lkml.kernel.org/r/1405079782-8139-1-git-send-email-jolsa@kernel.org
    Signed-off-by: Ingo Molnar

    Jiri Olsa
     

24 Jul, 2014

1 commit