23 Oct, 2010

1 commit

  • * 'llseek' of git://git.kernel.org/pub/scm/linux/kernel/git/arnd/bkl:
    vfs: make no_llseek the default
    vfs: don't use BKL in default_llseek
    llseek: automatically add .llseek fop
    libfs: use generic_file_llseek for simple_attr
    mac80211: disallow seeks in minstrel debug code
    lirc: make chardev nonseekable
    viotape: use noop_llseek
    raw: use explicit llseek file operations
    ibmasmfs: use generic_file_llseek
    spufs: use llseek in all file operations
    arm/omap: use generic_file_llseek in iommu_debug
    lkdtm: use generic_file_llseek in debugfs
    net/wireless: use generic_file_llseek in debugfs
    drm: use noop_llseek

    Linus Torvalds
     

18 Oct, 2010

1 commit


15 Oct, 2010

1 commit

  • All file_operations should get a .llseek operation so we can make
    nonseekable_open the default for future file operations without a
    .llseek pointer.

    The three cases that we can automatically detect are no_llseek, seq_lseek
    and default_llseek. For cases where we can we can automatically prove that
    the file offset is always ignored, we use noop_llseek, which maintains
    the current behavior of not returning an error from a seek.

    New drivers should normally not use noop_llseek but instead use no_llseek
    and call nonseekable_open at open time. Existing drivers can be converted
    to do the same when the maintainer knows for certain that no user code
    relies on calling seek on the device file.

    The generated code is often incorrectly indented and right now contains
    comments that clarify for each added line why a specific variant was
    chosen. In the version that gets submitted upstream, the comments will
    be gone and I will manually fix the indentation, because there does not
    seem to be a way to do that using coccinelle.

    Some amount of new code is currently sitting in linux-next that should get
    the same modifications, which I will do at the end of the merge window.

    Many thanks to Julia Lawall for helping me learn to write a semantic
    patch that does all this.

    ===== begin semantic patch =====
    // This adds an llseek= method to all file operations,
    // as a preparation for making no_llseek the default.
    //
    // The rules are
    // - use no_llseek explicitly if we do nonseekable_open
    // - use seq_lseek for sequential files
    // - use default_llseek if we know we access f_pos
    // - use noop_llseek if we know we don't access f_pos,
    // but we still want to allow users to call lseek
    //
    @ open1 exists @
    identifier nested_open;
    @@
    nested_open(...)
    {

    }

    @ open exists@
    identifier open_f;
    identifier i, f;
    identifier open1.nested_open;
    @@
    int open_f(struct inode *i, struct file *f)
    {

    }

    @ read disable optional_qualifier exists @
    identifier read_f;
    identifier f, p, s, off;
    type ssize_t, size_t, loff_t;
    expression E;
    identifier func;
    @@
    ssize_t read_f(struct file *f, char *p, size_t s, loff_t *off)
    {

    }

    @ read_no_fpos disable optional_qualifier exists @
    identifier read_f;
    identifier f, p, s, off;
    type ssize_t, size_t, loff_t;
    @@
    ssize_t read_f(struct file *f, char *p, size_t s, loff_t *off)
    {
    ... when != off
    }

    @ write @
    identifier write_f;
    identifier f, p, s, off;
    type ssize_t, size_t, loff_t;
    expression E;
    identifier func;
    @@
    ssize_t write_f(struct file *f, const char *p, size_t s, loff_t *off)
    {

    }

    @ write_no_fpos @
    identifier write_f;
    identifier f, p, s, off;
    type ssize_t, size_t, loff_t;
    @@
    ssize_t write_f(struct file *f, const char *p, size_t s, loff_t *off)
    {
    ... when != off
    }

    @ fops0 @
    identifier fops;
    @@
    struct file_operations fops = {
    ...
    };

    @ has_llseek depends on fops0 @
    identifier fops0.fops;
    identifier llseek_f;
    @@
    struct file_operations fops = {
    ...
    .llseek = llseek_f,
    ...
    };

    @ has_read depends on fops0 @
    identifier fops0.fops;
    identifier read_f;
    @@
    struct file_operations fops = {
    ...
    .read = read_f,
    ...
    };

    @ has_write depends on fops0 @
    identifier fops0.fops;
    identifier write_f;
    @@
    struct file_operations fops = {
    ...
    .write = write_f,
    ...
    };

    @ has_open depends on fops0 @
    identifier fops0.fops;
    identifier open_f;
    @@
    struct file_operations fops = {
    ...
    .open = open_f,
    ...
    };

    // use no_llseek if we call nonseekable_open
    ////////////////////////////////////////////
    @ nonseekable1 depends on !has_llseek && has_open @
    identifier fops0.fops;
    identifier nso ~= "nonseekable_open";
    @@
    struct file_operations fops = {
    ... .open = nso, ...
    +.llseek = no_llseek, /* nonseekable */
    };

    @ nonseekable2 depends on !has_llseek @
    identifier fops0.fops;
    identifier open.open_f;
    @@
    struct file_operations fops = {
    ... .open = open_f, ...
    +.llseek = no_llseek, /* open uses nonseekable */
    };

    // use seq_lseek for sequential files
    /////////////////////////////////////
    @ seq depends on !has_llseek @
    identifier fops0.fops;
    identifier sr ~= "seq_read";
    @@
    struct file_operations fops = {
    ... .read = sr, ...
    +.llseek = seq_lseek, /* we have seq_read */
    };

    // use default_llseek if there is a readdir
    ///////////////////////////////////////////
    @ fops1 depends on !has_llseek && !nonseekable1 && !nonseekable2 && !seq @
    identifier fops0.fops;
    identifier readdir_e;
    @@
    // any other fop is used that changes pos
    struct file_operations fops = {
    ... .readdir = readdir_e, ...
    +.llseek = default_llseek, /* readdir is present */
    };

    // use default_llseek if at least one of read/write touches f_pos
    /////////////////////////////////////////////////////////////////
    @ fops2 depends on !fops1 && !has_llseek && !nonseekable1 && !nonseekable2 && !seq @
    identifier fops0.fops;
    identifier read.read_f;
    @@
    // read fops use offset
    struct file_operations fops = {
    ... .read = read_f, ...
    +.llseek = default_llseek, /* read accesses f_pos */
    };

    @ fops3 depends on !fops1 && !fops2 && !has_llseek && !nonseekable1 && !nonseekable2 && !seq @
    identifier fops0.fops;
    identifier write.write_f;
    @@
    // write fops use offset
    struct file_operations fops = {
    ... .write = write_f, ...
    + .llseek = default_llseek, /* write accesses f_pos */
    };

    // Use noop_llseek if neither read nor write accesses f_pos
    ///////////////////////////////////////////////////////////

    @ fops4 depends on !fops1 && !fops2 && !fops3 && !has_llseek && !nonseekable1 && !nonseekable2 && !seq @
    identifier fops0.fops;
    identifier read_no_fpos.read_f;
    identifier write_no_fpos.write_f;
    @@
    // write fops use offset
    struct file_operations fops = {
    ...
    .write = write_f,
    .read = read_f,
    ...
    +.llseek = noop_llseek, /* read and write both use no f_pos */
    };

    @ depends on has_write && !has_read && !fops1 && !fops2 && !has_llseek && !nonseekable1 && !nonseekable2 && !seq @
    identifier fops0.fops;
    identifier write_no_fpos.write_f;
    @@
    struct file_operations fops = {
    ... .write = write_f, ...
    +.llseek = noop_llseek, /* write uses no f_pos */
    };

    @ depends on has_read && !has_write && !fops1 && !fops2 && !has_llseek && !nonseekable1 && !nonseekable2 && !seq @
    identifier fops0.fops;
    identifier read_no_fpos.read_f;
    @@
    struct file_operations fops = {
    ... .read = read_f, ...
    +.llseek = noop_llseek, /* read uses no f_pos */
    };

    @ depends on !has_read && !has_write && !fops1 && !fops2 && !has_llseek && !nonseekable1 && !nonseekable2 && !seq @
    identifier fops0.fops;
    @@
    struct file_operations fops = {
    ...
    +.llseek = noop_llseek, /* no read or write fn */
    };
    ===== End semantic patch =====

    Signed-off-by: Arnd Bergmann
    Cc: Julia Lawall
    Cc: Christoph Hellwig

    Arnd Bergmann
     

15 Sep, 2010

4 commits

  • …stedt/linux-2.6-trace into perf/core

    Ingo Molnar
     
  • The enums for FTRACE_ENABLE_MCOUNT and FTRACE_DISABLE_MCOUNT were
    used as commands to ftrace_run_update_code(). But these commands
    were used by the old nasty ftrace daemon that has long been slain.

    This is a clean up patch to remove the references to these enums
    and simplify the code a little.

    Reported-by: Wu Zhangjin
    Signed-off-by: Steven Rostedt

    Steven Rostedt
     
  • If we do:

    # cd /sys/kernel/debug
    # echo 'do_IRQ:traceon schedule:traceon sys_write:traceon' > \
    set_ftrace_filter
    # cat set_ftrace_filter

    We get the following output:

    #### all functions enabled ####
    sys_write:traceon:unlimited
    schedule:traceon:unlimited
    do_IRQ:traceon:unlimited

    This outputs two lists. One is the fact that all functions are
    currently enabled for function tracing, the other has three probed
    functions, which happen to have 'traceon' as their commands.

    Currently, when reading the first list (functions enabled) the
    seq_file code will receive a "NULL" from the t_next() function
    causing it to exit early. This makes "read()" from userspace stop
    reading the code at this boarder. Although read is allowed to do this,
    some (broken) applications might consider this an end of file and
    stop early.

    This patch adds the start of the second list to t_next() when it
    finishes the first list. It is a simple change and gives the
    set_ftrace_filter file nicer reading ability.

    Signed-off-by: Steven Rostedt

    Steven Rostedt
     
  • This patch keeps track of the index within the elements of
    set_ftrace_filter and if the position goes backwards, it nicely
    resets and starts from the beginning again.

    This allows for lseek and pread to work properly now.

    Signed-off-by: Steven Rostedt

    Steven Rostedt
     

14 Sep, 2010

2 commits

  • The set_ftrace_filter uses seq_file and reads from two lists. The
    pointer returned by t_next() can either be of type struct dyn_ftrace
    or struct ftrace_func_probe. If there is a bug (there was one)
    the wrong pointer may be used and the reference can cause an oops.

    This patch makes t_next() and friends only return the iterator structure
    which now has a pointer of type struct dyn_ftrace and struct
    ftrace_func_probe. The t_show() can now test if the pointer is NULL or
    not and if the pointer exists, it is guaranteed to be of the correct type.

    Now if there's a bug, only wrong data will be shown but not an oops.

    Cc: Chris Wright
    Signed-off-by: Steven Rostedt

    Steven Rostedt
     
  • After the filtered functions are read, the probed functions are read
    from the hash in set_ftrace_filter. When the hashed probed functions
    are read, the *pos passed in is reset. Instead of modifying the pos
    given to the read function, just record the pos where the filtered
    functions ended and subtract from that.

    Signed-off-by: Steven Rostedt

    Steven Rostedt
     

10 Sep, 2010

1 commit

  • Be sure to avoid entering t_show() with FTRACE_ITER_HASH set without
    having properly started the iterator to iterate the hash. This case is
    degenerate and, as discovered by Robert Swiecki, can cause t_hash_show()
    to misuse a pointer. This causes a NULL ptr deref with possible security
    implications. Tracked as CVE-2010-3079.

    Cc: Robert Swiecki
    Cc: Eugene Teo
    Cc:
    Signed-off-by: Chris Wright
    Signed-off-by: Steven Rostedt

    Chris Wright
     

09 Sep, 2010

1 commit

  • Reading the file set_ftrace_filter does three things.

    1) shows whether or not filters are set for the function tracer
    2) shows what functions are set for the function tracer
    3) shows what triggers are set on any functions

    3 is independent from 1 and 2.

    The way this file currently works is that it is a state machine,
    and as you read it, it may change state. But this assumption breaks
    when you use lseek() on the file. The state machine gets out of sync
    and the t_show() may use the wrong pointer and cause a kernel oops.

    Luckily, this will only kill the app that does the lseek, but the app
    dies while holding a mutex. This prevents anyone else from using the
    set_ftrace_filter file (or any other function tracing file for that matter).

    A real fix for this is to rewrite the code, but that is too much for
    a -rc release or stable. This patch simply disables llseek on the
    set_ftrace_filter() file for now, and we can do the proper fix for the
    next major release.

    Reported-by: Robert Swiecki
    Cc: Chris Wright
    Cc: Tavis Ormandy
    Cc: Eugene Teo
    Cc: vendor-sec@lst.de
    Cc:
    Signed-off-by: Steven Rostedt

    Steven Rostedt
     

01 Sep, 2010

1 commit

  • While we are reading trace_stat/functionX and someone just
    disabled function_profile at that time, we can trigger this:

    divide error: 0000 [#1] PREEMPT SMP
    ...
    EIP is at function_stat_show+0x90/0x230
    ...

    This fix just takes the ftrace_profile_lock and checks if
    rec->counter is 0. If it's 0, we know the profile buffer
    has been reset.

    Signed-off-by: Li Zefan
    Cc: stable@kernel.org
    LKML-Reference:
    Signed-off-by: Steven Rostedt

    Li Zefan
     

04 Jun, 2010

1 commit

  • The ftrace_preempt_disable/enable functions were to address a
    recursive race caused by the function tracer. The function tracer
    traces all functions which makes it easily susceptible to recursion.
    One area was preempt_enable(). This would call the scheduler and
    the schedulre would call the function tracer and loop.
    (So was it thought).

    The ftrace_preempt_disable/enable was made to protect against recursion
    inside the scheduler by storing the NEED_RESCHED flag. If it was
    set before the ftrace_preempt_disable() it would not call schedule
    on ftrace_preempt_enable(), thinking that if it was set before then
    it would have already scheduled unless it was already in the scheduler.

    This worked fine except in the case of SMP, where another task would set
    the NEED_RESCHED flag for a task on another CPU, and then kick off an
    IPI to trigger it. This could cause the NEED_RESCHED to be saved at
    ftrace_preempt_disable() but the IPI to arrive in the the preempt
    disabled section. The ftrace_preempt_enable() would not call the scheduler
    because the flag was already set before entring the section.

    This bug would cause a missed preemption check and cause lower latencies.

    Investigating further, I found that the recusion caused by the function
    tracer was not due to schedule(), but due to preempt_schedule(). Now
    that preempt_schedule is completely annotated with notrace, the recusion
    no longer is an issue.

    Reported-by: Thomas Gleixner
    Signed-off-by: Steven Rostedt

    Steven Rostedt
     

14 May, 2010

2 commits

  • This patch adds data to be passed to tracepoint callbacks.

    The created functions from DECLARE_TRACE() now need a mandatory data
    parameter. For example:

    DECLARE_TRACE(mytracepoint, int value, value)

    Will create the register function:

    int register_trace_mytracepoint((void(*)(void *data, int value))probe,
    void *data);

    As the first argument, all callbacks (probes) must take a (void *data)
    parameter. So a callback for the above tracepoint will look like:

    void myprobe(void *data, int value)
    {
    }

    The callback may choose to ignore the data parameter.

    This change allows callbacks to register a private data pointer along
    with the function probe.

    void mycallback(void *data, int value);

    register_trace_mytracepoint(mycallback, mydata);

    Then the mycallback() will receive the "mydata" as the first parameter
    before the args.

    A more detailed example:

    DECLARE_TRACE(mytracepoint, TP_PROTO(int status), TP_ARGS(status));

    /* In the C file */

    DEFINE_TRACE(mytracepoint, TP_PROTO(int status), TP_ARGS(status));

    [...]

    trace_mytracepoint(status);

    /* In a file registering this tracepoint */

    int my_callback(void *data, int status)
    {
    struct my_struct my_data = data;
    [...]
    }

    [...]
    my_data = kmalloc(sizeof(*my_data), GFP_KERNEL);
    init_my_data(my_data);
    register_trace_mytracepoint(my_callback, my_data);

    The same callback can also be registered to the same tracepoint as long
    as the data registered is different. Note, the data must also be used
    to unregister the callback:

    unregister_trace_mytracepoint(my_callback, my_data);

    Because of the data parameter, tracepoints declared this way can not have
    no args. That is:

    DECLARE_TRACE(mytracepoint, TP_PROTO(void), TP_ARGS());

    will cause an error.

    If no arguments are needed, a new macro can be used instead:

    DECLARE_TRACE_NOARGS(mytracepoint);

    Since there are no arguments, the proto and args fields are left out.

    This is part of a series to make the tracepoint footprint smaller:

    text data bss dec hex filename
    4913961 1088356 861512 6863829 68bbd5 vmlinux.orig
    4914025 1088868 861512 6864405 68be15 vmlinux.class
    4918492 1084612 861512 6864616 68bee8 vmlinux.tracepoint

    Again, this patch also increases the size of the kernel, but
    lays the ground work for decreasing it.

    v5: Fixed net/core/drop_monitor.c to handle these updates.

    v4: Moved the DECLARE_TRACE() DECLARE_TRACE_NOARGS out of the
    #ifdef CONFIG_TRACE_POINTS, since the two are the same in both
    cases. The __DECLARE_TRACE() is what changes.
    Thanks to Frederic Weisbecker for pointing this out.

    v3: Made all register_* functions require data to be passed and
    all callbacks to take a void * parameter as its first argument.
    This makes the calling functions comply with C standards.

    Also added more comments to the modifications of DECLARE_TRACE().

    v2: Made the DECLARE_TRACE() have the ability to pass arguments
    and added a new DECLARE_TRACE_NOARGS() for tracepoints that
    do not need any arguments.

    Acked-by: Mathieu Desnoyers
    Acked-by: Masami Hiramatsu
    Acked-by: Frederic Weisbecker
    Cc: Neil Horman
    Cc: David S. Miller
    Signed-off-by: Steven Rostedt

    Steven Rostedt
     
  • …inux-2.6-tip into trace/tip/tracing/core-4

    Steven Rostedt
     

07 May, 2010

1 commit


28 Apr, 2010

2 commits

  • When sleep_time is off the function profiler ignores the time that a task
    is scheduled out. When the task is scheduled out a timestamp is taken.
    When the task is scheduled back in, the timestamp is compared to the
    current time and the saved calltimes are adjusted accordingly.

    But when stopping the function profiler, the sched switch hook that
    does this adjustment was stopped before shutting down the tracer.
    This allowed some tasks to not get their timestamps set when they
    scheduled out. When the function profiler started again, this would
    skew the times of the scheduler functions.

    This patch moves the stopping of the sched switch to after the function
    profiler is stopped. It also ignores zero set calltimes, which may
    happen on start up.

    Signed-off-by: Steven Rostedt

    Steven Rostedt
     
  • When combined with function graph tracing the ftrace function profiler
    also prints the average run time of functions. While this gives us some
    good information, it doesn't tell us anything about the variance of the
    run times of the function. This change prints out the s^2 sample
    standard deviation alongside the average.

    This change adds one entry to the profile record structure. This
    increases the memory footprint of the function profiler by 1/3 on a
    32-bit system, and by 1/5 on a 64-bit system when function graphing is
    enabled, though the memory is only allocated when the profiler is turned
    on. During the profiling, one extra line of code adds the squared
    calltime to the new record entry, so this should not adversly affect
    performance.

    Note that the square of the sample standard deviation is printed because
    there is no sqrt implementation for unsigned long long in the kernel.

    Signed-off-by: Chase Douglas
    LKML-Reference:

    [ fixed comment about ns^2 -> us^2 conversion ]
    Signed-off-by: Steven Rostedt

    Chase Douglas
     

30 Mar, 2010

1 commit

  • …it slab.h inclusion from percpu.h

    percpu.h is included by sched.h and module.h and thus ends up being
    included when building most .c files. percpu.h includes slab.h which
    in turn includes gfp.h making everything defined by the two files
    universally available and complicating inclusion dependencies.

    percpu.h -> slab.h dependency is about to be removed. Prepare for
    this change by updating users of gfp and slab facilities include those
    headers directly instead of assuming availability. As this conversion
    needs to touch large number of source files, the following script is
    used as the basis of conversion.

    http://userweb.kernel.org/~tj/misc/slabh-sweep.py

    The script does the followings.

    * Scan files for gfp and slab usages and update includes such that
    only the necessary includes are there. ie. if only gfp is used,
    gfp.h, if slab is used, slab.h.

    * When the script inserts a new include, it looks at the include
    blocks and try to put the new include such that its order conforms
    to its surrounding. It's put in the include block which contains
    core kernel includes, in the same order that the rest are ordered -
    alphabetical, Christmas tree, rev-Xmas-tree or at the end if there
    doesn't seem to be any matching order.

    * If the script can't find a place to put a new include (mostly
    because the file doesn't have fitting include block), it prints out
    an error message indicating which .h file needs to be added to the
    file.

    The conversion was done in the following steps.

    1. The initial automatic conversion of all .c files updated slightly
    over 4000 files, deleting around 700 includes and adding ~480 gfp.h
    and ~3000 slab.h inclusions. The script emitted errors for ~400
    files.

    2. Each error was manually checked. Some didn't need the inclusion,
    some needed manual addition while adding it to implementation .h or
    embedding .c file was more appropriate for others. This step added
    inclusions to around 150 files.

    3. The script was run again and the output was compared to the edits
    from #2 to make sure no file was left behind.

    4. Several build tests were done and a couple of problems were fixed.
    e.g. lib/decompress_*.c used malloc/free() wrappers around slab
    APIs requiring slab.h to be added manually.

    5. The script was run on all .h files but without automatically
    editing them as sprinkling gfp.h and slab.h inclusions around .h
    files could easily lead to inclusion dependency hell. Most gfp.h
    inclusion directives were ignored as stuff from gfp.h was usually
    wildly available and often used in preprocessor macros. Each
    slab.h inclusion directive was examined and added manually as
    necessary.

    6. percpu.h was updated not to include slab.h.

    7. Build test were done on the following configurations and failures
    were fixed. CONFIG_GCOV_KERNEL was turned off for all tests (as my
    distributed build env didn't work with gcov compiles) and a few
    more options had to be turned off depending on archs to make things
    build (like ipr on powerpc/64 which failed due to missing writeq).

    * x86 and x86_64 UP and SMP allmodconfig and a custom test config.
    * powerpc and powerpc64 SMP allmodconfig
    * sparc and sparc64 SMP allmodconfig
    * ia64 SMP allmodconfig
    * s390 SMP allmodconfig
    * alpha SMP allmodconfig
    * um on x86_64 SMP allmodconfig

    8. percpu.h modifications were reverted so that it could be applied as
    a separate patch and serve as bisection point.

    Given the fact that I had only a couple of failures from tests on step
    6, I'm fairly confident about the coverage of this conversion patch.
    If there is a breakage, it's likely to be something in one of the arch
    headers which should be easily discoverable easily on most builds of
    the specific arch.

    Signed-off-by: Tejun Heo <tj@kernel.org>
    Guess-its-ok-by: Christoph Lameter <cl@linux-foundation.org>
    Cc: Ingo Molnar <mingo@redhat.com>
    Cc: Lee Schermerhorn <Lee.Schermerhorn@hp.com>

    Tejun Heo
     

14 Mar, 2010

1 commit

  • …/git/tip/linux-2.6-tip

    * 'core-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
    locking: Make sparse work with inline spinlocks and rwlocks
    x86/mce: Fix RCU lockdep splats
    rcu: Increase RCU CPU stall timeouts if PROVE_RCU
    ftrace: Replace read_barrier_depends() with rcu_dereference_raw()
    rcu: Suppress RCU lockdep warnings during early boot
    rcu, ftrace: Fix RCU lockdep splat in ftrace_perf_buf_prepare()
    rcu: Suppress __mpol_dup() false positive from RCU lockdep
    rcu: Make rcu_read_lock_sched_held() handle !PREEMPT
    rcu: Add control variables to lockdep_rcu_dereference() diagnostics
    rcu, cgroup: Relax the check in task_subsys_state() as early boot is now handled by lockdep-RCU
    rcu: Use wrapper function instead of exporting tasklist_lock
    sched, rcu: Fix rcu_dereference() for RCU-lockdep
    rcu: Make task_subsys_state() RCU-lockdep checks handle boot-time use
    rcu: Fix holdoff for accelerated GPs for last non-dynticked CPU
    x86/gart: Unexport gart_iommu_aperture

    Fix trivial conflicts in kernel/trace/ftrace.c

    Linus Torvalds
     

13 Mar, 2010

1 commit

  • If the graph tracer is active, and a task is forked but the allocating of
    the processes graph stack fails, it can cause crash later on.

    This is due to the temporary stack being NULL, but the curr_ret_stack
    variable is copied from the parent. If it is not -1, then in
    ftrace_graph_probe_sched_switch() the following:

    for (index = next->curr_ret_stack; index >= 0; index--)
    next->ret_stack[index].calltime += timestamp;

    Will cause a kernel OOPS.

    Found with Li Zefan's ftrace_stress_test.

    Cc: stable@kernel.org
    Signed-off-by: Steven Rostedt

    Steven Rostedt
     

11 Mar, 2010

2 commits

  • …/rostedt/linux-2.6-trace into tracing/urgent

    Ingo Molnar
     
  • Replace the calls to read_barrier_depends() in
    ftrace_list_func() with rcu_dereference_raw() to improve
    readability. The reason that we use rcu_dereference_raw() here
    is that removed entries are never freed, instead they are simply
    leaked. This is one of a very few cases where use of
    rcu_dereference_raw() is the long-term right answer. And I
    don't yet know of any others. ;-)

    Signed-off-by: Paul E. McKenney
    Acked-by: Steven Rostedt
    Cc: Frederic Weisbecker
    Cc: laijs@cn.fujitsu.com
    Cc: dipankar@in.ibm.com
    Cc: mathieu.desnoyers@polymtl.ca
    Cc: josh@joshtriplett.org
    Cc: dvhltc@us.ibm.com
    Cc: niv@us.ibm.com
    Cc: peterz@infradead.org
    Cc: Valdis.Kletnieks@vt.edu
    Cc: dhowells@redhat.com
    LKML-Reference:
    Signed-off-by: Ingo Molnar

    Paul E. McKenney
     

06 Mar, 2010

1 commit

  • The declaration of ftrace_set_func() is at the start of the ftrace.c file
    and wrapped with a #ifdef CONFIG_FUNCTION_GRAPH condition. If function
    graph tracing is enabled but CONFIG_DYNAMIC_FTRACE is not, a warning
    about that function being declared static and unused is given.

    This really should have been placed within the CONFIG_FUNCTION_GRAPH
    condition that uses ftrace_set_func().

    Moving the declaration down fixes the warning and makes the code cleaner.

    Reported-by: Peter Zijlstra
    Signed-off-by: Steven Rostedt

    Steven Rostedt
     

01 Mar, 2010

1 commit

  • …git/tip/linux-2.6-tip

    * 'perf-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: (172 commits)
    perf_event, amd: Fix spinlock initialization
    perf_event: Fix preempt warning in perf_clock()
    perf tools: Flush maps on COMM events
    perf_events, x86: Split PMU definitions into separate files
    perf annotate: Handle samples not at objdump output addr boundaries
    perf_events, x86: Remove superflous MSR writes
    perf_events: Simplify code by removing cpu argument to hw_perf_group_sched_in()
    perf_events, x86: AMD event scheduling
    perf_events: Add new start/stop PMU callbacks
    perf_events: Report the MMAP pgoff value in bytes
    perf annotate: Defer allocating sym_priv->hist array
    perf symbols: Improve debugging information about symtab origins
    perf top: Use a macro instead of a constant variable
    perf symbols: Check the right return variable
    perf/scripts: Tag syscall_name helper as not yet available
    perf/scripts: Add perf-trace-python Documentation
    perf/scripts: Remove unnecessary PyTuple resizes
    perf/scripts: Add syscall tracing scripts
    perf/scripts: Add Python scripting engine
    perf/scripts: Remove check-perf-trace from listed scripts
    ...

    Fix trivial conflict in tools/perf/util/probe-event.c

    Linus Torvalds
     

26 Feb, 2010

1 commit


12 Feb, 2010

1 commit

  • I don't see why we can only clear all functions from the filter.

    After patching:

    # echo sys_open > set_graph_function
    # echo sys_close >> set_graph_function
    # cat set_graph_function
    sys_open
    sys_close
    # echo '!sys_close' >> set_graph_function
    # cat set_graph_function
    sys_open

    Signed-off-by: Li Zefan
    LKML-Reference:
    Signed-off-by: Steven Rostedt

    Li Zefan
     

04 Feb, 2010

2 commits

  • Remove record freezing. Because kprobes never puts probe on
    ftrace's mcount call anymore, it doesn't need ftrace to check
    whether kprobes on it.

    Signed-off-by: Masami Hiramatsu
    Cc: systemtap
    Cc: DLE
    Cc: Steven Rostedt
    Cc: przemyslaw@pawelczyk.it
    Cc: Frederic Weisbecker
    LKML-Reference:
    Signed-off-by: Ingo Molnar

    Masami Hiramatsu
     
  • Introducing *_text_reserved functions for checking the text
    address range is partially reserved or not. This patch provides
    checking routines for x86 smp alternatives and dynamic ftrace.
    Since both functions modify fixed pieces of kernel text, they
    should reserve and protect those from other dynamic text
    modifier, like kprobes.

    This will also be extended when introducing other subsystems
    which modify fixed pieces of kernel text. Dynamic text modifiers
    should avoid those.

    Signed-off-by: Masami Hiramatsu
    Cc: systemtap
    Cc: DLE
    Cc: Steven Rostedt
    Cc: przemyslaw@pawelczyk.it
    Cc: Frederic Weisbecker
    Cc: Ananth N Mavinakayanahalli
    Cc: Jim Keniston
    Cc: Mathieu Desnoyers
    Cc: Jason Baron
    LKML-Reference:
    Signed-off-by: Ingo Molnar

    Masami Hiramatsu
     

15 Jan, 2010

1 commit

  • For '*foo' pattern, we should allow any string ending with
    'foo', but ftrace filter incorrectly disallows strings
    like bar_foo_foo:

    # echo '*io' > set_ftrace_filter
    # cat set_ftrace_filter | grep 'req_bio_endio'
    # cat available_filter_functions | grep 'req_bio_endio'
    req_bio_endio

    Signed-off-by: Li Zefan
    LKML-Reference:
    Acked-by: Frederic Weisbecker
    Signed-off-by: Steven Rostedt

    Li Zefan
     

14 Dec, 2009

3 commits

  • # echo 'do_open' > set_graph_function
    # echo 'do_open' >> set_graph_function
    bash: echo: write error: Invalid argument

    Make it valid to write the same value to set_graph_function,
    which is consistent with set_ftrace_filter interface.

    Signed-off-by: Li Zefan
    Acked-by: Steven Rostedt
    LKML-reference:
    Signed-off-by: Frederic Weisbecker

    Li Zefan
     
  • I found a weird behavior:

    # echo 'fuse:*' > set_ftrace_filter
    bash: echo: write error: Invalid argument
    # cat set_ftrace_filter
    fuse_dev_fasync
    fuse_dev_poll
    fuse_copy_do

    We should call trace_parser_clear() no matter ftrace_process_regex()
    returns 0 or -errno, otherwise we will actually take the unaccepted
    records from ftrace_regex_release().

    Signed-off-by: Li Zefan
    Acked-by: Steven Rostedt
    LKML-Reference:
    Signed-off-by: Frederic Weisbecker

    Li Zefan
     
  • Currently it doesn't warn user on invald value:

    # echo nonexist_symbol > set_ftrace_filter
    or:
    # echo 'nonexist_symbol:mod:fuse' > set_ftrace_filter

    Better make it return failure.

    Signed-off-by: Li Zefan
    Acked-by: Steven Rostedt
    LKML-Reference:
    Signed-off-by: Frederic Weisbecker

    Li Zefan
     

06 Dec, 2009

1 commit

  • …el/git/tip/linux-2.6-tip

    * 'tracing-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: (40 commits)
    tracing: Separate raw syscall from syscall tracer
    ring-buffer-benchmark: Add parameters to set produce/consumer priorities
    tracing, function tracer: Clean up strstrip() usage
    ring-buffer benchmark: Run producer/consumer threads at nice +19
    tracing: Remove the stale include/trace/power.h
    tracing: Only print objcopy version warning once from recordmcount
    tracing: Prevent build warning: 'ftrace_graph_buf' defined but not used
    ring-buffer: Move access to commit_page up into function used
    tracing: do not disable interrupts for trace_clock_local
    ring-buffer: Add multiple iterations between benchmark timestamps
    kprobes: Sanitize struct kretprobe_instance allocations
    tracing: Fix to use __always_unused attribute
    compiler: Introduce __always_unused
    tracing: Exit with error if a weak function is used in recordmcount.pl
    tracing: Move conditional into update_funcs() in recordmcount.pl
    tracing: Add regex for weak functions in recordmcount.pl
    tracing: Move mcount section search to front of loop in recordmcount.pl
    tracing: Fix objcopy revision check in recordmcount.pl
    tracing: Check absolute path of input file in recordmcount.pl
    tracing: Correct the check for number of arguments in recordmcount.pl
    ...

    Linus Torvalds
     

23 Nov, 2009

1 commit

  • Clean up strstrip() usage - which also addresses this build warning:

    kernel/trace/ftrace.c: In function 'ftrace_pid_write':
    kernel/trace/ftrace.c:3004: warning: ignoring return value of 'strstrip', declared with attribute warn_unused_result

    Cc: Steven Rostedt
    Cc: Frederic Weisbecker
    Cc: Peter Zijlstra
    Cc: Mike Galbraith
    Cc: Paul Mackerras
    Cc: Arnaldo Carvalho de Melo
    LKML-Reference:
    Signed-off-by: Ingo Molnar

    Ingo Molnar
     

18 Nov, 2009

1 commit


04 Nov, 2009

1 commit

  • When a command is passed to the set_ftrace_filter, then
    the ftrace_regex_lock is still held going back to user space.

    # echo 'do_open : foo' > set_ftrace_filter
    (still holding ftrace_regex_lock when returning to user space!)

    Signed-off-by: Li Zefan
    LKML-Reference:
    Signed-off-by: Steven Rostedt

    Li Zefan
     

29 Oct, 2009

1 commit


24 Oct, 2009

1 commit

  • Instead of directly updating filp->f_pos we should update the *ppos
    argument. The filp->f_pos gets updated within the file_pos_write()
    function called from sys_write().

    Signed-off-by: Jiri Olsa
    Signed-off-by: Steven Rostedt
    Cc: Frederic Weisbecker
    LKML-Reference:
    Signed-off-by: Ingo Molnar

    Jiri Olsa
     

15 Oct, 2009

1 commit

  • We are using strncpy in the wrong way to copy the ftrace_graph_filter
    boot param because we pass the buffer size instead of the max string
    size it can contain (buffer size - 1). The end result might not be
    NULL terminated as we are abusing the max string size.

    Lets use strlcpy() instead.

    Reported-by: Li Zefan
    Signed-off-by: Frederic Weisbecker
    Cc: Steven Rostedt

    Frederic Weisbecker