29 Jan, 2009

2 commits


27 Jan, 2009

4 commits


26 Jan, 2009

4 commits

  • Impact: New way of using the blktrace infrastructure

    This drops the requirement of userspace utilities to use the blktrace
    facility.

    Configuration is done thru sysfs, adding a "trace" directory to the
    partition directory where blktrace can be enabled for the associated
    request_queue.

    The same filters present in the IOCTL interface are present as sysfs
    device attributes.

    The /sys/block/sdX/sdXN/trace/enable file allows tracing without any
    filters.

    The other files in this directory: pid, act_mask, start_lba and end_lba
    can be used with the same meaning as with the IOCTL interface.

    Using the sysfs interface will only setup the request_queue->blk_trace
    fields, tracing will only take place when the "blk" tracer is selected
    via the ftrace interface, as in the following example:

    To see the trace, one can use the /d/tracing/trace file or the
    /d/tracign/trace_pipe file, with semantics defined in the ftrace
    documentation in Documentation/ftrace.txt.

    [root@f10-1 ~]# cat /t/trace
    kjournald-305 [000] 3046.491224: 8,1 A WBS 6367 + 8 -0 [000] 3046.511914: 8,1 C RS 6367 + 8 [6367]
    [root@f10-1 ~]#

    The default line context (prefix) format is the one described in the ftrace
    documentation, with the blktrace specific bits using its existing format,
    described in blkparse(8).

    If one wants to have the classic blktrace formatting, this is possible by
    using:

    [root@f10-1 ~]# echo blk_classic > /t/trace_options
    [root@f10-1 ~]# cat /t/trace
    8,1 0 3046.491224 305 A WBS 6367 + 8 /t/trace_options
    [root@f10-1 ~]# echo stacktrace > /t/trace_options

    [root@f10-1 ~]# cat /t/trace
    kjournald-305 [000] 3318.826779: 8,1 A WBS 6375 + 8
    Signed-off-by: Ingo Molnar

    Arnaldo Carvalho de Melo
     
  • Impact: new helper function

    Signed-off-by: Arnaldo Carvalho de Melo
    Signed-off-by: Ingo Molnar

    Arnaldo Carvalho de Melo
     
  • Fix kmemtrace printk warnings:

    kernel/trace/kmemtrace.c:142: warning: format '%4ld' expects type 'long int', but argument 3 has type 'size_t'
    kernel/trace/kmemtrace.c:147: warning: format '%4ld' expects type 'long int', but argument 3 has type 'size_t'

    Signed-off-by: Randy Dunlap
    Acked-by: Eduard - Gabriel Munteanu
    Signed-off-by: Ingo Molnar

    Randy Dunlap
     
  • Ingo Molnar
     

23 Jan, 2009

5 commits

  • This patch brings various bugfixes:

    - Drop the first irrelevant task switch on the very beginning of a trace.

    - Drop the OVERHEAD word from the headers, the DURATION word is sufficient
    and will not overlap other columns.

    - Make the headers fit well their respective columns whatever the
    selected options.

    Ie, default options:

    # tracer: function_graph
    #
    # CPU DURATION FUNCTION CALLS
    # | | | | | | |

    1) 0.646 us | }
    1) | mem_cgroup_del_lru_list() {
    1) 0.624 us | lookup_page_cgroup();
    1) 1.970 us | }

    echo funcgraph-proc > trace_options

    # tracer: function_graph
    #
    # CPU TASK/PID DURATION FUNCTION CALLS
    # | | | | | | | | |

    0) bash-2937 | 0.895 us | }
    0) bash-2937 | 0.888 us | __rcu_read_unlock();
    0) bash-2937 | 0.864 us | conv_uni_to_pc();
    0) bash-2937 | 1.015 us | __rcu_read_lock();

    echo nofuncgraph-cpu > trace_options
    echo nofuncgraph-proc > trace_options

    # tracer: function_graph
    #
    # DURATION FUNCTION CALLS
    # | | | | | |

    3.752 us | native_pud_val();
    0.616 us | native_pud_val();
    0.624 us | native_pmd_val();

    About features, one can now disable the duration (this will hide the
    overhead too for convenient reasons and because on doesn't need
    overhead if it hasn't the duration):

    echo nofuncgraph-duration > trace_options

    # tracer: function_graph
    #
    # FUNCTION CALLS
    # | | | |

    cap_vm_enough_memory() {
    __vm_enough_memory() {
    vm_acct_memory();
    }
    }
    }

    And at last, an option to print the absolute time:

    //Restart from default options
    echo funcgraph-abstime > trace_options

    # tracer: function_graph
    #
    # TIME CPU DURATION FUNCTION CALLS
    # | | | | | | | |

    261.339774 | 1) + 42.823 us | }
    261.339775 | 1) 1.045 us | _spin_lock_irq();
    261.339777 | 1) 0.940 us | _spin_lock_irqsave();
    261.339778 | 1) 0.752 us | _spin_unlock_irqrestore();
    261.339780 | 1) 0.857 us | _spin_unlock_irq();
    261.339782 | 1) | flush_to_ldisc() {
    261.339783 | 1) | tty_ldisc_ref() {
    261.339783 | 1) | tty_ldisc_try() {
    261.339784 | 1) 1.075 us | _spin_lock_irqsave();
    261.339786 | 1) 0.842 us | _spin_unlock_irqrestore();
    261.339788 | 1) 4.211 us | }
    261.339788 | 1) 5.662 us | }

    The format is seconds.usecs.

    I guess no one needs the nanosec precision here, the main goal is to have
    an overview about the general timings of events, and to see the place when
    the trace switches from one cpu to another.

    ie:

    274.874760 | 1) 0.676 us | _spin_unlock();
    274.874762 | 1) 0.609 us | native_load_sp0();
    274.874763 | 1) 0.602 us | native_load_tls();
    274.878739 | 0) 0.722 us | }
    274.878740 | 0) 0.714 us | native_pmd_val();
    274.878741 | 0) 0.730 us | native_pmd_val();

    Here there is a 4000 usecs difference when we switch the cpu.

    Changes in V2:

    - Completely fix the first pointless task switch.

    Signed-off-by: Frederic Weisbecker
    Signed-off-by: Ingo Molnar

    Frederic Weisbecker
     
  • Impact: fix to preempt trace triggering lockdep check_flag failure

    In local_bh_disable, the use of add_preempt_count causes the
    preempt tracer to start recording the time preemption is off.
    But because it already modified the preempt_count to show
    softirqs disabled, and before it called the lockdep code to
    handle this, it causes a state that lockdep can not handle.

    The preempt tracer will reset the ring buffer on start of a trace,
    and the ring buffer reset code does a spin_lock_irqsave. This
    calls into lockdep and lockdep will fail when it detects the
    invalid state of having softirqs disabled but the internal
    current->softirqs_enabled is still set.

    The fix is to manually add the SOFTIRQ_OFFSET to preempt count
    and call the preempt tracer code outside the lockdep critical
    area.

    Thanks to Peter Zijlstra for suggesting this solution.

    Signed-off-by: Steven Rostedt
    Signed-off-by: Ingo Molnar

    Steven Rostedt
     
  • The logic in the tracing_start/stop code prevents the WARN_ON
    from ever detecting if a start/stop pair was mismatched.

    Signed-off-by: Steven Rostedt
    Signed-off-by: Ingo Molnar

    Steven Rostedt
     
  • Impact: cleanup of duplicate features

    The trace output disables the ring buffer and prevents tracing to
    occur. The code in irqsoff to do the same thing is no longer needed.
    This patch removes it.

    Signed-off-by: Steven Rostedt
    Signed-off-by: Ingo Molnar

    Steven Rostedt
     
  • Ingo Molnar
     

22 Jan, 2009

8 commits

  • Cleanup the cpuid check for DS configuration.

    This also fixes a Corei7 CPUID enumeration bug.

    Signed-off-by: Markus Metzger
    Signed-off-by: Steven Rostedt
    Signed-off-by: Ingo Molnar

    Markus Metzger
     
  • Impact: fix bad times of recent resets

    The ring buffer needs to reset its timestamps when reseting of the
    buffer, otherwise the timestamps are stale and might be used to
    calculate times in the buffer causing funny timestamps to appear.

    Signed-off-by: Steven Rostedt
    Signed-off-by: Ingo Molnar

    Steven Rostedt
     
  • Impact: fix bad times of recent resets

    The ring buffer needs to reset its timestamps when reseting of the
    buffer, otherwise the timestamps are stale and might be used to
    calculate times in the buffer causing funny timestamps to appear.

    Signed-off-by: Steven Rostedt
    Signed-off-by: Ingo Molnar

    Steven Rostedt
     
  • Impact: better data for wakeup tracer

    This patch adds the wakeup and schedule calls that are used by
    the scheduler tracer to make the wakeup tracer more readable.

    Signed-off-by: Steven Rostedt
    Signed-off-by: Ingo Molnar

    Steven Rostedt
     
  • Impact: add option to trace all tasks or just RT tasks

    The current wakeup tracer only traces RT task wakeups. This is
    fine for those interested in wake up timings of RT tasks, but
    it is useless for those that are interested in the causes
    of long wakeups for non RT tasks.

    This patch creates a "wakeup_rt" to implement the tracing of just
    RT tasks (as the current "wakeup" does). And makes "wakeup" now
    trace all tasks as an average developer would expect.

    Signed-off-by: Steven Rostedt
    Signed-off-by: Ingo Molnar

    Steven Rostedt
     
  • If the ring buffer recording has been disabled. Do not let
    swapping of ring buffers occur. Simply return -EAGAIN.

    Signed-off-by: Steven Rostedt
    Signed-off-by: Ingo Molnar

    Steven Rostedt
     
  • Impact: fix to erased trace output

    To try not to have the outputing of a trace interfere with the wakeup
    tracer, it would disable tracing while the output was printing. But
    if a trace had started when it was disabled, it can show a partial
    trace. To try to solve this, on closing of the tracer, it would
    clear the trace buffer.

    The latency tracers (wakeup and irqsoff) have two buffers. One for
    recording and one for holding the max trace that is printed. The
    clearing of the trace above should only affect the recording buffer.
    But for some reason it would move the erased trace to the print
    buffer. Probably due to a race with the closing of the trace and
    the saving ofhe max race.

    The above is all pretty useless, and if the user does not want the
    printing of the trace to be traced itself, then the user can manual
    disable tracing. This patch removes all the code that tries to keep
    the output of the tracer from modifying the trace.

    Signed-off-by: Steven Rostedt
    Signed-off-by: Ingo Molnar

    Steven Rostedt
     
  • Ingo Molnar
     

21 Jan, 2009

5 commits

  • Impact: trace max latencies on start of latency tracing

    This patch sets the max latency to zero whenever one of the
    irq variant tracers or the wakeup tracer is set to current tracer.

    Most developers expect to see output when starting up a latency
    tracer. But since the max_latency is already set to max, and
    it takes a latency greater than max_latency to be recorded, there
    is no trace. This is not the expected behavior and has even confused
    myself.

    Signed-off-by: Steven Rostedt
    Signed-off-by: Ingo Molnar

    Steven Rostedt
     
  • Impact: limit ftrace dump output

    Currently ftrace_dump only calls ftrace_kill that is a fast way
    to prevent the function tracer functions from being called (just sets
    a flag and clears the function to call, nothing else). It is better
    to also turn off any recording to the ring buffers as well.

    Signed-off-by: Steven Rostedt
    Signed-off-by: Ingo Molnar

    Steven Rostedt
     
  • Impact: fix to print out ftrace_dump when expected

    I was debugging a hard race condition to only find out that
    after I hit the race, my log level was not at level to show
    KERN_INFO. The time it took to trigger the race was wasted because
    I did not capture the trace.

    Since ftrace_dump is only called from kernel oops (and only when
    it is set in the kernel command line to do so), or when a
    developer adds it to their own local tree, the log level of
    the print should be at KERN_EMERG to make sure the print appears.

    ftrace_dump is not called by a normal user setup, and will not
    add extra unwanted print out to the console. There is no reason
    it should be at KERN_INFO.

    Signed-off-by: Steven Rostedt
    Signed-off-by: Ingo Molnar

    Steven Rostedt
     
  • Impact: reset struct buffer_page.write when interrupt storm

    if struct buffer_page.write is not reset, any succedent committing
    will corrupted ring_buffer:

    static inline void
    rb_set_commit_to_write(struct ring_buffer_per_cpu *cpu_buffer)
    {
    ......
    cpu_buffer->commit_page->commit =
    cpu_buffer->commit_page->write;
    ......
    }

    when "if (RB_WARN_ON(cpu_buffer, next_page == reader_page))", ring_buffer
    is disabled, but some reserved buffers may haven't been committed.
    we need reset struct buffer_page.write.

    when "if (unlikely(next_page == cpu_buffer->commit_page))", ring_buffer
    is still available, we should not corrupt it.

    Signed-off-by: Lai Jiangshan
    Signed-off-by: Steven Rostedt
    Signed-off-by: Ingo Molnar

    Lai Jiangshan
     
  • Impact: fix a crash while kernel image restore

    When the function graph tracer is running and while suspend to disk, some racy
    and dangerous things happen against this tracer.

    The current task will save its registers including the stack pointer which
    contains the return address hooked by the tracer. But the current task will
    continue to enter other functions after that to save the memory, and then
    it will store other return addresses, and finally loose the old depth which
    matches the return address saved in the old stack (during the registers saving).

    So on image restore, the code will return to wrong addresses.
    And there are other things: on restore, the task will have it's "current"
    pointer overwritten during registers restoring....switching from one task to
    another... That would be insane to try to trace function graphs at these
    stages.

    This patch makes the function graph tracer listening on power events, making
    it's tracing disabled for the current task (the one that performs the
    hibernation work) while suspend/resume to disk, making the tracing safe
    during hibernation.

    Signed-off-by: Frederic Weisbecker
    Signed-off-by: Steven Rostedt
    Signed-off-by: Ingo Molnar

    Frederic Weisbecker
     

20 Jan, 2009

8 commits


19 Jan, 2009

2 commits

  • Impact: fix failure of dynamic function tracer selftest

    In a course of development, a developer does several makes on their
    kernel. Sometimes, the make might do something abnormal. In the
    case of running the recordmcount.pl script on an object twice,
    the script will duplicate all the calls to mcount in the __mcount_loc
    section.

    On boot up, the dynamic function tracer is careful when it modifies
    code, and performs several consistency checks. One is to not modify
    the call site if it is not what it expects it to be. If a function
    call site is listed twice, the first entry will convert the site
    to a nop, and the second will fail because it expected to see a
    call to mcount, but instead it sees a nop. Thus, the function tracer
    is disabled.

    Eric Sesterhenn reported seeing:

    [ 1.055440] ftrace: converting mcount calls to 0f 1f 44 00 00
    [ 1.055568] ftrace: allocating 29418 entries in 116 pages
    [ 1.061000] ------------[ cut here ]------------
    [ 1.061000] WARNING: at kernel/trace/ftrace.c:441

    [...]

    [ 1.060000] ---[ end trace 4eaa2a86a8e2da23 ]---
    [ 1.060000] ftrace failed to modify [] check_corruption+0x3/0x2d
    [ 1.060000] actual: 0f:1f:44:00:00

    This warning shows that check_corruption+0x3 already had a nop in
    its place (0x0f1f440000). After compiling another kernel the problem
    went away.

    Later Eric Paris notice the same type of issue. Luckily, he saved
    the vmlinux file that caused it. In the file we found a bunch of
    duplicate mcount call site records, which lead us to the script.

    Perhaps this problem only happens to people named Eric.

    This patch changes the script to test if the __mcount_loc already
    exists in the object file, and if it does, it will print out
    an error message and kill the compile.

    Reported-by: Eric Sesterhenn
    Reported-by: Eric Paris
    Signed-off-by: Steven Rostedt
    Signed-off-by: Ingo Molnar

    Steven Rostedt
     
  • Ingo Molnar
     

17 Jan, 2009

2 commits

  • Linus Torvalds
     
  • * 'release' of git://git.kernel.org/pub/scm/linux/kernel/git/lenb/linux-acpi-2.6: (23 commits)
    ACPI PCI hotplug: harden against panic regression
    ACPI: rename main.c to sleep.c
    dell-laptop: move to drivers/platform/x86/ from drivers/misc/
    eeepc-laptop: enable Bluetooth ACPI details
    ACPI: fix ACPI_FADT_S4_RTC_WAKE comment
    kprobes: check CONFIG_FREEZER instead of CONFIG_PM
    PM: Fix freezer compilation if PM_SLEEP is unset
    thermal fixup for broken BIOS which has invalid trip points.
    ACPI: EC: Don't trust ECDT tables from ASUS
    ACPI: EC: Limit workaround for ASUS notebooks even more
    ACPI: thinkpad-acpi: bump up version to 0.22
    ACPI: thinkpad-acpi: handle HKEY event 6030
    ACPI: thinkpad-acpi: clean-up fan subdriver quirk
    ACPI: thinkpad-acpi: start the event hunt season
    ACPI: thinkpad-acpi: handle HKEY thermal and battery alarms
    ACPI: thinkpad-acpi: clean up hotkey_notify()
    ACPI: thinkpad-acpi: use killable instead of interruptible mutexes
    ACPI: thinkpad-acpi: add UWB radio support
    ACPI: thinkpad-acpi: preserve radio state across shutdown
    ACPI: thinkpad-acpi: resume with radios disabled
    ...

    Linus Torvalds