06 Dec, 2011

1 commit

  • Though not all events have a 'prev_pid' field, it was allowed to do this:

    # echo 'prev_pid == 100' > events/sched/filter

    but commit 75b8e98263fdb0bfbdeba60d4db463259f1fe8a2 (tracing/filter: Swap
    entire filter of events) broke it for no reason.

    Link: http://lkml.kernel.org/r/4EAF46CF.8040408@cn.fujitsu.com

    Signed-off-by: Li Zefan
    Signed-off-by: Steven Rostedt

    Li Zefan
     

15 Jul, 2011

1 commit

  • Currently the stack trace per event in ftrace is limited to 8 frames.
    This can be quite limiting and sometimes useless, especially when
    the "ignore frames" count is wrong and stack frames are also used up
    by the event processing itself.

    Change this to be dynamic by adding a percpu buffer that we can
    write a large stack frame into and then copy into the ring buffer.

    Interrupts and NMIs that come in while another event is being
    processed will only get to use the 8-frame stack. That should be enough,
    as the task they interrupted will have the full stack frame anyway.
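
    A minimal userspace sketch of the scheme (buffer size, names and the
    printf reporting are illustrative, not the kernel code):

    /* One large scratch buffer (per-CPU in the kernel) guarded by a
     * nesting counter. The first user on a CPU gets the large buffer;
     * a nested user (an interrupt or NMI arriving mid-event) falls back
     * to a small fixed on-stack array. */
    #include <stdio.h>
    #include <string.h>

    #define BIG_FRAMES   128  /* illustrative size of the scratch buffer */
    #define SMALL_FRAMES   8  /* the old fixed per-event limit */

    static unsigned long big_buf[BIG_FRAMES];
    static int nesting;

    static void record_stack(const unsigned long *frames, int nr)
    {
        if (nesting++ == 0) {
            int n = nr < BIG_FRAMES ? nr : BIG_FRAMES;
            memcpy(big_buf, frames, n * sizeof(*frames));
            /* ...then copy exactly n frames into the ring buffer... */
            printf("large buffer: %d frames\n", n);
        } else {
            unsigned long small[SMALL_FRAMES];
            int n = nr < SMALL_FRAMES ? nr : SMALL_FRAMES;
            memcpy(small, frames, n * sizeof(*frames));
            printf("nested: truncated to %d frames\n", n);
        }
        nesting--;
    }

    int main(void)
    {
        unsigned long frames[40] = { 0 };
        record_stack(frames, 40);
        return 0;
    }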

    Requested-by: Thomas Gleixner
    Signed-off-by: Steven Rostedt

    Steven Rostedt
     

15 Jun, 2011

1 commit

  • Fix the kprobe tracer to support kernel stack traces correctly.
    Since the execution path of kprobe-based dynamic events is different
    from other tracepoint-based events, the normal ftrace_trace_stack() doesn't
    work correctly. To fix that, this introduces ftrace_trace_stack_regs(),
    which traces the stack via pt_regs instead of the current stack pointer.

    e.g.

    # echo p schedule+4 > /sys/kernel/debug/tracing/kprobe_events
    # echo 1 > /sys/kernel/debug/tracing/options/stacktrace
    # echo 1 > /sys/kernel/debug/tracing/events/kprobes/enable
    # head -n 20 /sys/kernel/debug/tracing/trace
    bash-2968 [000] 10297.050245: p_schedule_4: (schedule+0x4/0x4ca)
    bash-2968 [000] 10297.050247:
    => schedule_timeout
    => n_tty_read
    => tty_read
    => vfs_read
    => sys_read
    => system_call_fastpath
    kworker/0:1-2940 [000] 10297.050265: p_schedule_4: (schedule+0x4/0x4ca)
    kworker/0:1-2940 [000] 10297.050266:
    => worker_thread
    => kthread
    => kernel_thread_helper
    sshd-1132 [000] 10297.050365: p_schedule_4: (schedule+0x4/0x4ca)
    sshd-1132 [000] 10297.050365:
    => sysret_careful

    Note: Even with this fix, the first entry will be skipped
    if the probe is put on the function entry area before
    the frame pointer is set up (usually, that is 4 bytes
    (push %bp; mov %sp %bp) on x86), because the stack unwinder
    depends on the frame pointer.

    Signed-off-by: Masami Hiramatsu
    Cc: Frederic Weisbecker
    Cc: yrl.pp-manager.tt@hitachi.com
    Cc: Peter Zijlstra
    Cc: Namhyung Kim
    Link: http://lkml.kernel.org/r/20110608070934.17777.17116.stgit@fedora15
    Signed-off-by: Steven Rostedt

    Masami Hiramatsu
     

26 May, 2011

1 commit


07 May, 2011

1 commit

  • This partially reverts commit e6e1e2593592a8f6f6380496655d8c6f67431266.

    That commit changed the structure layout of the trace structure, which
    in turn broke PowerTOP (1.9x generation) quite badly.

    I appreciate not wanting to expose the variable in question, and
    PowerTOP was not using it, so I've replaced the variable with just a
    padding field - that way if in the future a new field is needed it can
    just use this padding field.
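
    A small compile-and-run sketch of the trick (field names are
    hypothetical, not the actual trace structure):

    /* Replacing a removed member with an explicit padding slot keeps
     * every later offset stable, so an old userspace parser such as
     * PowerTOP keeps working against the same binary layout. */
    #include <stdio.h>
    #include <stddef.h>

    struct entry_old { int type; int hidden; long payload; };
    struct entry_new { int type; int __pad;  long payload; };

    int main(void)
    {
        printf("payload offset: old=%zu new=%zu\n",
               offsetof(struct entry_old, payload),
               offsetof(struct entry_new, payload));
        return 0;
    }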

    Signed-off-by: Arjan van de Ven
    Signed-off-by: Linus Torvalds

    Arjan van de Ven
     

10 Mar, 2011

1 commit

  • The lock_depth field in the event headers was added as a temporary
    data point to help in removing the BKL. Now that the BKL has pretty
    much been removed, we can remove this field.

    This in turn shrinks the header from 12 bytes to 8 bytes,
    removing the 4-byte padding that gcc would insert if the first field
    in the data payload was 8 bytes in size.
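
    A runnable sketch of the alignment effect (illustrative types, not
    the real event header):

    /* A 12-byte header followed by an 8-byte field forces the compiler
     * to insert 4 bytes of padding to align the field; an 8-byte header
     * needs no padding. */
    #include <stdio.h>
    #include <stdint.h>
    #include <stddef.h>

    struct ev12 { uint32_t a, b, c; uint64_t payload; };
    struct ev8  { uint32_t a, b;    uint64_t payload; };

    int main(void)
    {
        printf("12-byte header: payload at %zu, sizeof %zu\n",
               offsetof(struct ev12, payload), sizeof(struct ev12));
        printf(" 8-byte header: payload at %zu, sizeof %zu\n",
               offsetof(struct ev8, payload), sizeof(struct ev8));
        return 0;
    }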

    Signed-off-by: Steven Rostedt

    Steven Rostedt
     

08 Feb, 2011

1 commit


19 Nov, 2010

1 commit

  • Currently we have in something like the sched_switch event:

    field:char prev_comm[TASK_COMM_LEN]; offset:12; size:16; signed:1;

    When a userspace tool such as perf tries to parse this, the
    TASK_COMM_LEN is meaningless. This happens because the TRACE_EVENT() macro
    simply uses #len to stringify the length. When the length is
    an enum, we get a string that means nothing to tools.

    By adding a static buffer and a mutex to protect it, we can store the
    string into that buffer with snprintf and show the actual number.
    Now we get:

    field:char prev_comm[16]; offset:12; size:16; signed:1;

    Something much more useful.
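
    A sketch of the approach in plain C (the kernel serializes the static
    buffer with a mutex; that detail is omitted here):

    /* #len in the macro stringifies the token, so "TASK_COMM_LEN" comes
     * out literally. Formatting the value into a buffer with snprintf
     * yields the actual number instead. */
    #include <stdio.h>

    #define TASK_COMM_LEN 16

    static char len_buf[32];   /* static buffer, mutex-protected in the kernel */

    static const char *stringify_len(int len)
    {
        snprintf(len_buf, sizeof(len_buf), "%d", len);
        return len_buf;
    }

    int main(void)
    {
        printf("field:char prev_comm[%s];\n", stringify_len(TASK_COMM_LEN));
        return 0;
    }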

    Signed-off-by: Steven Rostedt

    Steven Rostedt
     

18 Nov, 2010

2 commits

  • As with the raw syscall events, individual syscall events won't
    leak system-wide information under task-bound tracing. Allow
    non-privileged users to use them in such a workflow.

    Signed-off-by: Frederic Weisbecker
    Cc: Ingo Molnar
    Cc: Peter Zijlstra
    Cc: Arnaldo Carvalho de Melo
    Cc: Thomas Gleixner
    Cc: Steven Rostedt
    Cc: Li Zefan
    Cc: Jason Baron

    Frederic Weisbecker
     
  • This adds a new internal trace event flag that allows events to be
    used in perf by non-privileged users in the case of task-bound tracing.

    This is desired for syscall tracepoints because, unlike some other
    tracepoints, they don't leak global system information.
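
    A minimal sketch of the idea (identifiers are illustrative
    assumptions, not the exact kernel names):

    /* An internal flag on the event marks it safe for unprivileged,
     * task-bound tracing, so the permission check can let it through. */
    #include <stdbool.h>
    #include <stdio.h>

    #define TRACE_EVENT_FL_CAP_ANY (1 << 0)  /* assumed flag name */

    struct event { unsigned int flags; };

    static bool perf_may_use(const struct event *ev, bool privileged)
    {
        /* privileged users may open any event; others only flagged ones */
        return privileged || (ev->flags & TRACE_EVENT_FL_CAP_ANY);
    }

    int main(void)
    {
        struct event sys_enter = { .flags = TRACE_EVENT_FL_CAP_ANY };
        printf("unprivileged ok: %d\n", perf_may_use(&sys_enter, false));
        return 0;
    }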

    Signed-off-by: Frederic Weisbecker
    Cc: Ingo Molnar
    Cc: Peter Zijlstra
    Cc: Arnaldo Carvalho de Melo
    Cc: Thomas Gleixner
    Cc: Steven Rostedt
    Cc: Li Zefan
    Cc: Jason Baron

    Frederic Weisbecker
     

10 Sep, 2010

1 commit

  • Replace pmu::{enable,disable,start,stop,unthrottle} with
    pmu::{add,del,start,stop}, all of which take a flags argument.

    The new interface extends the capability to stop a counter while
    keeping it scheduled on the PMU. We replace the throttled state with
    the generic stopped state.

    This also allows us to efficiently stop/start counters over certain
    code paths (like IRQ handlers).

    It also allows scheduling a counter without it starting, allowing for
    a generic frozen state (useful for rotating stopped counters).

    The stopped state is implemented in two different ways, depending on
    how the architecture implemented the throttled state:

    1) We disable the counter:
       a) the pmu has per-counter enable bits, we flip that
       b) we program a NOP event, preserving the counter state

    2) We store the counter state and ignore all read/overflow events
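
    A skeletal sketch of the new callback shape (simplified; the flag and
    struct names here are modeled on the description, not copied from the
    kernel):

    #include <stdio.h>

    #define PERF_EF_START 0x01   /* assumed flag: start counting on add */

    struct event { int state; };  /* -1 off, 0 stopped, 1 running */

    static int  my_add(struct event *ev, int flags)
    { ev->state = (flags & PERF_EF_START) ? 1 : 0; return 0; }
    static void my_del(struct event *ev, int flags)
    { (void)flags; ev->state = -1; }
    static void my_start(struct event *ev, int flags)
    { (void)flags; ev->state = 1; }
    static void my_stop(struct event *ev, int flags)
    { (void)flags; ev->state = 0; }

    struct pmu {
        int  (*add)(struct event *, int);
        void (*del)(struct event *, int);
        void (*start)(struct event *, int);
        void (*stop)(struct event *, int);
    };

    int main(void)
    {
        struct pmu pmu = { my_add, my_del, my_start, my_stop };
        struct event ev = { -1 };

        pmu.add(&ev, 0);     /* scheduled on the PMU but frozen */
        pmu.start(&ev, 0);   /* begin counting */
        pmu.stop(&ev, 0);    /* stopped, yet still scheduled */
        pmu.del(&ev, 0);     /* unscheduled */
        printf("final state: %d\n", ev.state);
        return 0;
    }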

    Signed-off-by: Peter Zijlstra
    Cc: paulus
    Cc: stephane eranian
    Cc: Robert Richter
    Cc: Will Deacon
    Cc: Paul Mundt
    Cc: Frederic Weisbecker
    Cc: Cyrill Gorcunov
    Cc: Lin Ming
    Cc: Yanmin
    Cc: Deng-Cheng Zhu
    Cc: David Miller
    Cc: Michael Cree
    LKML-Reference:
    Signed-off-by: Ingo Molnar

    Peter Zijlstra
     

19 Aug, 2010

1 commit

  • ftrace_event_call->perf_events, perf_trace_buf,
    fgraph_data->cpu_data and some local variables are percpu pointers
    missing __percpu markups. Add them.

    Signed-off-by: Namhyung Kim
    Acked-by: Tejun Heo
    Cc: Steven Rostedt
    Cc: Ingo Molnar
    Cc: Peter Zijlstra
    Cc: Arnaldo Carvalho de Melo
    Cc: Paul Mackerras
    Cc: Stephane Eranian
    LKML-Reference:
    Signed-off-by: Frederic Weisbecker

    Namhyung Kim
     

21 Jul, 2010

2 commits

  • __print_flags() and __print_symbolic() use a percpu trace_seq:

    1) Its memory is allocated at compile time, so it wastes memory even
       when tracing is not in use.
    2) It is percpu data, so it wastes even more memory on multi-CPU systems.
    3) It disables preemption while executing its core routine,
       "trace_seq_printf(s, "%s: ", #call);", which introduces latency.

    So we move this trace_seq to struct trace_iterator.

    Signed-off-by: Lai Jiangshan
    LKML-Reference:
    Signed-off-by: Steven Rostedt

    Lai Jiangshan
     
  • We found that even enabling a single trace event that will rarely be
    triggered can add big overhead to context switches.

    (lmbench context switch test)
    ---------------------------------------------------
     2p/0K 2p/16K 2p/64K 8p/16K 8p/64K 16p/16K 16p/64K
     ctxsw  ctxsw  ctxsw  ctxsw  ctxsw   ctxsw   ctxsw
    ------ ------ ------ ------ ------ ------- -------
      2.19   2.3    2.21   2.56   2.13    2.54    2.07
      2.39   2.51   2.35   2.75   2.27    2.81    2.24

    The overhead is 6% ~ 11%.

    This is because when a trace event is enabled, 3 tracepoints (sched_switch,
    sched_wakeup, sched_wakeup_new) are activated to map pids to cmdnames.

    We'd like to avoid this overhead, so add a trace option '(no)record-cmd'
    to allow disabling cmdline recording.

    Signed-off-by: Li Zefan
    LKML-Reference:
    Signed-off-by: Steven Rostedt

    Li Zefan
     

29 Jun, 2010

1 commit

  • Because kprobes and syscalls need special processing to register
    events, the class->reg() method was created to handle the differences.

    But instead of creating a default ->reg for perf and ftrace events,
    the code was scattered with:

        if (class->reg)
            class->reg();
        else
            default_reg();

    This is messy and can also lead to bugs.

    This patch cleans up this code and creates a default reg() entry for
    the events, allowing the code to call class->reg() directly
    without the condition.
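
    A sketch of the resulting pattern (simplified names): every class gets
    a reg() pointer, so call sites invoke it unconditionally.

    #include <stdio.h>

    struct event_class {
        int (*reg)(void);   /* always set: either special or default */
    };

    static int default_reg(void) { puts("default register"); return 0; }
    static int kprobe_reg(void)  { puts("kprobe register");  return 0; }

    int main(void)
    {
        struct event_class normal = { .reg = default_reg };
        struct event_class kprobe = { .reg = kprobe_reg };

        /* no "if (class->reg) ... else default_reg()" at call sites */
        normal.reg();
        kprobe.reg();
        return 0;
    }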

    Reported-by: Peter Zijlstra
    Acked-by: Peter Zijlstra
    Signed-off-by: Steven Rostedt

    Steven Rostedt
     

09 Jun, 2010

1 commit


28 May, 2010

1 commit

  • …git/tip/linux-2.6-tip

    * 'perf-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: (61 commits)
    tracing: Add __used annotation to event variable
    perf, trace: Fix !x86 build bug
    perf report: Support multiple events on the TUI
    perf annotate: Fix up usage of the build id cache
    x86/mmiotrace: Remove redundant instruction prefix checks
    perf annotate: Add TUI interface
    perf tui: Remove annotate from popup menu after failure
    perf report: Don't start the TUI if -D is used
    perf: Fix getline undeclared
    perf: Optimize perf_tp_event_match()
    perf: Remove more code from the fastpath
    perf: Optimize the !vmalloc backed buffer
    perf: Optimize perf_output_copy()
    perf: Fix wakeup storm for RO mmap()s
    perf-record: Share per-cpu buffers
    perf-record: Remove -M
    perf: Ensure that IOC_OUTPUT isn't used to create multi-writer buffers
    perf, trace: Optimize tracepoints by using per-tracepoint-per-cpu hlist to track events
    perf, trace: Optimize tracepoints by removing IRQ-disable from perf/tracepoint interaction
    perf tui: Allow disabling the TUI on a per command basis in ~/.perfconfig
    ...

    Linus Torvalds
     

21 May, 2010

4 commits

  • …nux-2.6-tip into trace/tip/tracing/core-7

    Conflicts:
    include/linux/ftrace_event.h
    include/trace/ftrace.h
    kernel/trace/trace_event_perf.c
    kernel/trace/trace_kprobe.c
    kernel/trace/trace_syscalls.c

    Signed-off-by: Steven Rostedt <rostedt@goodmis.org>

    Steven Rostedt
     
  • * git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi-misc-2.6: (182 commits)
    [SCSI] aacraid: add an ifdef'd device delete case instead of taking the device offline
    [SCSI] aacraid: prohibit access to array container space
    [SCSI] aacraid: add support for handling ATA pass-through commands.
    [SCSI] aacraid: expose physical devices for models with newer firmware
    [SCSI] aacraid: respond automatically to volumes added by config tool
    [SCSI] fcoe: fix fcoe module ref counting
    [SCSI] libfcoe: FIP Keep-Alive messages for VPorts are sent with incorrect port_id and wwn
    [SCSI] libfcoe: Fix incorrect MAC address clearing
    [SCSI] fcoe: fix a circular locking issue with rtnl and sysfs mutex
    [SCSI] libfc: Move the port_id into lport
    [SCSI] fcoe: move link speed checking into its own routine
    [SCSI] libfc: Remove extra pointer check
    [SCSI] libfc: Remove unused fc_get_host_port_type
    [SCSI] fcoe: fixes wrong error exit in fcoe_create
    [SCSI] libfc: set seq_id for incoming sequence
    [SCSI] qla2xxx: Updates to ISP82xx support.
    [SCSI] qla2xxx: Optionally disable target reset.
    [SCSI] qla2xxx: ensure flash operation and host reset via sg_reset are mutually exclusive
    [SCSI] qla2xxx: Silence bogus warning by gcc for wrap and did.
    [SCSI] qla2xxx: T10 DIF support added.
    ...

    Linus Torvalds
     
  • Avoid the swevent hash-table by using per-tracepoint
    hlists.

    Also, avoid conditionals on the fast path by ordering
    with probe unregister so that we should never get on
    the callback path without the data being there.

    Signed-off-by: Peter Zijlstra
    Cc: Arnaldo Carvalho de Melo
    Cc: Frederic Weisbecker
    Cc: Paul Mackerras
    Cc: Mike Galbraith
    Cc: Steven Rostedt
    LKML-Reference:
    Signed-off-by: Ingo Molnar

    Peter Zijlstra
     
  • Improves performance.

    Acked-by: Frederic Weisbecker
    Signed-off-by: Peter Zijlstra
    Cc: Arnaldo Carvalho de Melo
    Cc: Frederic Weisbecker
    Cc: Paul Mackerras
    Cc: Mike Galbraith
    Cc: Steven Rostedt
    LKML-Reference:
    Signed-off-by: Ingo Molnar

    Peter Zijlstra
     

20 May, 2010

1 commit


19 May, 2010

1 commit


15 May, 2010

8 commits

  • The flags variable is protected by the event_mutex when it is modified,
    but the event_mutex is not held when the variable is read.

    This is due to the fact that the reads occur in critical sections where
    taking a mutex (or even a spinlock) is not wanted.

    But the two flags that exist (enable and filter_active) have their
    code written such that the reads do not need a lock.

    The enable flag is used just to know if the event is enabled or not
    and its use is always under the event_mutex. Whether or not the event
    is actually enabled is really determined by the tracepoint being
    registered. The flag is just a way to let the code know if the tracepoint
    is registered.

    The filter_active is different. It is read without the lock. If it
    is set, then the event probes jump to the filter code. There can be a
    slight mismatch between filters available and filter_active. If the flag is
    set but no filters are available, the code safely jumps to a filter nop.
    If the flag is not set and the filters are available, then the filters
    are skipped. This is acceptable since filters are usually set before
    tracing starts, or they are set by humans, who would not notice the
    slight delay that this causes.

    v2: Fixed typo: "cacheing" -> "caching"

    Reported-by: Mathieu Desnoyers
    Acked-by: Mathieu Desnoyers
    Cc: Tom Zanussi
    Signed-off-by: Steven Rostedt

    Steven Rostedt
     
  • The filter_active and enable both use an int (4 bytes each) to
    set a single flag. We can save 4 bytes per event by combining the
    two into a single integer.

       text    data    bss     dec    hex filename
    4913961 1088356 861512 6863829 68bbd5 vmlinux.orig
    4894944 1018052 861512 6774508 675eec vmlinux.id
    4894871 1012292 861512 6768675 674823 vmlinux.flags

    This gives us another 5K in savings.

    The modification of both the enable and filter fields is done
    under the event_mutex, so it is still safe to combine the two.
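
    A sketch of the space saving (bit names illustrative): two ints
    become two bits in one flags word.

    #include <stdio.h>

    /* before: two 4-byte ints, each holding a single boolean */
    struct call_old { int enabled; int filter_active; };

    /* after: one integer with one bit per boolean */
    #define FL_ENABLED        (1 << 0)
    #define FL_FILTER_ACTIVE  (1 << 1)
    struct call_new { unsigned int flags; };

    int main(void)
    {
        struct call_new c = { .flags = FL_ENABLED };

        c.flags |= FL_FILTER_ACTIVE;   /* modified under event_mutex */
        printf("sizeof: old=%zu new=%zu\n",
               sizeof(struct call_old), sizeof(struct call_new));
        printf("enabled=%d filter_active=%d\n",
               !!(c.flags & FL_ENABLED), !!(c.flags & FL_FILTER_ACTIVE));
        return 0;
    }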

    Note: Although Mathieu gave his Acked-by, he would like it documented
    that the reads of flags are not protected by the mutex. The way the
    code works, these reads will not break anything, but will have a
    residual effect. Since this behavior is the same even before this
    patch, describing this situation is left to another patch, as this
    patch does not change the behavior, but just brought it to Mathieu's
    attention.

    v2: Updated the event trace self test for this change.

    Acked-by: Mathieu Desnoyers
    Acked-by: Masami Hiramatsu
    Acked-by: Frederic Weisbecker
    Cc: Tom Zanussi
    Signed-off-by: Steven Rostedt

    Steven Rostedt
     
  • Now that the trace_event structure is embedded in the ftrace_event_call
    structure, there is no need for the ftrace_event_call id field.
    The id field is the same as the trace_event type field.

    Removing the id and re-arranging the structure brings down the tracepoint
    footprint by another 5K.

       text    data    bss     dec    hex filename
    4913961 1088356 861512 6863829 68bbd5 vmlinux.orig
    4895024 1023812 861512 6780348 6775bc vmlinux.print
    4894944 1018052 861512 6774508 675eec vmlinux.id

    Acked-by: Mathieu Desnoyers
    Acked-by: Masami Hiramatsu
    Acked-by: Frederic Weisbecker
    Signed-off-by: Steven Rostedt

    Steven Rostedt
     
  • Currently, every event has its own trace_event structure. This is
    fine since the structure is needed anyway. But the print function
    structure (trace_event_functions) is now separate. Since the output
    of the trace event is done by the class (with the exception of events
    defined by DEFINE_EVENT_PRINT), it makes sense to have the class
    define the print functions that all events in the class can use.

    This matters most for the syscall events, since all syscall events
    use the same class. The savings here is another 30K.

       text    data    bss     dec    hex filename
    4913961 1088356 861512 6863829 68bbd5 vmlinux.orig
    4900382 1048964 861512 6810858 67ecea vmlinux.init
    4900446 1049028 861512 6810986 67ed6a vmlinux.preprint
    4895024 1023812 861512 6780348 6775bc vmlinux.print

    To accomplish this, and to let the class know what event is being
    printed, the event structure is embedded in the ftrace_event_call
    structure. This should not be an issue since the event structure
    was created for each event anyway.

    Acked-by: Mathieu Desnoyers
    Acked-by: Masami Hiramatsu
    Acked-by: Frederic Weisbecker
    Signed-off-by: Steven Rostedt

    Steven Rostedt
     
  • Multiple events may use the same method to print their data.
    Instead of having all events have a pointer to their print functions,
    the trace_event structure now points to a trace_event_functions structure
    that holds the way to print out the event.

    The event itself is now passed to the print function to let the print
    function know what kind of event it should print.

    This opens the door to consolidating the way several events print
    their output.

       text    data    bss     dec    hex filename
    4913961 1088356 861512 6863829 68bbd5 vmlinux.orig
    4900382 1048964 861512 6810858 67ecea vmlinux.init
    4900446 1049028 861512 6810986 67ed6a vmlinux.preprint

    This change slightly increases the size but is needed for the next change.

    v3: Fix the branch tracer events to handle this change.

    v2: Fix the new function graph tracer event calls to handle this change.

    Acked-by: Mathieu Desnoyers
    Acked-by: Masami Hiramatsu
    Acked-by: Frederic Weisbecker
    Signed-off-by: Steven Rostedt

    Steven Rostedt
     
  • The raw_init function pointer in the event is used to initialize
    various kinds of events. The type of initialization needed usually
    depends on the kind of event it is.

    Two events with the same class will always have the same initialization
    function, so it makes sense to move this to the class structure.

    Perhaps even making a special system structure would work since
    the initialization is the same for all events within a system.
    But since there's no system structure (yet), this will just move it
    to the class.

       text    data    bss     dec    hex filename
    4913961 1088356 861512 6863829 68bbd5 vmlinux.orig
    4900375 1053380 861512 6815267 67fe23 vmlinux.fields
    4900382 1048964 861512 6810858 67ecea vmlinux.init

    The text grew very slightly, but this is a one-time constant growth caused
    by the changes to the C files that call the init code.
    The bigger savings is in the data, and more will be saved as more events
    share a class.

    Acked-by: Mathieu Desnoyers
    Acked-by: Masami Hiramatsu
    Acked-by: Frederic Weisbecker
    Signed-off-by: Steven Rostedt

    Steven Rostedt
     
  • Move the defined fields from the event to the class structure.
    Since the fields of the event are defined by the class they belong
    to, it makes sense to have the class hold the information instead
    of the individual events. The events of the same class would just
    hold duplicate information.

    After this change the size of the kernel dropped another 3K:

       text    data    bss     dec    hex filename
    4913961 1088356 861512 6863829 68bbd5 vmlinux.orig
    4900252 1057412 861512 6819176 680d68 vmlinux.regs
    4900375 1053380 861512 6815267 67fe23 vmlinux.fields

    Although the text increased, this was mainly due to the C files
    having to adapt to the change. This is a constant increase, and
    new tracepoints will not increase the text. But the big drop is
    in the data size (as well as in the allocations needed to hold the
    fields). This will give even more savings as more tracepoints are created.

    Note, if just TRACE_EVENT()s are used and not DECLARE_EVENT_CLASS()
    with several DEFINE_EVENT()s, then the savings will be lost. But
    we are pushing developers to consolidate events with DEFINE_EVENT()
    so this should not be an issue.

    The kprobes define a unique class for every new event, but they are
    dynamic, so this should not be an issue.

    The syscalls, however, have a single class, but the fields for the
    individual events are different. The syscalls use metadata to define
    the fields. I moved the fields list from the event to the metadata and
    added a "get_fields()" function to the class. This function is used
    to find the fields. For normal events and kprobes, get_fields() just
    returns a pointer to the fields list_head in the class. For syscall
    events, it returns the fields list_head in the metadata for the event.
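
    A simplified sketch of the get_fields() indirection (plain pointers
    stand in for the kernel's list_head; names are illustrative):

    #include <stdio.h>

    struct fields { const char *where; };

    struct event;

    struct event_class {
        struct fields fields;                     /* shared by the class */
        struct fields *(*get_fields)(struct event *ev);
    };

    struct syscall_meta { struct fields enter_fields; };

    struct event {
        struct event_class *class;
        void *data;                               /* syscall metadata */
    };

    /* normal events and kprobes: fields live in the class */
    static struct fields *class_fields(struct event *ev)
    {
        return &ev->class->fields;
    }

    /* syscall events: fields live in the per-event metadata */
    static struct fields *syscall_fields(struct event *ev)
    {
        return &((struct syscall_meta *)ev->data)->enter_fields;
    }

    int main(void)
    {
        struct event_class normal = { { "in the class" }, class_fields };
        struct event_class sys    = { { NULL }, syscall_fields };
        struct syscall_meta meta  = { { "in the metadata" } };
        struct event e1 = { &normal, NULL };
        struct event e2 = { &sys, &meta };

        printf("%s / %s\n", e1.class->get_fields(&e1)->where,
               e2.class->get_fields(&e2)->where);
        return 0;
    }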

    v2: Fixed the syscall fields. The syscall metadata needs a list
    of fields for both enter and exit.

    Acked-by: Frederic Weisbecker
    Acked-by: Mathieu Desnoyers
    Acked-by: Masami Hiramatsu
    Cc: Tom Zanussi
    Cc: Peter Zijlstra
    Signed-off-by: Steven Rostedt

    Steven Rostedt
     
  • This patch removes the register functions of TRACE_EVENT() that enable
    and disable tracepoints. The registering of an event is now done
    directly in the trace_events.c file, and tracepoint_probe_register()
    is now called directly.

    The prototypes are no longer type checked, but this should not be
    an issue since the tracepoints are created automatically by the
    macros. If a prototype is incorrect in the TRACE_EVENT() macro, then
    other macros will catch it.

    The trace_event_class structure now holds the probes to be called
    by the callbacks. This removes the need for each event to have
    a separate pointer for the probe.

    To handle kprobes and syscalls, since they register probes in a
    different manner, a "reg" field is added to the ftrace_event_class
    structure. If the "reg" field is assigned, then it will be called for
    enabling and disabling of the probe for either ftrace or perf. To let
    the reg function know what is happening, a new enum (trace_reg) is
    created that has the type of control that is needed.
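
    A sketch of the reg() hook and its control enum (modeled on the
    description above; treat the exact names as illustrative):

    #include <stdio.h>

    /* the type of control requested, for either ftrace or perf */
    enum trace_reg {
        TRACE_REG_REGISTER,
        TRACE_REG_UNREGISTER,
        TRACE_REG_PERF_REGISTER,
        TRACE_REG_PERF_UNREGISTER,
    };

    struct event_class {
        /* assigned only by kprobes and syscalls at this point */
        int (*reg)(void *call, enum trace_reg type);
    };

    static int kprobe_reg(void *call, enum trace_reg type)
    {
        (void)call;
        printf("kprobe reg op %d\n", (int)type);
        return 0;
    }

    int main(void)
    {
        struct event_class kprobes = { .reg = kprobe_reg };
        kprobes.reg(NULL, TRACE_REG_REGISTER);
        return 0;
    }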

    With this new rework, the 82 kernel events and 618 syscall events
    have their footprint dramatically lowered:

       text    data    bss     dec    hex filename
    4913961 1088356 861512 6863829 68bbd5 vmlinux.orig
    4914025 1088868 861512 6864405 68be15 vmlinux.class
    4918492 1084612 861512 6864616 68bee8 vmlinux.tracepoint
    4900252 1057412 861512 6819176 680d68 vmlinux.regs

    The size went from 6863829 to 6819176; that's a total of 44K
    in savings. With tracepoints being continuously added, it is
    critical that the footprint be kept minimal.

    v5: Added #ifdef CONFIG_PERF_EVENTS around a reference to perf
    specific structure in trace_events.c.

    v4: Fixed trace self tests to check probe because regfunc no longer
    exists.

    v3: Updated to handle void *data in beginning of probe parameters.
    Also added the tracepoint: check_trace_callback_type_##call().

    v2: Changed the callback probes to pass void * and typecast the
    value within the function.

    Acked-by: Mathieu Desnoyers
    Acked-by: Masami Hiramatsu
    Acked-by: Frederic Weisbecker
    Signed-off-by: Steven Rostedt

    Steven Rostedt
     

14 May, 2010

1 commit

  • This patch creates an ftrace_event_class struct that event structs point to.
    This class struct will be made to hold information used to modify the
    events. Currently the class struct only holds the event's system name.

    This patch slightly increases the size, but the change lays the groundwork
    for other changes that make the footprint of tracepoints smaller.

    With 82 standard tracepoints, and 618 system call tracepoints
    (two tracepoints per syscall: enter and exit):

       text    data    bss     dec    hex filename
    4913961 1088356 861512 6863829 68bbd5 vmlinux.orig
    4914025 1088868 861512 6864405 68be15 vmlinux.class

    This patch also cleans up some stale comments in ftrace.h.

    v2: Fixed missing semi-colon in macro.

    Acked-by: Frederic Weisbecker
    Acked-by: Mathieu Desnoyers
    Acked-by: Masami Hiramatsu
    Signed-off-by: Steven Rostedt

    Steven Rostedt
     

01 May, 2010

1 commit

  • __print_hex() prints the values in an array in hex (without '0x'),
    space separated, e.g.: 92 33 32 f3 ee 4d
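
    A tiny userspace equivalent of that output format (not the kernel's
    trace_seq-based implementation):

    #include <stdio.h>

    static void print_hex(const unsigned char *buf, int len)
    {
        for (int i = 0; i < len; i++)
            printf("%s%02x", i ? " " : "", buf[i]); /* no 0x, space separated */
        putchar('\n');
    }

    int main(void)
    {
        unsigned char data[] = { 0x92, 0x33, 0x32, 0xf3, 0xee, 0x4d };
        print_hex(data, (int)sizeof(data));
        return 0;
    }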

    Signed-off-by: Li Zefan
    Signed-off-by: Tomohiro Kusumi
    Signed-off-by: Kei Tokunaga
    Acked-by: Steven Rostedt
    Signed-off-by: James Bottomley

    Kei Tokunaga
     

01 Apr, 2010

1 commit

  • Now that the ring buffer can keep track of where events are lost,
    use this information in the output of trace_pipe:

    hackbench-3588 [001] 1326.701660: lock_acquire: ffffffff816591e0 read rcu_read_lock
    hackbench-3588 [001] 1326.701661: lock_acquire: ffff88003f4091f0 &(&dentry->d_lock)->rlock
    hackbench-3588 [001] 1326.701664: lock_release: ffff88003f4091f0 &(&dentry->d_lock)->rlock
    CPU:1 [LOST 673 EVENTS]
    hackbench-3588 [001] 1326.702711: kmem_cache_free: call_site=ffffffff81102b85 ptr=ffff880026d96738
    hackbench-3588 [001] 1326.702712: lock_release: ffff88003e1480a8 &mm->mmap_sem
    hackbench-3588 [001] 1326.702713: lock_acquire: ffff88003e1480a8 &mm->mmap_sem

    Even works with the function graph tracer:

    2) ! 170.098 us | }
    2) 4.036 us | rcu_irq_exit();
    2) 3.657 us | idle_cpu();
    2) ! 190.301 us | }
    CPU:2 [LOST 2196 EVENTS]
    2) 0.853 us | } /* cancel_dirty_page */
    2) | remove_from_page_cache() {
    2) 1.578 us | _raw_spin_lock_irq();
    2) | __remove_from_page_cache() {

    Note, it does not work with the iterator "trace" file, since determining
    how many events were lost requires consuming the page from the ring
    buffer, which the iterator does not do.

    Signed-off-by: Steven Rostedt

    Steven Rostedt
     

10 Mar, 2010

2 commits

  • Drop the obsolete "profile" naming used by perf for trace events.
    Perf can now do more than simple event counting, so generalize
    the API naming.

    Signed-off-by: Frederic Weisbecker
    Cc: Ingo Molnar
    Cc: Peter Zijlstra
    Cc: Paul Mackerras
    Cc: Steven Rostedt
    Cc: Masami Hiramatsu
    Cc: Jason Baron

    Frederic Weisbecker
     
  • We are taking a wrong regs snapshot when a trace event triggers.
    Either we use get_irq_regs(), which gives us the interrupted
    registers if we are in an interrupt, or we use task_pt_regs(),
    which gives us the state from before we entered the kernel,
    assuming we are lucky enough not to be a kernel thread, in which
    case task_pt_regs() returns the initial set of regs from when the
    kernel thread was started.

    What we want is different. We need a hot snapshot of the regs,
    so that we can get the instruction pointer to record in the
    sample, the frame pointer for the callchain, and some other
    things.

    Let's use the new perf_fetch_caller_regs() for that.

    Comparison with perf record -e lock: -R -a -f -g
    Before:

    perf [kernel] [k] __do_softirq
    |
    --- __do_softirq
    |
    |--55.16%-- __open
    |
    --44.84%-- __write_nocancel

    After:

    perf [kernel] [k] perf_tp_event
    |
    --- perf_tp_event
    |
    |--41.07%-- lock_acquire
    | |
    | |--39.36%-- _raw_spin_lock
    | | |
    | | |--7.81%-- hrtimer_interrupt
    | | | smp_apic_timer_interrupt
    | | | apic_timer_interrupt

    The old case was producing unreliable callchains. Now having
    right frame and instruction pointers, we have the trace we
    want.

    Also, syscall and kprobe events already have the right regs;
    let's use them instead of wasting a retrieval.

    v2: Follow the rename perf_save_regs() -> perf_fetch_caller_regs()

    Signed-off-by: Frederic Weisbecker
    Cc: Ingo Molnar
    Cc: Thomas Gleixner
    Cc: H. Peter Anvin
    Cc: Peter Zijlstra
    Cc: Paul Mackerras
    Cc: Steven Rostedt
    Cc: Arnaldo Carvalho de Melo
    Cc: Masami Hiramatsu
    Cc: Jason Baron
    Cc: Archs

    Frederic Weisbecker
     

01 Mar, 2010

1 commit

  • …git/tip/linux-2.6-tip

    * 'perf-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: (172 commits)
    perf_event, amd: Fix spinlock initialization
    perf_event: Fix preempt warning in perf_clock()
    perf tools: Flush maps on COMM events
    perf_events, x86: Split PMU definitions into separate files
    perf annotate: Handle samples not at objdump output addr boundaries
    perf_events, x86: Remove superflous MSR writes
    perf_events: Simplify code by removing cpu argument to hw_perf_group_sched_in()
    perf_events, x86: AMD event scheduling
    perf_events: Add new start/stop PMU callbacks
    perf_events: Report the MMAP pgoff value in bytes
    perf annotate: Defer allocating sym_priv->hist array
    perf symbols: Improve debugging information about symtab origins
    perf top: Use a macro instead of a constant variable
    perf symbols: Check the right return variable
    perf/scripts: Tag syscall_name helper as not yet available
    perf/scripts: Add perf-trace-python Documentation
    perf/scripts: Remove unnecessary PyTuple resizes
    perf/scripts: Add syscall tracing scripts
    perf/scripts: Add Python scripting engine
    perf/scripts: Remove check-perf-trace from listed scripts
    ...

    Fix trivial conflict in tools/perf/util/probe-event.c

    Linus Torvalds
     

29 Jan, 2010

1 commit

  • Introduce ftrace_perf_buf_prepare() and ftrace_perf_buf_submit() to
    gather the common code that operates on the raw event sampling buffer.
    This cleans up redundant code between regular trace events, syscall
    events and kprobe events.

    Changelog v1->v2:
    - Rename function name as per Masami and Frederic's suggestion
    - Add __kprobes for ftrace_perf_buf_prepare() and make
    ftrace_perf_buf_submit() inline as per Masami's suggestion
    - Export ftrace_perf_buf_prepare since modules will use it

    Signed-off-by: Xiao Guangrong
    Acked-by: Masami Hiramatsu
    Cc: Ingo Molnar
    Cc: Steven Rostedt
    Cc: Paul Mackerras
    Cc: Jason Baron
    Cc: Peter Zijlstra
    LKML-Reference:
    Signed-off-by: Frederic Weisbecker

    Xiao Guangrong
     

07 Jan, 2010

2 commits

  • The previous patches added the use of the print_fmt string and changed
    the trace_define_field() function to also create the fields and
    format output for the event format files.

       text    data     bss      dec    hex filename
    5857201 1355780 9336808 16549789 fc879d vmlinux
    5884589 1351684 9337896 16574169 fce6d9 vmlinux-orig

    The above shows the size of the vmlinux after this patch set
    compared to the vmlinux-orig which is before the patch set.

    This saves us 27K on text and 1K on bss, and adds just 4K of data.

    The total savings is about 24K in size.

    Signed-off-by: Lai Jiangshan
    LKML-Reference:
    Acked-by: Masami Hiramatsu
    Signed-off-by: Steven Rostedt

    Lai Jiangshan
     
  • This is part of a patch set that removes the show_format method
    in the ftrace event macros.

    The print_fmt field is added to hold the string that shows
    the print_fmt in the event format files. This patch only adds
    the field; it is currently unused. Later patches will use
    this field to enable us to remove the show_format field
    and function.

    Signed-off-by: Lai Jiangshan
    LKML-Reference:
    Signed-off-by: Steven Rostedt

    Lai Jiangshan