19 Nov, 2010

1 commit

  • Currently, something like the sched_switch event contains:

    field:char prev_comm[TASK_COMM_LEN]; offset:12; size:16; signed:1;

    When a userspace tool such as perf tries to parse this, the
    TASK_COMM_LEN is meaningless. This happens because the TRACE_EVENT() macro
    simply uses #len to stringify the length argument. When the length is
    an enum constant, we get a string that means nothing to tools.

    By adding a static buffer and a mutex to protect it, we can format the
    actual number into that buffer with snprintf and print that instead.
    Now we get:

    field:char prev_comm[16]; offset:12; size:16; signed:1;

    Something much more useful.

    Signed-off-by: Steven Rostedt

    Steven Rostedt
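
    A minimal userspace illustration of the difference (the helper names here
    are made up; the real change lives in the TRACE_EVENT() format-file macros):

    #include <stdio.h>

    #define TASK_COMM_LEN 16

    /* What the macro effectively did before: #len stringifies the token. */
    #define LEN_AS_STRING(len) #len

    /* What the fix does: format the evaluated value into a buffer. */
    static const char *len_as_number(char *buf, size_t sz, int len)
    {
        snprintf(buf, sz, "%d", len);
        return buf;
    }

    int main(void)
    {
        char buf[16];

        /* prints "TASK_COMM_LEN" -- meaningless to a parser such as perf */
        printf("field:char prev_comm[%s];\n", LEN_AS_STRING(TASK_COMM_LEN));

        /* prints "16" -- what tools actually need */
        printf("field:char prev_comm[%s];\n",
               len_as_number(buf, sizeof(buf), TASK_COMM_LEN));
        return 0;
    }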
     

18 Nov, 2010

2 commits

  • As with the raw syscall events, individual syscall events won't
    leak system-wide information under task-bound tracing. Allow
    non-privileged users to use them in that workflow.

    Signed-off-by: Frederic Weisbecker
    Cc: Ingo Molnar
    Cc: Peter Zijlstra
    Cc: Arnaldo Carvalho de Melo
    Cc: Thomas Gleixner
    Cc: Steven Rostedt
    Cc: Li Zefan
    Cc: Jason Baron

    Frederic Weisbecker
     
  • This adds a new internal trace event flag that allows such events to be
    used in perf by non-privileged users for task-bound tracing.

    This is desired for the syscall tracepoints because, unlike some other
    tracepoints, they don't leak global system information.

    Signed-off-by: Frederic Weisbecker
    Cc: Ingo Molnar
    Cc: Peter Zijlstra
    Cc: Arnaldo Carvalho de Melo
    Cc: Thomas Gleixner
    Cc: Steven Rostedt
    Cc: Li Zefan
    Cc: Jason Baron

    Frederic Weisbecker
     

10 Sep, 2010

1 commit

  • Replace pmu::{enable,disable,start,stop,unthrottle} with
    pmu::{add,del,start,stop}, all of which take a flags argument.

    The new interface extends the capability to stop a counter while
    keeping it scheduled on the PMU. We replace the throttled state with
    the generic stopped state.

    This also allows us to efficiently stop/start counters over certain
    code paths (like IRQ handlers).

    It also allows scheduling a counter without it starting, allowing for
    a generic frozen state (useful for rotating stopped counters).

    The stopped state is implemented in two different ways, depending on
    how the architecture implemented the throttled state:

    1) We disable the counter:
       a) the pmu has per-counter enable bits, we flip that
       b) we program a NOP event, preserving the counter state

    2) We store the counter state and ignore all read/overflow events

    Signed-off-by: Peter Zijlstra
    Cc: paulus
    Cc: stephane eranian
    Cc: Robert Richter
    Cc: Will Deacon
    Cc: Paul Mundt
    Cc: Frederic Weisbecker
    Cc: Cyrill Gorcunov
    Cc: Lin Ming
    Cc: Yanmin
    Cc: Deng-Cheng Zhu
    Cc: David Miller
    Cc: Michael Cree
    LKML-Reference:
    Signed-off-by: Ingo Molnar

    Peter Zijlstra
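
    For reference, a sketch of the reworked callback set; the flag names follow
    this series, but compare include/linux/perf_event.h for the authoritative
    definitions:

    struct perf_event;

    #define PERF_EF_START   0x01    /* start the counter when adding    */
    #define PERF_EF_RELOAD  0x02    /* reload the counter when starting */
    #define PERF_EF_UPDATE  0x04    /* update the counter when stopping */

    struct pmu_sketch {
        /* add/del: schedule the event on/off the PMU, optionally started */
        int  (*add)(struct perf_event *event, int flags);   /* PERF_EF_START  */
        void (*del)(struct perf_event *event, int flags);

        /* start/stop: run/halt an event that stays scheduled on the PMU */
        void (*start)(struct perf_event *event, int flags); /* PERF_EF_RELOAD */
        void (*stop)(struct perf_event *event, int flags);  /* PERF_EF_UPDATE */
    };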
     

19 Aug, 2010

1 commit

  • ftrace_event_call->perf_events, perf_trace_buf,
    fgraph_data->cpu_data and some local variables are percpu pointers
    missing __percpu markups. Add them.

    Signed-off-by: Namhyung Kim
    Acked-by: Tejun Heo
    Cc: Steven Rostedt
    Cc: Ingo Molnar
    Cc: Peter Zijlstra
    Cc: Arnaldo Carvalho de Melo
    Cc: Paul Mackerras
    Cc: Stephane Eranian
    LKML-Reference:
    Signed-off-by: Frederic Weisbecker

    Namhyung Kim
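
    A generic sketch of what the annotation buys (not the exact declarations
    touched by this patch): sparse flags direct dereferences of __percpu
    pointers, so accesses must go through the per-cpu helpers:

    #include <linux/percpu.h>
    #include <linux/errno.h>

    struct scratch_sketch {
        char data[256];
    };

    /* __percpu: this pointer holds a per-cpu address, not a normal one */
    static struct scratch_sketch __percpu *scratch;

    static int scratch_setup(void)
    {
        scratch = alloc_percpu(struct scratch_sketch);
        if (!scratch)
            return -ENOMEM;

        this_cpu_ptr(scratch)->data[0] = '\0';  /* checked accessor */
        return 0;
    }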
     

21 Jul, 2010

2 commits

  • __print_flags() and __print_symbolic() use a percpu trace_seq:

    1) Its memory is allocated at compile time, so it wastes memory even if tracing is unused.
    2) It is percpu data, so it wastes even more memory on multi-CPU systems.
    3) It disables preemption when it executes its core routine,
    "trace_seq_printf(s, "%s: ", #call);", which introduces latency.

    So we move this trace_seq into struct trace_iterator.

    Signed-off-by: Lai Jiangshan
    LKML-Reference:
    Signed-off-by: Steven Rostedt

    Lai Jiangshan
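
    Roughly, the change trades a compile-time per-cpu buffer plus
    preempt-disable for a scratch trace_seq embedded in the iterator; a
    before/after sketch (the member name is illustrative):

    #include <linux/ftrace_event.h>
    #include <linux/percpu.h>

    /* Before (sketch): compile-time per-cpu scratch buffer, used with
     * preemption disabled around the formatting. */
    static DEFINE_PER_CPU(struct trace_seq, event_seq_sketch);

    static void format_before_sketch(void)
    {
        struct trace_seq *p = &get_cpu_var(event_seq_sketch);

        trace_seq_init(p);
        /* ... __print_flags()/__print_symbolic() format into p ... */
        put_cpu_var(event_seq_sketch);
    }

    /* After (sketch): the scratch trace_seq is a member of the iterator, so
     * there is no static per-cpu allocation and no preempt_disable here. */
    static void format_after_sketch(struct trace_iterator *iter)
    {
        struct trace_seq *p = &iter->tmp_seq;   /* member name illustrative */

        trace_seq_init(p);
        /* ... format into p ... */
    }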
     
  • We found that even enabling a single trace event that will rarely be
    triggered can add big overhead to context switch.

    (lmbench context switch test)
    -------------------------------------------------
    2p/0K  2p/16K 2p/64K 8p/16K 8p/64K 16p/16K 16p/64K
    ctxsw  ctxsw  ctxsw  ctxsw  ctxsw  ctxsw   ctxsw
    ------ ------ ------ ------ ------ ------- -------
    2.19   2.3    2.21   2.56   2.13   2.54    2.07    (no event enabled)
    2.39   2.51   2.35   2.75   2.27   2.81    2.24    (one event enabled)

    The overhead is 6% ~ 11%.

    This is because when a trace event is enabled, three tracepoints (sched_switch,
    sched_wakeup, sched_wakeup_new) are activated to map pids to command names.

    We'd like to avoid this overhead, so add a trace option '(no)record-cmd'
    that allows cmdline recording to be disabled.

    Signed-off-by: Li Zefan
    LKML-Reference:
    Signed-off-by: Steven Rostedt

    Li Zefan
     

29 Jun, 2010

1 commit

  • Because kprobes and syscalls need special processing to register
    events, the class->reg() method was created to handle the differences.

    But instead of creating a default ->reg for perf and ftrace events,
    the code was scattered with:

    if (class->reg)
            class->reg();
    else
            default_reg();

    This is messy and can also lead to bugs.

    This patch cleans up this code and creates a default reg() entry for
    the events, allowing the code to call class->reg() directly,
    without the condition.

    Reported-by: Peter Zijlstra
    Acked-by: Peter Zijlstra
    Signed-off-by: Steven Rostedt

    Steven Rostedt
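
    A sketch of the resulting shape (the helper name is hypothetical and the
    perf cases are elided):

    #include <linux/ftrace_event.h>
    #include <linux/tracepoint.h>

    /* Install a default ->reg() for plain TRACE_EVENT() events so call sites
     * never need the NULL check. */
    static int default_event_reg(struct ftrace_event_call *call,
                                 enum trace_reg type)
    {
        switch (type) {
        case TRACE_REG_REGISTER:
            return tracepoint_probe_register(call->name,
                                             call->class->probe, call);
        case TRACE_REG_UNREGISTER:
            tracepoint_probe_unregister(call->name,
                                        call->class->probe, call);
            return 0;
        default:    /* perf register/unregister elided */
            return 0;
        }
    }

    /* Every call site can now simply do:
     *     ret = call->class->reg(call, TRACE_REG_REGISTER);
     */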
     

28 May, 2010

1 commit

  • …git/tip/linux-2.6-tip

    * 'perf-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: (61 commits)
    tracing: Add __used annotation to event variable
    perf, trace: Fix !x86 build bug
    perf report: Support multiple events on the TUI
    perf annotate: Fix up usage of the build id cache
    x86/mmiotrace: Remove redundant instruction prefix checks
    perf annotate: Add TUI interface
    perf tui: Remove annotate from popup menu after failure
    perf report: Don't start the TUI if -D is used
    perf: Fix getline undeclared
    perf: Optimize perf_tp_event_match()
    perf: Remove more code from the fastpath
    perf: Optimize the !vmalloc backed buffer
    perf: Optimize perf_output_copy()
    perf: Fix wakeup storm for RO mmap()s
    perf-record: Share per-cpu buffers
    perf-record: Remove -M
    perf: Ensure that IOC_OUTPUT isn't used to create multi-writer buffers
    perf, trace: Optimize tracepoints by using per-tracepoint-per-cpu hlist to track events
    perf, trace: Optimize tracepoints by removing IRQ-disable from perf/tracepoint interaction
    perf tui: Allow disabling the TUI on a per command basis in ~/.perfconfig
    ...

    Linus Torvalds
     

21 May, 2010

4 commits

  • …nux-2.6-tip into trace/tip/tracing/core-7

    Conflicts:
    include/linux/ftrace_event.h
    include/trace/ftrace.h
    kernel/trace/trace_event_perf.c
    kernel/trace/trace_kprobe.c
    kernel/trace/trace_syscalls.c

    Signed-off-by: Steven Rostedt <rostedt@goodmis.org>

    Steven Rostedt
     
  • * git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi-misc-2.6: (182 commits)
    [SCSI] aacraid: add an ifdef'd device delete case instead of taking the device offline
    [SCSI] aacraid: prohibit access to array container space
    [SCSI] aacraid: add support for handling ATA pass-through commands.
    [SCSI] aacraid: expose physical devices for models with newer firmware
    [SCSI] aacraid: respond automatically to volumes added by config tool
    [SCSI] fcoe: fix fcoe module ref counting
    [SCSI] libfcoe: FIP Keep-Alive messages for VPorts are sent with incorrect port_id and wwn
    [SCSI] libfcoe: Fix incorrect MAC address clearing
    [SCSI] fcoe: fix a circular locking issue with rtnl and sysfs mutex
    [SCSI] libfc: Move the port_id into lport
    [SCSI] fcoe: move link speed checking into its own routine
    [SCSI] libfc: Remove extra pointer check
    [SCSI] libfc: Remove unused fc_get_host_port_type
    [SCSI] fcoe: fixes wrong error exit in fcoe_create
    [SCSI] libfc: set seq_id for incoming sequence
    [SCSI] qla2xxx: Updates to ISP82xx support.
    [SCSI] qla2xxx: Optionally disable target reset.
    [SCSI] qla2xxx: ensure flash operation and host reset via sg_reset are mutually exclusive
    [SCSI] qla2xxx: Silence bogus warning by gcc for wrap and did.
    [SCSI] qla2xxx: T10 DIF support added.
    ...

    Linus Torvalds
     
  • Avoid the swevent hash-table by using per-tracepoint
    hlists.

    Also, avoid conditionals on the fast path by ordering
    with probe unregister so that we should never get on
    the callback path without the data being there.

    Signed-off-by: Peter Zijlstra
    Cc: Arnaldo Carvalho de Melo
    Cc: Frederic Weisbecker
    Cc: Paul Mackerras
    Cc: Mike Galbraith
    Cc: Steven Rostedt
    LKML-Reference:
    Signed-off-by: Ingo Molnar

    Peter Zijlstra
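
    Schematically (a sketch, not the exact kernel structures):

    #include <linux/perf_event.h>
    #include <linux/percpu.h>
    #include <linux/rculist.h>

    /* Each tracepoint now owns per-cpu hlist heads of the perf events attached
     * to it (the real field lives in struct ftrace_event_call). */
    struct tp_events_sketch {
        struct hlist_head __percpu *perf_events;
    };

    static void deliver_sketch(struct tp_events_sketch *call)
    {
        struct hlist_head *head = this_cpu_ptr(call->perf_events);
        struct hlist_node *node;
        struct perf_event *event;

        /* No global swevent hash lookup: only events attached to this
         * tracepoint on this CPU are walked; unregistration is ordered so the
         * list is valid whenever the probe can run.  (Four-argument iterator
         * form as in kernels of this era.) */
        hlist_for_each_entry_rcu(event, node, head, hlist_entry)
            ;   /* hand the trace record to 'event' (elided) */
    }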
     
  • Improves performance.

    Acked-by: Frederic Weisbecker
    Signed-off-by: Peter Zijlstra
    Cc: Arnaldo Carvalho de Melo
    Cc: Frederic Weisbecker
    Cc: Paul Mackerras
    Cc: Mike Galbraith
    Cc: Steven Rostedt
    LKML-Reference:
    Signed-off-by: Ingo Molnar

    Peter Zijlstra
     


15 May, 2010

8 commits

  • The flags variable is protected by the event_mutex when modifying,
    but the event_mutex is not held when reading the variable.

    This is due to the fact that the reads occur in critical sections where
    taking a mutex (or even a spinlock) is not wanted.

    But for the two flags that exist (enable and filter_active), the code is
    written so that the reads do not need a lock.

    The enable flag is used just to know if the event is enabled or not
    and its use is always under the event_mutex. Whether or not the event
    is actually enabled is really determined by the tracepoint being
    registered. The flag is just a way to let the code know if the tracepoint
    is registered.

    The filter_active is different. It is read without the lock. If it
    is set, then the event probes jump to the filter code. There can be a
    slight mismatch between filters available and filter_active. If the flag is
    set but no filters are available, the code safely jumps to a filter nop.
    If the flag is not set and the filters are available, then the filters
    are skipped. This is acceptable since filters are usually set before
    tracing starts, or they are set by humans, who will not notice the slight
    delay that this causes.

    v2: Fixed typo: "cacheing" -> "caching"

    Reported-by: Mathieu Desnoyers
    Acked-by: Mathieu Desnoyers
    Cc: Tom Zanussi
    Signed-off-by: Steven Rostedt

    Steven Rostedt
     
  • The filter_active and enable fields each use an int (4 bytes) to
    hold a single flag. We can save 4 bytes per event by combining the
    two into a single integer.

    text data bss dec hex filename
    4913961 1088356 861512 6863829 68bbd5 vmlinux.orig
    4894944 1018052 861512 6774508 675eec vmlinux.id
    4894871 1012292 861512 6768675 674823 vmlinux.flags

    This gives us another 5K in savings.

    The modification of both the enable and filter fields are done
    under the event_mutex, so it is still safe to combine the two.

    Note: Although Mathieu gave his Acked-by, he would like it documented
    that the reads of flags are not protected by the mutex. The way the
    code works, these reads will not break anything, but will have a
    residual effect. Since this behavior is the same even before this
    patch, describing this situation is left to another patch, as this
    patch does not change the behavior, but just brought it to Mathieu's
    attention.

    v2: Updated the event trace self test for this change.

    Acked-by: Mathieu Desnoyers
    Acked-by: Masami Hiramatsu
    Acked-by: Frederic Weisbecker
    Cc: Tom Zanussi
    Signed-off-by: Steven Rostedt

    Steven Rostedt
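
    A sketch of the combined field (the exact bit names live in
    include/linux/ftrace_event.h):

    enum {
        TRACE_EVENT_FL_ENABLED_BIT,
        TRACE_EVENT_FL_FILTERED_BIT,
    };

    enum {
        TRACE_EVENT_FL_ENABLED  = (1 << TRACE_EVENT_FL_ENABLED_BIT),
        TRACE_EVENT_FL_FILTERED = (1 << TRACE_EVENT_FL_FILTERED_BIT),
    };

    /* One int replaces the two separate 'enable' and 'filter_active' ints;
     * it is still only modified under event_mutex. */
    struct event_flags_sketch {
        unsigned int flags;
    };

    /* 'call->enable = 1'          becomes: call->flags |= TRACE_EVENT_FL_ENABLED;
     * 'if (call->filter_active)'  becomes: if (call->flags & TRACE_EVENT_FL_FILTERED) */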
     
  • Now that the trace_event structure is embedded in the ftrace_event_call
    structure, there is no need for the ftrace_event_call id field.
    The id field is the same as the trace_event type field.

    Removing the id and re-arranging the structure brings down the tracepoint
    footprint by another 5K.

    text data bss dec hex filename
    4913961 1088356 861512 6863829 68bbd5 vmlinux.orig
    4895024 1023812 861512 6780348 6775bc vmlinux.print
    4894944 1018052 861512 6774508 675eec vmlinux.id

    Acked-by: Mathieu Desnoyers
    Acked-by: Masami Hiramatsu
    Acked-by: Frederic Weisbecker
    Signed-off-by: Steven Rostedt

    Steven Rostedt
     
  • Currently, every event has its own trace_event structure. This is
    fine since the structure is needed anyway. But the print function
    structure (trace_event_functions) is now separate. Since the output
    of the trace event is done by the class (with the exception of events
    defined by DEFINE_EVENT_PRINT), it makes sense to have the class
    define the print functions that all events in the class can use.

    This matters most for the syscall events, since all syscall events
    use the same class. The savings here are another 30K.

    text data bss dec hex filename
    4913961 1088356 861512 6863829 68bbd5 vmlinux.orig
    4900382 1048964 861512 6810858 67ecea vmlinux.init
    4900446 1049028 861512 6810986 67ed6a vmlinux.preprint
    4895024 1023812 861512 6780348 6775bc vmlinux.print

    To accomplish this, and to let the class know what event is being
    printed, the event structure is embedded in the ftrace_event_call
    structure. This should not be an issue since the event structure
    was created for each event anyway.

    Acked-by: Mathieu Desnoyers
    Acked-by: Masami Hiramatsu
    Acked-by: Frederic Weisbecker
    Signed-off-by: Steven Rostedt

    Steven Rostedt
     
  • Multiple events may use the same method to print their data.
    Instead of having every event hold a pointer to its print functions,
    the trace_event structure now points to a trace_event_functions structure
    that holds the way to print out the event.

    The event itself is now passed to the print function to let the print
    function know what kind of event it should print.

    This opens the door to consolidating the way several events print
    their output.

    text data bss dec hex filename
    4913961 1088356 861512 6863829 68bbd5 vmlinux.orig
    4900382 1048964 861512 6810858 67ecea vmlinux.init
    4900446 1049028 861512 6810986 67ed6a vmlinux.preprint

    This change slightly increases the size but is needed for the next change.

    v3: Fix the branch tracer events to handle this change.

    v2: Fix the new function graph tracer event calls to handle this change.

    Acked-by: Mathieu Desnoyers
    Acked-by: Masami Hiramatsu
    Acked-by: Frederic Weisbecker
    Signed-off-by: Steven Rostedt

    Steven Rostedt
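
    A sketch of the split, close to the definitions of that time but with
    illustrative names:

    #include <linux/list.h>

    struct trace_iterator;
    struct trace_event_sketch;

    /* Returns enum print_line_t in the real code; int here keeps the sketch
     * self-contained. */
    typedef int (*trace_print_func_sketch)(struct trace_iterator *iter, int flags,
                                           struct trace_event_sketch *event);

    /* The print methods now live in their own structure... */
    struct trace_event_functions_sketch {
        trace_print_func_sketch trace;
        trace_print_func_sketch raw;
        trace_print_func_sketch hex;
        trace_print_func_sketch binary;
    };

    /* ...and the event just points at it.  The event itself is passed to the
     * print function, so a callback shared by several events knows which one
     * it is printing. */
    struct trace_event_sketch {
        struct hlist_node                   node;
        struct list_head                    list;
        int                                 type;
        struct trace_event_functions_sketch *funcs;
    };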
     
  • The raw_init function pointer in the event is used to initialize
    various kinds of events. The type of initialization needed usually
    depends on the class of the event.

    Two events with the same class will always have the same initialization
    function, so it makes sense to move this to the class structure.

    Perhaps even making a special system structure would work since
    the initialization is the same for all events within a system.
    But since there's no system structure (yet), this will just move it
    to the class.

    text data bss dec hex filename
    4913961 1088356 861512 6863829 68bbd5 vmlinux.orig
    4900375 1053380 861512 6815267 67fe23 vmlinux.fields
    4900382 1048964 861512 6810858 67ecea vmlinux.init

    The text grew very slightly, but this is a constant growth caused by
    changes to the C files that call the init code.
    The bigger savings are in data, and they grow the more events share
    a class.

    Acked-by: Mathieu Desnoyers
    Acked-by: Masami Hiramatsu
    Acked-by: Frederic Weisbecker
    Signed-off-by: Steven Rostedt

    Steven Rostedt
     
  • Move the defined fields from the event to the class structure.
    Since the fields of the event are defined by the class they belong
    to, it makes sense to have the class hold the information instead
    of the individual events. The events of the same class would just
    hold duplicate information.

    After this change the size of the kernel dropped another 3K:

    text data bss dec hex filename
    4913961 1088356 861512 6863829 68bbd5 vmlinux.orig
    4900252 1057412 861512 6819176 680d68 vmlinux.regs
    4900375 1053380 861512 6815267 67fe23 vmlinux.fields

    Although the text increased, this was mainly due to the C files
    having to adapt to the change. This is a constant increase, and
    new tracepoints will not increase the text further. But the big drop is
    in the data size (as well as needed allocations to hold the fields).
    This will give even more savings as more tracepoints are created.

    Note, if just TRACE_EVENT()s are used and not DECLARE_EVENT_CLASS()
    with several DEFINE_EVENT()s, then the savings will be lost. But
    we are pushing developers to consolidate events with DEFINE_EVENT()
    so this should not be an issue.

    Kprobes define a unique class for every new event, but they are dynamic,
    so this should not be an issue.

    The syscalls, however, have a single class, but the fields of the individual
    events differ. The syscalls use metadata to define the
    fields. I moved the fields list from the event to the metadata and
    added a "get_fields()" function to the class. This function is used
    to find the fields. For normal events and kprobes, get_fields() just
    returns a pointer to the fields list_head in the class. For syscall
    events, it returns the fields list_head in the metadata for the event.

    v2: Fixed the syscall fields. The syscall metadata needs a list
    of fields for both enter and exit.

    Acked-by: Frederic Weisbecker
    Acked-by: Mathieu Desnoyers
    Acked-by: Masami Hiramatsu
    Cc: Tom Zanussi
    Cc: Peter Zijlstra
    Signed-off-by: Steven Rostedt

    Steven Rostedt
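
    A sketch of the class-level hook described above (names are illustrative):

    #include <linux/list.h>

    struct ftrace_event_call;

    /* The field list moves into the class; most classes let get_fields()
     * return &class->fields, while the syscall classes return the enter/exit
     * field list stored in the per-event syscall metadata. */
    struct event_class_fields_sketch {
        struct list_head  fields;   /* shared by all events of the class */
        struct list_head *(*get_fields)(struct ftrace_event_call *call);
    };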
     
  • This patch removes the register functions that TRACE_EVENT() generated to
    enable and disable tracepoints. The registering of an event is now done
    directly in the trace_events.c file, and tracepoint_probe_register()
    is now called directly.

    The prototypes are no longer type checked, but this should not be
    an issue since the tracepoints are created automatically by the
    macros. If a prototype is incorrect in the TRACE_EVENT() macro, then
    other macros will catch it.

    The trace_event_class structure now holds the probes to be called
    by the callbacks. This removes needing to have each event have
    a separate pointer for the probe.

    To handle kprobes and syscalls, since they register probes in a
    different manner, a "reg" field is added to the ftrace_event_class
    structure. If the "reg" field is assigned, then it will be called for
    enabling and disabling of the probe for either ftrace or perf. To let
    the reg function know what is happening, a new enum (trace_reg) is
    created that has the type of control that is needed.

    With this new rework, the 82 kernel events and 618 syscall events
    have their footprint dramatically lowered:

    text data bss dec hex filename
    4913961 1088356 861512 6863829 68bbd5 vmlinux.orig
    4914025 1088868 861512 6864405 68be15 vmlinux.class
    4918492 1084612 861512 6864616 68bee8 vmlinux.tracepoint
    4900252 1057412 861512 6819176 680d68 vmlinux.regs

    The size went from 6863829 to 6819176; that's a total of 44K
    in savings. With tracepoints being added continuously, it is
    critical that the footprint be kept minimal.

    v5: Added #ifdef CONFIG_PERF_EVENTS around a reference to perf
    specific structure in trace_events.c.

    v4: Fixed trace self tests to check probe because regfunc no longer
    exists.

    v3: Updated to handle void *data in beginning of probe parameters.
    Also added the tracepoint: check_trace_callback_type_##call().

    v2: Changed the callback probes to pass void * and typecast the
    value within the function.

    Acked-by: Mathieu Desnoyers
    Acked-by: Masami Hiramatsu
    Acked-by: Frederic Weisbecker
    Signed-off-by: Steven Rostedt

    Steven Rostedt
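
    The control enum and per-class hook, sketched as the changelog describes
    them (compare include/linux/ftrace_event.h for the exact form):

    struct ftrace_event_call;

    /* What the class->reg() callback is being asked to do. */
    enum trace_reg {
        TRACE_REG_REGISTER,
        TRACE_REG_UNREGISTER,
        TRACE_REG_PERF_REGISTER,
        TRACE_REG_PERF_UNREGISTER,
    };

    struct event_class_reg_sketch {
        /* probes shared by every event of the class */
        void *probe;
        void *perf_probe;

        /* kprobes and syscalls install their own reg(); for other events it
         * starts out unassigned (a common default was added later, see the
         * 29 Jun entry above) */
        int (*reg)(struct ftrace_event_call *event, enum trace_reg type);
    };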
     

14 May, 2010

1 commit

  • This patch creates an ftrace_event_class struct that event structs point to.
    This class struct will later hold information used to modify the
    events. Currently the class struct only holds the event's system name.

    This patch slightly increases the size, but it lays the groundwork
    for other changes that make the footprint of tracepoints smaller.

    With 82 standard tracepoints, and 618 system call tracepoints
    (two tracepoints per syscall: enter and exit):

    text data bss dec hex filename
    4913961 1088356 861512 6863829 68bbd5 vmlinux.orig
    4914025 1088868 861512 6864405 68be15 vmlinux.class

    This patch also cleans up some stale comments in ftrace.h.

    v2: Fixed missing semi-colon in macro.

    Acked-by: Frederic Weisbecker
    Acked-by: Mathieu Desnoyers
    Acked-by: Masami Hiramatsu
    Signed-off-by: Steven Rostedt

    Steven Rostedt
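
    In outline (a sketch; the later commits in the entries above move more
    state into the class):

    /* The initial class only carries the subsystem name... */
    struct ftrace_event_class_sketch {
        const char *system;
    };

    /* ...and each event points at its class instead of carrying its own
     * 'system' string. */
    struct ftrace_event_call_sketch {
        struct ftrace_event_class_sketch *class;
        /* ... */
    };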
     

01 May, 2010

1 commit

  • __print_hex() prints the values in an array in hex (without a '0x' prefix),
    space separated, e.g.: 92 33 32 f3 ee 4d

    Signed-off-by: Li Zefan
    Signed-off-by: Tomohiro Kusumi
    Signed-off-by: Kei Tokunaga
    Acked-by: Steven Rostedt
    Signed-off-by: James Bottomley

    Kei Tokunaga
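
    A userspace analogue of the output format (the kernel helper itself is
    built on the trace_seq machinery):

    #include <stdio.h>

    /* Prints buf as space-separated hex bytes without a "0x" prefix,
     * e.g. "92 33 32 f3 ee 4d" -- the format __print_hex() emits. */
    static void print_hex_like(const unsigned char *buf, int len)
    {
        int i;

        for (i = 0; i < len; i++)
            printf("%s%02x", i ? " " : "", buf[i]);
        printf("\n");
    }

    int main(void)
    {
        const unsigned char cdb[] = { 0x92, 0x33, 0x32, 0xf3, 0xee, 0x4d };

        print_hex_like(cdb, sizeof(cdb));
        return 0;
    }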
     

01 Apr, 2010

1 commit

  • Now that the ring buffer can keep track of where events are lost,
    use this information in the output of trace_pipe:

    hackbench-3588 [001] 1326.701660: lock_acquire: ffffffff816591e0 read rcu_read_lock
    hackbench-3588 [001] 1326.701661: lock_acquire: ffff88003f4091f0 &(&dentry->d_lock)->rlock
    hackbench-3588 [001] 1326.701664: lock_release: ffff88003f4091f0 &(&dentry->d_lock)->rlock
    CPU:1 [LOST 673 EVENTS]
    hackbench-3588 [001] 1326.702711: kmem_cache_free: call_site=ffffffff81102b85 ptr=ffff880026d96738
    hackbench-3588 [001] 1326.702712: lock_release: ffff88003e1480a8 &mm->mmap_sem
    hackbench-3588 [001] 1326.702713: lock_acquire: ffff88003e1480a8 &mm->mmap_sem

    Even works with the function graph tracer:

    2) ! 170.098 us | }
    2) 4.036 us | rcu_irq_exit();
    2) 3.657 us | idle_cpu();
    2) ! 190.301 us | }
    CPU:2 [LOST 2196 EVENTS]
    2) 0.853 us | } /* cancel_dirty_page */
    2) | remove_from_page_cache() {
    2) 1.578 us | _raw_spin_lock_irq();
    2) | __remove_from_page_cache() {

    Note, this does not work with the iterator "trace" file, since determining
    how many events were lost requires consuming the page from the ring buffer,
    which the iterator does not do.

    Signed-off-by: Steven Rostedt

    Steven Rostedt
     

10 Mar, 2010

2 commits

  • Drop the obsolete "profile" naming used by perf for trace events.
    Perf can now do more than simple event counting, so generalize
    the API naming.

    Signed-off-by: Frederic Weisbecker
    Cc: Ingo Molnar
    Cc: Peter Zijlstra
    Cc: Paul Mackerras
    Cc: Steven Rostedt
    Cc: Masami Hiramatsu
    Cc: Jason Baron

    Frederic Weisbecker
     
  • We are taking the wrong regs snapshot when a trace event triggers.
    Either we use get_irq_regs(), which gives us the interrupted
    registers if we are in an interrupt, or we use task_pt_regs(),
    which gives us the state from when we entered the kernel, assuming
    we are lucky enough not to be a kernel thread; for a kernel thread,
    task_pt_regs() returns the initial set of regs from when the
    thread was started.

    What we want is different. We need a hot snapshot of the regs,
    so that we can get the instruction pointer to record in the
    sample, the frame pointer for the callchain, and some other
    things.

    Let's use the new perf_fetch_caller_regs() for that.

    Comparison with perf record -e lock: -R -a -f -g
    Before:

    perf [kernel] [k] __do_softirq
    |
    --- __do_softirq
    |
    |--55.16%-- __open
    |
    --44.84%-- __write_nocancel

    After:

    perf [kernel] [k] perf_tp_event
    |
    --- perf_tp_event
    |
    |--41.07%-- lock_acquire
    | |
    | |--39.36%-- _raw_spin_lock
    | | |
    | | |--7.81%-- hrtimer_interrupt
    | | | smp_apic_timer_interrupt
    | | | apic_timer_interrupt

    The old case was producing unreliable callchains. Now having
    right frame and instruction pointers, we have the trace we
    want.

    Also, syscall and kprobe events already have the right regs;
    let's use those instead of wasting a retrieval.

    v2: Follow the rename perf_save_regs() -> perf_fetch_caller_regs()

    Signed-off-by: Frederic Weisbecker
    Cc: Ingo Molnar
    Cc: Thomas Gleixner
    Cc: H. Peter Anvin
    Cc: Peter Zijlstra
    Cc: Paul Mackerras
    Cc: Steven Rostedt
    Cc: Arnaldo Carvalho de Melo
    Cc: Masami Hiramatsu
    Cc: Jason Baron
    Cc: Archs

    Frederic Weisbecker
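
    The usage shape, sketched (early versions of the helper also took a
    skip-frames argument):

    #include <linux/perf_event.h>
    #include <linux/ptrace.h>

    static void record_sample_sketch(void)
    {
        struct pt_regs regs;

        /* Take a hot snapshot of the caller's registers -- neither the
         * interrupted regs (get_irq_regs()) nor the kernel-entry regs
         * (task_pt_regs()) -- so the IP and frame pointer describe the
         * tracepoint call site. */
        perf_fetch_caller_regs(&regs);

        /* ... hand &regs to the perf sample along with the trace data ... */
    }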
     

01 Mar, 2010

1 commit

  • …git/tip/linux-2.6-tip

    * 'perf-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: (172 commits)
    perf_event, amd: Fix spinlock initialization
    perf_event: Fix preempt warning in perf_clock()
    perf tools: Flush maps on COMM events
    perf_events, x86: Split PMU definitions into separate files
    perf annotate: Handle samples not at objdump output addr boundaries
    perf_events, x86: Remove superflous MSR writes
    perf_events: Simplify code by removing cpu argument to hw_perf_group_sched_in()
    perf_events, x86: AMD event scheduling
    perf_events: Add new start/stop PMU callbacks
    perf_events: Report the MMAP pgoff value in bytes
    perf annotate: Defer allocating sym_priv->hist array
    perf symbols: Improve debugging information about symtab origins
    perf top: Use a macro instead of a constant variable
    perf symbols: Check the right return variable
    perf/scripts: Tag syscall_name helper as not yet available
    perf/scripts: Add perf-trace-python Documentation
    perf/scripts: Remove unnecessary PyTuple resizes
    perf/scripts: Add syscall tracing scripts
    perf/scripts: Add Python scripting engine
    perf/scripts: Remove check-perf-trace from listed scripts
    ...

    Fix trivial conflict in tools/perf/util/probe-event.c

    Linus Torvalds
     

29 Jan, 2010

1 commit

  • Introduce ftrace_perf_buf_prepare() and ftrace_perf_buf_submit() to
    gather the common code that operates on the raw event sampling buffer.
    This removes code duplicated between regular trace events, syscall
    events and kprobe events.

    Changelog v1->v2:
    - Rename function name as per Masami and Frederic's suggestion
    - Add __kprobes for ftrace_perf_buf_prepare() and make
    ftrace_perf_buf_submit() inline as per Masami's suggestion
    - Export ftrace_perf_buf_prepare since modules will use it

    Signed-off-by: Xiao Guangrong
    Acked-by: Masami Hiramatsu
    Cc: Ingo Molnar
    Cc: Steven Rostedt
    Cc: Paul Mackerras
    Cc: Jason Baron
    Cc: Peter Zijlstra
    LKML-Reference:
    Signed-off-by: Frederic Weisbecker

    Xiao Guangrong
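
    The call pattern, sketched from the changelog; the argument lists are an
    approximation, not authoritative signatures:

    #include <linux/ftrace_event.h>
    #include <linux/types.h>

    /* Hypothetical event-side caller. */
    static void emit_sample_sketch(unsigned short event_type, u64 addr)
    {
        struct { u64 val; } *entry;     /* hypothetical raw-record layout */
        unsigned long irq_flags;
        int rctx;

        /* Reserve room in the shared per-cpu raw sample buffer; this also
         * takes the perf recursion context. */
        entry = ftrace_perf_buf_prepare(sizeof(*entry), event_type,
                                        &rctx, &irq_flags);
        if (!entry)
            return;

        entry->val = 0;                 /* ... fill in the record ... */

        /* Hand the record to perf and release the recursion context. */
        ftrace_perf_buf_submit(entry, sizeof(*entry), rctx, addr, 1, &irq_flags);
    }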
     

07 Jan, 2010

2 commits

  • The previous patches added the use of the print_fmt string and changed
    the trace_define_field() function to also create the fields and
    format output for the event format files.

    text data bss dec hex filename
    5857201 1355780 9336808 16549789 fc879d vmlinux
    5884589 1351684 9337896 16574169 fce6d9 vmlinux-orig

    The above shows the size of the vmlinux after this patch set
    compared to the vmlinux-orig which is before the patch set.

    This saves us 27k of text and 1k of bss, and adds just 4k of data,
    for a total savings of 24k in size.

    Signed-off-by: Lai Jiangshan
    LKML-Reference:
    Acked-by: Masami Hiramatsu
    Signed-off-by: Steven Rostedt

    Lai Jiangshan
     
  • This is part of a patch set that removes the show_format method
    in the ftrace event macros.

    The print_fmt field is added to hold the string that shows
    the print_fmt in the event format files. This patch only adds
    the field but it is currently not used. Later patches will use
    this field to enable us to remove the show_format field
    and function.

    Signed-off-by: Lai Jiangshan
    LKML-Reference:
    Signed-off-by: Steven Rostedt

    Lai Jiangshan
     

28 Dec, 2009

1 commit

  • Quoted from Ingo:

    | This reminds me - i think we should eliminate CONFIG_EVENT_PROFILE -
    | it's an unnecessary Kconfig complication. If both PERF_EVENTS and
    | EVENT_TRACING is enabled we should expose generic tracepoints.
    |
    | Nor is it limited to event 'profiling', so it has become a misnomer as
    | well.

    Signed-off-by: Li Zefan
    Cc: Frederic Weisbecker
    Cc: Steven Rostedt
    Cc: Peter Zijlstra
    Cc: Paul Mackerras
    LKML-Reference:
    Signed-off-by: Ingo Molnar

    Li Zefan
     

14 Dec, 2009

3 commits

  • Like total_profile_count, struct ftrace_event_call::profile_count
    is protected by event_mutex, so it doesn't need to be atomic_t.

    Signed-off-by: Li Zefan
    Acked-by: Steven Rostedt
    Cc: Jason Baron
    Cc: Masami Hiramatsu
    Cc: Peter Zijlstra
    LKML-Reference:
    Signed-off-by: Frederic Weisbecker

    Li Zefan
     
  • Call trace_define_common_fields() in event_create_dir() only.
    This avoids trace events having to handle it in their define_fields
    callbacks and shrinks the kernel code size:

    text data bss dec hex filename
    5346802 1961864 7103260 14411926 dbe896 vmlinux.o.old
    5345151 1961864 7103260 14410275 dbe223 vmlinux.o

    Signed-off-by: Li Zefan
    Acked-by: Steven Rostedt
    Cc: Ingo Molnar
    Cc: Jason Baron
    Cc: Masami Hiramatsu
    LKML-Reference:
    Signed-off-by: Frederic Weisbecker

    Li Zefan
     
  • Use a generic trace_event_raw_init() function for all events' raw_init
    callbacks (except kprobes) instead of defining the same version for each
    of them.
    This shrinks the kernel code:

    text data bss dec hex filename
    5355293 1961928 7103260 14420481 dc0a01 vmlinux.o.old
    5346802 1961864 7103260 14411926 dbe896 vmlinux.o

    raw_init can't be removed, because ftrace events and kprobe events
    use different raw_init callbacks. Though it's possible to totally
    remove raw_init, I choose to leave it as it is for now.

    Signed-off-by: Li Zefan
    Acked-by: Steven Rostedt
    Cc: Jason Baron
    Cc: Ingo Molnar
    LKML-Reference:
    Signed-off-by: Frederic Weisbecker

    Li Zefan
     

10 Dec, 2009

1 commit

  • If seq_read fills the buffer, it will call s_start again on the next
    iteration with the same position. This causes a problem with the
    function_graph tracer because it consumes the iteration in order to
    determine leaf functions.

    What happens is that the iterator stores the entry, and the function
    graph plugin will look at the next entry. If that next entry is a return
    of the same function and task, then the function is a leaf and the
    function_graph plugin calls ring_buffer_read which moves the ring buffer
    iterator forward (the trace iterator still points to the function start
    entry).

    The copying of the trace_seq to the seq_file buffer will fail if the
    seq_file buffer is full. The seq_read will not show this entry.
    The next read by userspace will cause seq_read to again call s_start
    which will reuse the trace iterator entry (the function start entry).
    But the function return entry was already consumed. The function graph
    plugin will think that this entry is a nested function and not a leaf.

    To solve this, the trace code now checks the return status of the
    seq_printf (trace_print_seq). If the writing to the seq_file buffer
    fails, we set a flag in the iterator (leftover) and we do not reset
    the trace_seq buffer. On the next call to s_start, we check the leftover
    flag, and if it is set, we just reuse the trace_seq buffer and do not
    call into the plugin print functions.

    Before this patch:

    2) | fput() {
    2) | __fput() {
    2) 0.550 us | inotify_inode_queue_event();
    2) | __fsnotify_parent() {
    2) 0.540 us | inotify_dentry_parent_queue_event();

    After the patch:

    2) | fput() {
    2) | __fput() {
    2) 0.550 us | inotify_inode_queue_event();
    2) 0.548 us | __fsnotify_parent();
    2) 0.540 us | inotify_dentry_parent_queue_event();

    [
    Updated the patch to fix a missing return 0 from the trace_print_seq()
    stub when CONFIG_TRACING is disabled.

    Reported-by: Ingo Molnar
    ]

    Reported-by: Jiri Olsa
    Cc: Frederic Weisbecker
    Signed-off-by: Steven Rostedt

    Steven Rostedt
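
    A simplified sketch of the 'leftover' handling (the real logic is split
    across s_start()/s_show() in kernel/trace/trace.c and uses the tracer's
    internal helpers):

    static int s_show_sketch(struct seq_file *m, struct trace_iterator *iter)
    {
        /* Run the plugin print functions only when nothing is pending;
         * otherwise iter->seq still holds the line that did not fit. */
        if (!iter->leftover)
            print_trace_line(iter);             /* fills iter->seq */

        /* trace_print_seq() now reports whether the copy into the seq_file
         * buffer succeeded; on failure, keep iter->seq for the next read
         * instead of resetting it. */
        iter->leftover = trace_print_seq(m, &iter->seq);

        return 0;
    }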
     

22 Nov, 2009

1 commit

  • When we commit a trace to perf, we first check if we are
    recursing in the same buffer so that we don't mess up the buffer
    with a recursing trace. But later on, we do the same check from
    perf to avoid commit recursion. The recursion check is desired
    early, before we touch the buffer, but we want to do this check
    only once.

    Then export the recursion protection from perf and use it from
    the trace events before submitting a trace.

    v2: Put appropriate Reported-by tag

    Reported-by: Peter Zijlstra
    Signed-off-by: Frederic Weisbecker
    Cc: Arnaldo Carvalho de Melo
    Cc: Paul Mackerras
    Cc: Steven Rostedt
    Cc: Masami Hiramatsu
    Cc: Jason Baron
    LKML-Reference:
    Signed-off-by: Ingo Molnar

    Frederic Weisbecker
     

08 Nov, 2009

1 commit

  • While tracing with events through perf, if one enables the
    lockdep:lock_acquire event, it will infect every other perf
    trace event.

    Basically, you can enable whatever set of trace events through
    perf, but if this event is part of the set, the only result we
    get is a long list of lock_acquire events for the rcu read lock,
    and nothing else.

    This is because of a recursion inside perf.

    1) When a trace event is triggered, it will fill a per cpu
    buffer and submit it to perf.

    2) Perf will commit this event but will also protect some data
    using rcu_read_lock

    3) A recursion appears: rcu_read_lock triggers a lock_acquire
    event that will fill the per cpu event and then submit the
    buffer to perf.

    4) Perf detects a recursion and ignores it

    5) Perf continues its work on the previous event, but its buffer
    has been overwritten by the lock_acquire event, so it has
    been turned into a lock_acquire event for the rcu read lock

    Such scenario also happens with lock_release with
    rcu_read_unlock().

    We could turn the rcu_read_lock() into __rcu_read_lock() to drop
    the lock debugging from the perf fast path, but that would make us
    lose the rcu debugging, and it doesn't prevent other possible
    kinds of recursion inside perf in the future.

    This patch adds a recursion protection based on a counter on the
    perf trace per cpu buffers to solve the problem.

    -v2: Fixed lost whitespace, added reviewed-by tag

    Signed-off-by: Frederic Weisbecker
    Reviewed-by: Masami Hiramatsu
    Cc: Peter Zijlstra
    Cc: Arnaldo Carvalho de Melo
    Cc: Mike Galbraith
    Cc: Paul Mackerras
    Cc: Steven Rostedt
    Cc: Li Zefan
    Cc: Jason Baron
    LKML-Reference:
    Signed-off-by: Ingo Molnar

    Frederic Weisbecker
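
    The idea in miniature (a generic sketch, not the kernel's actual per-cpu
    buffer code):

    #include <stddef.h>

    #define MAX_TRACE_NESTING 4     /* e.g. task, softirq, hardirq, NMI */

    /* Per-cpu in the real code; a single instance here for brevity. */
    static int  trace_nesting;
    static char trace_scratch[MAX_TRACE_NESTING][256];

    /* A nested event gets its own scratch buffer instead of overwriting the
     * one an outer event is still using; too-deep nesting is dropped. */
    static char *get_trace_buf_sketch(void)
    {
        if (trace_nesting >= MAX_TRACE_NESTING)
            return NULL;
        return trace_scratch[trace_nesting++];
    }

    static void put_trace_buf_sketch(void)
    {
        trace_nesting--;
    }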