10 Mar, 2016

1 commit

  • Commit f37755490fe9b ("tracepoints: Do not trace when cpu is offline") added
    a check to make sure that tracepoints only get called when the cpu is
    online, as it uses rcu_read_lock_sched() for protection.

    Commit 3a630178fd5f3 ("tracing: generate RCU warnings even when tracepoints
    are disabled") added lockdep checks (including rcu checks) for events that
    are not enabled to catch possible RCU issues that would only be triggered if
    a trace event was enabled. Commit f37755490fe9b only stopped the warnings
    when the trace event was enabled but did not prevent warnings if the trace
    event was called when disabled.

    To fix this, the cpu online check is moved to where the condition is added
    to the trace event. This will place the cpu online check in all places that
    it may be used now and in the future.
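
    Roughly, the change folds the cpu-online test into the condition that
    __DECLARE_TRACE() already threads through to both the trace call and the
    lockdep checks. A minimal sketch, not the verbatim kernel macros:

    #define DECLARE_TRACE(name, proto, args)                         \
            __DECLARE_TRACE(name, PARAMS(proto), PARAMS(args),       \
                    cpu_online(raw_smp_processor_id()),              \
                    PARAMS(void *__data, proto),                     \
                    PARAMS(__data, args))

    #define DECLARE_TRACE_CONDITION(name, proto, args, cond)         \
            __DECLARE_TRACE(name, PARAMS(proto), PARAMS(args),       \
                    cpu_online(raw_smp_processor_id()) && (cond),    \
                    PARAMS(void *__data, proto),                     \
                    PARAMS(__data, args))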

    Cc: stable@vger.kernel.org # v3.18+
    Fixes: f37755490fe9b ("tracepoints: Do not trace when cpu is offline")
    Fixes: 3a630178fd5f3 ("tracing: generate RCU warnings even when tracepoints are disabled")
    Reported-by: Sudeep Holla
    Tested-by: Sudeep Holla
    Signed-off-by: Steven Rostedt

    Steven Rostedt (Red Hat)
     

16 Feb, 2016

1 commit

  • The tracepoint infrastructure uses RCU sched protection to enable and
    disable tracepoints safely. There are some instances where tracepoints are
    used in infrastructure code (like kfree()) that get called after a CPU is
    going offline, and perhaps when it is coming back online but hasn't been
    registered yet.

    This can produce the following warning:

    [ INFO: suspicious RCU usage. ]
    4.4.0-00006-g0fe53e8-dirty #34 Tainted: G S
    -------------------------------
    include/trace/events/kmem.h:141 suspicious rcu_dereference_check() usage!

    other info that might help us debug this:

    RCU used illegally from offline CPU! rcu_scheduler_active = 1, debug_locks = 1
    no locks held by swapper/8/0.

    stack backtrace:
    CPU: 8 PID: 0 Comm: swapper/8 Tainted: G S 4.4.0-00006-g0fe53e8-dirty #34
    Call Trace:
    [c0000005b76c78d0] [c0000000008b9540] .dump_stack+0x98/0xd4 (unreliable)
    [c0000005b76c7950] [c00000000010c898] .lockdep_rcu_suspicious+0x108/0x170
    [c0000005b76c79e0] [c00000000029adc0] .kfree+0x390/0x440
    [c0000005b76c7a80] [c000000000055f74] .destroy_context+0x44/0x100
    [c0000005b76c7b00] [c0000000000934a0] .__mmdrop+0x60/0x150
    [c0000005b76c7b90] [c0000000000e3ff0] .idle_task_exit+0x130/0x140
    [c0000005b76c7c20] [c000000000075804] .pseries_mach_cpu_die+0x64/0x310
    [c0000005b76c7cd0] [c000000000043e7c] .cpu_die+0x3c/0x60
    [c0000005b76c7d40] [c0000000000188d8] .arch_cpu_idle_dead+0x28/0x40
    [c0000005b76c7db0] [c000000000101e6c] .cpu_startup_entry+0x50c/0x560
    [c0000005b76c7ed0] [c000000000043bd8] .start_secondary+0x328/0x360
    [c0000005b76c7f90] [c000000000008a6c] start_secondary_prolog+0x10/0x14

    This warning is not a false positive either. RCU is not protecting code that
    is being executed while the CPU is offline.

    Instead of playing "whack-a-mole(TM)" and adding conditional statements to
    the tracepoints we find that are used in this instance, simply add a
    cpu_online() test to the tracepoint code where the tracepoint will be
    ignored if the CPU is offline.

    Use of raw_smp_processor_id() is fine, as there should never be a case where
    the tracepoint code goes from running on a CPU that is online and suddenly
    gets migrated to a CPU that is offline.
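
    A simplified sketch of the check; the real one lives in the
    __DO_TRACE()/__DECLARE_TRACE() macros of include/linux/tracepoint.h:

    /* Bail out before touching RCU if this CPU is not online. */
    if (!cpu_online(raw_smp_processor_id()))
            return;

    rcu_read_lock_sched_notrace();
    /* dereference the probe array and call the probes */
    rcu_read_unlock_sched_notrace();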

    Link: http://lkml.kernel.org/r/1455387773-4245-1-git-send-email-kda@linux-powerpc.org

    Reported-by: Denis Kirjanov
    Fixes: 97e1c18e8d17b ("tracing: Kernel Tracepoints")
    Cc: stable@vger.kernel.org # v2.6.28+
    Signed-off-by: Steven Rostedt

    Steven Rostedt (Red Hat)
     

13 Jan, 2016

1 commit

  • Pull tracing updates from Steven Rostedt:
    "Not much new with tracing for this release. Mostly just clean ups and
    minor fixes.

    Here's what else is new:

    - A new TRACE_EVENT_FN_COND macro, combining both _FN and _COND for
    those that want both.

    - New selftest to test the instance create and delete

    - Better debug output when ftrace fails"

    * tag 'trace-v4.5' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace: (24 commits)
    ftrace: Fix the race between ftrace and insmod
    ftrace: Add infrastructure for delayed enabling of module functions
    x86: ftrace: Fix the comments for ftrace_modify_code_direct()
    tracing: Fix comment to use tracing_on over tracing_enable
    metag: ftrace: Fix the comments for ftrace_modify_code
    sh: ftrace: Fix the comments for ftrace_modify_code()
    ia64: ftrace: Fix the comments for ftrace_modify_code()
    ftrace: Clean up ftrace_module_init() code
    ftrace: Join functions ftrace_module_init() and ftrace_init_module()
    tracing: Introduce TRACE_EVENT_FN_COND macro
    tracing: Use seq_buf_used() in seq_buf_to_user() instead of len
    bpf: Constify bpf_verifier_ops structure
    ftrace: Have ftrace_ops_get_func() handle RCU and PER_CPU flags too
    ftrace: Remove use of control list and ops
    ftrace: Fix output of enabled_functions for showing tramp
    ftrace: Fix a typo in comment
    ftrace: Show all tramps registered to a record on ftrace_bug()
    ftrace: Add variable ftrace_expected for archs to show expected code
    ftrace: Add new type to distinguish what kind of ftrace_bug()
    tracing: Update cond flag when enabling or disabling a trigger
    ...

    Linus Torvalds
     

12 Jan, 2016

1 commit

  • Pull perf updates from Ingo Molnar:
    "Kernel side changes:

    - Intel Knights Landing support. (Harish Chegondi)

    - Intel Broadwell-EP uncore PMU support. (Kan Liang)

    - Core code improvements. (Peter Zijlstra.)

    - Event filter, LBR and PEBS fixes. (Stephane Eranian)

    - Enable cycles:pp on Intel Atom. (Stephane Eranian)

    - Add cycles:ppp support for Skylake. (Andi Kleen)

    - Various x86 NMI overhead optimizations. (Andi Kleen)

    - Intel PT enhancements. (Takao Indoh)

    - AMD cache events fix. (Vince Weaver)

    Tons of tooling changes:

    - Show random perf tool tips in the 'perf report' bottom line
    (Namhyung Kim)

    - perf report now defaults to --group if the perf.data file has
    grouped events, try it with:

    # perf record -e '{cycles,instructions}' -a sleep 1
    [ perf record: Woken up 1 times to write data ]
    [ perf record: Captured and wrote 1.093 MB perf.data (1247 samples) ]
    # perf report
    # Samples: 1K of event 'anon group { cycles, instructions }'
    # Event count (approx.): 1955219195
    #
    # Overhead Command Shared Object Symbol

    2.86% 0.22% swapper [kernel.kallsyms] [k] intel_idle
    1.05% 0.33% firefox libxul.so [.] js::SetObjectElement
    1.05% 0.00% kworker/0:3 [kernel.kallsyms] [k] gen6_ring_get_seqno
    0.88% 0.17% chrome chrome [.] 0x0000000000ee27ab
    0.65% 0.86% firefox libxul.so [.] js::ValueToId
    0.64% 0.23% JS Helper libxul.so [.] js::SplayTree::splay
    0.62% 1.27% firefox libxul.so [.] js::GetIterator
    0.61% 1.74% firefox libxul.so [.] js::NativeSetProperty
    0.61% 0.31% firefox libxul.so [.] js::SetPropertyByDefining

    - Introduce the 'perf stat record/report' workflow:

    Generate perf.data files from 'perf stat', to tap into the
    scripting capabilities perf has instead of defining a 'perf stat'
    specific scripting support to calculate event ratios, etc.

    Simple example:

    $ perf stat record -e cycles usleep 1

    Performance counter stats for 'usleep 1':

    1,134,996 cycles

    0.000670644 seconds time elapsed

    $ perf stat report

    Performance counter stats for '/home/acme/bin/perf stat record -e cycles usleep 1':

    1,134,996 cycles

    0.000670644 seconds time elapsed

    $

    It generates PERF_RECORD_ userspace records to store the details:

    $ perf report -D | grep PERF_RECORD
    0xf0 [0x28]: PERF_RECORD_THREAD_MAP nr: 1 thread: 27637
    0x118 [0x12]: PERF_RECORD_CPU_MAP nr: 1 cpu: 65535
    0x12a [0x40]: PERF_RECORD_STAT_CONFIG
    0x16a [0x30]: PERF_RECORD_STAT
    -1 -1 0x19a [0x40]: PERF_RECORD_MMAP -1/0: [0xffffffff81000000(0x1f000000) @ 0xffffffff81000000]: x [kernel.kallsyms]_text
    0x1da [0x18]: PERF_RECORD_STAT_ROUND
    [acme@ssdandy linux]$

    An effort was made so that perf.data files generated like this do
    not produce cryptic messages when processed by older tools.

    The 'perf script' bits need rebasing, will go up later.

    - Make command line options always available, even when they depend
    on some feature being enabled, warning the user about use of such
    options (Wang Nan)

    - Support hw breakpoint events (mem:0xAddress) in the default output
    mode in 'perf script' (Wang Nan)

    - Fixes and improvements for supporting annotating ARM binaries,
    support ARM call and jump instructions, more work needed to have
    arch specific stuff separated into tools/perf/arch/*/annotate/
    (Russell King)

    - Add initial 'perf config' command, for now just with a --list
    command to the contents of the configuration file in use and a
    basic man page describing its format, commands for doing edits and
    detailed documentation are being reviewed and proof-read. (Taeung
    Song)

    - Allow BPF scriptlets to specify arguments to be fetched using DWARF
    info, using a prologue generated at compile/build time (He Kuang,
    Wang Nan)

    - Allow attaching BPF scriptlets to module symbols (Wang Nan)

    - Allow attaching BPF scriptlets to userspace code using uprobe (Wang
    Nan)

    - BPF programs now can specify 'perf probe' tunables via its section
    name, separating key=val values using semicolons (Wang Nan)

    Testing some of these new BPF features:

    Use case: get callchains when receiving SSL packets, filter them in the
    kernel, at an arbitrary place.

    # cat ssl.bpf.c
    #define SEC(NAME) __attribute__((section(NAME), used))

    struct pt_regs;

    SEC("func=__inet_lookup_established hnum")
    int func(struct pt_regs *ctx, int err, unsigned short port)
    {
    return err == 0 && port == 443;
    }

    char _license[] SEC("license") = "GPL";
    int _version SEC("version") = LINUX_VERSION_CODE;
    #
    # perf record -a -g -e ssl.bpf.c
    ^C[ perf record: Woken up 1 times to write data ]
    [ perf record: Captured and wrote 0.787 MB perf.data (3 samples) ]
    # perf script | head -30
    swapper 0 [000] 58783.268118: perf_bpf_probe:func: (ffffffff816a0f60) hnum=0x1bb
    8a0f61 __inet_lookup_established (/lib/modules/4.3.0+/build/vmlinux)
    896def ip_rcv_finish (/lib/modules/4.3.0+/build/vmlinux)
    8976c2 ip_rcv (/lib/modules/4.3.0+/build/vmlinux)
    855eba __netif_receive_skb_core (/lib/modules/4.3.0+/build/vmlinux)
    8565d8 __netif_receive_skb (/lib/modules/4.3.0+/build/vmlinux)
    8572a8 process_backlog (/lib/modules/4.3.0+/build/vmlinux)
    856b11 net_rx_action (/lib/modules/4.3.0+/build/vmlinux)
    2a284b __do_softirq (/lib/modules/4.3.0+/build/vmlinux)
    2a2ba3 irq_exit (/lib/modules/4.3.0+/build/vmlinux)
    96b7a4 do_IRQ (/lib/modules/4.3.0+/build/vmlinux)
    969807 ret_from_intr (/lib/modules/4.3.0+/build/vmlinux)
    2dede5 cpu_startup_entry (/lib/modules/4.3.0+/build/vmlinux)
    95d5bc rest_init (/lib/modules/4.3.0+/build/vmlinux)
    1163ffa start_kernel ([kernel.vmlinux].init.text)
    11634d7 x86_64_start_reservations ([kernel.vmlinux].init.text)
    1163623 x86_64_start_kernel ([kernel.vmlinux].init.text)

    qemu-system-x86 9178 [003] 58785.792417: perf_bpf_probe:func: (ffffffff816a0f60) hnum=0x1bb
    8a0f61 __inet_lookup_established (/lib/modules/4.3.0+/build/vmlinux)
    896def ip_rcv_finish (/lib/modules/4.3.0+/build/vmlinux)
    8976c2 ip_rcv (/lib/modules/4.3.0+/build/vmlinux)
    855eba __netif_receive_skb_core (/lib/modules/4.3.0+/build/vmlinux)
    8565d8 __netif_receive_skb (/lib/modules/4.3.0+/build/vmlinux)
    856660 netif_receive_skb_internal (/lib/modules/4.3.0+/build/vmlinux)
    8566ec netif_receive_skb_sk (/lib/modules/4.3.0+/build/vmlinux)
    430a br_handle_frame_finish ([bridge])
    48bc br_handle_frame ([bridge])
    855f44 __netif_receive_skb_core (/lib/modules/4.3.0+/build/vmlinux)
    8565d8 __netif_receive_skb (/lib/modules/4.3.0+/build/vmlinux)
    #

    - Use the various 'perf probe' options to list functions, see what
    variables can be collected at any given point, experiment first
    collecting without a filter, then filter, use it together with
    'perf trace', 'perf top', with or without callchains, if it
    explodes, please tell us!

    - Introduce a new callchain mode: "folded", that will list per-line
    representations of all callchains for a given histogram entry,
    facilitating 'perf report' output processing by other tools, such
    as Brendan Gregg's flamegraph tools (Namhyung Kim)

    E.g:

    # perf report | grep -v ^# | head
    18.37% 0.00% swapper [kernel.kallsyms] [k] cpu_startup_entry
    |
    ---cpu_startup_entry
    |
    |--12.07%--start_secondary
    |
    --6.30%--rest_init
    start_kernel
    x86_64_start_reservations
    x86_64_start_kernel
    #

    Becomes, in "folded" mode:

    # perf report -g folded | grep -v ^# | head -5
    18.37% 0.00% swapper [kernel.kallsyms] [k] cpu_startup_entry
    12.07% cpu_startup_entry;start_secondary
    6.30% cpu_startup_entry;rest_init;start_kernel;x86_64_start_reservations;x86_64_start_kernel
    16.90% 0.00% swapper [kernel.kallsyms] [k] call_cpuidle
    11.23% call_cpuidle;cpu_startup_entry;start_secondary
    5.67% call_cpuidle;cpu_startup_entry;rest_init;start_kernel;x86_64_start_reservations;x86_64_start_kernel
    16.90% 0.00% swapper [kernel.kallsyms] [k] cpuidle_enter
    11.23% cpuidle_enter;call_cpuidle;cpu_startup_entry;start_secondary
    5.67% cpuidle_enter;call_cpuidle;cpu_startup_entry;rest_init;start_kernel;x86_64_start_reservations;x86_64_start_kernel
    15.12% 0.00% swapper [kernel.kallsyms] [k] cpuidle_enter_state
    #

    The user can also select one of "count", "period" or "percent" as
    the first column.

    ... and lots of infrastructure enhancements, plus fixes and other
    changes, features I failed to list - see the shortlog and the git log
    for details"

    * 'perf-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (271 commits)
    perf evlist: Add --trace-fields option to show trace fields
    perf record: Store data mmaps for dwarf unwind
    perf libdw: Check for mmaps also in MAP__VARIABLE tree
    perf unwind: Check for mmaps also in MAP__VARIABLE tree
    perf unwind: Use find_map function in access_dso_mem
    perf evlist: Remove perf_evlist__(enable|disable)_event functions
    perf evlist: Make perf_evlist__open() open evsels with their cpus and threads (like perf record does)
    perf report: Show random usage tip on the help line
    perf hists: Export a couple of hist functions
    perf diff: Use perf_hpp__register_sort_field interface
    perf tools: Add overhead/overhead_children keys defaults via string
    perf tools: Remove list entry from struct sort_entry
    perf tools: Include all tools/lib directory for tags/cscope/TAGS targets
    perf script: Align event name properly
    perf tools: Add missing headers in perf's MANIFEST
    perf tools: Do not show trace command if it's not compiled in
    perf report: Change default to use event group view
    perf top: Decay periods in callchains
    tools lib: Move bitmap.[ch] from tools/perf/ to tools/{lib,include}/
    tools lib: Sync tools/lib/find_bit.c with the kernel
    ...

    Linus Torvalds
     

24 Dec, 2015

1 commit

    TRACE_EVENT_FN can't be used in some circumstances,
    such as invoking trace functions from an offlined CPU,
    due to RCU usage.

    This patch adds the TRACE_EVENT_FN_COND macro
    to make such trace points conditional.
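
    A sketch of its use: the condition slots in right after TP_ARGS(),
    mirroring TRACE_EVENT_CONDITION(). All names here are hypothetical:

    TRACE_EVENT_FN_COND(my_event,
            TP_PROTO(int val),
            TP_ARGS(val),
            TP_CONDITION(cpu_online(raw_smp_processor_id())),
            TP_STRUCT__entry(__field(int, val)),
            TP_fast_assign(__entry->val = val;),
            TP_printk("val=%d", __entry->val),
            my_reg_func, my_unreg_func
    );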

    Link: http://lkml.kernel.org/r/1450124286-4822-1-git-send-email-kda@linux-powerpc.org

    Signed-off-by: Denis Kirjanov
    Signed-off-by: Steven Rostedt

    Denis Kirjanov
     

08 Dec, 2015

1 commit

  • This commit replaces a local_irq_save()/local_irq_restore() pair with
    a lockdep assertion that interrupts are already disabled. This should
    remove the corresponding overhead from the interrupt entry/exit fastpaths.

    This change was inspired by the fact that Iftekhar Ahmed's mutation
    testing showed that removing rcu_irq_enter()'s call to local_irq_restore()
    had no effect, which might indicate that interrupts were always enabled
    anyway.
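
    A sketch of the shape of the change, assuming the RCU_LOCKDEP_WARN()
    idiom of that era (not the verbatim diff):

    /* Before: redundantly disable interrupts around the body. */
    local_irq_save(flags);
    /* ... */
    local_irq_restore(flags);

    /* After: assert that the caller already disabled them. */
    RCU_LOCKDEP_WARN(!irqs_disabled(),
                     "rcu_irq_enter() invoked with irqs enabled!!!");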

    Signed-off-by: Paul E. McKenney

    Paul E. McKenney
     

06 Dec, 2015

1 commit

  • Steven recommended open coding access to tracepoint->key to add
    trace points to headers. Unfortunately this is difficult for some
    headers (such as x86 asm/msr.h) because including tracepoint.h
    includes so many other headers that it causes include loops.
    The main problem is the include of linux/rcupdate.h, which
    pulls in a lot of other headers. The rcu header is only needed
    when actually defining trace points.

    Move the struct tracepoint into a separate tracepoint-defs.h
    header that can be included without pulling in all of RCU.
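
    A sketch of what moves into the new header, matching the v4.4-era
    layout (treat field details as illustrative):

    /* include/linux/tracepoint-defs.h */
    struct tracepoint_func {
            void *func;
            void *data;
            int prio;
    };

    struct tracepoint {
            const char *name;               /* Tracepoint name */
            struct static_key key;
            void (*regfunc)(void);
            void (*unregfunc)(void);
            struct tracepoint_func __rcu *funcs;
    };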

    Signed-off-by: Andi Kleen
    Signed-off-by: Peter Zijlstra (Intel)
    Acked-by: Steven Rostedt
    Cc: Arnaldo Carvalho de Melo
    Cc: Jiri Olsa
    Cc: Linus Torvalds
    Cc: Mike Galbraith
    Cc: Peter Zijlstra
    Cc: Stephane Eranian
    Cc: Thomas Gleixner
    Cc: Vince Weaver
    Link: http://lkml.kernel.org/r/1449018060-1742-2-git-send-email-andi@firstfloor.org
    Signed-off-by: Ingo Molnar

    Andi Kleen
     

03 Nov, 2015

1 commit

  • The documentation on top of __DECLARE_TRACE() does not match its
    implementation since the condition check has been added to the
    RCU lockdep checks. Update the documentation to match its
    implementation.

    Link: http://lkml.kernel.org/r/1446504164-21563-1-git-send-email-mathieu.desnoyers@efficios.com

    CC: Dave Hansen
    Fixes: a05d59a56733 "tracing: Add condition check to RCU lockdep checks"
    Signed-off-by: Mathieu Desnoyers
    Signed-off-by: Steven Rostedt

    Mathieu Desnoyers
     

26 Oct, 2015

1 commit

  • In order to guarantee that a probe will be called before other probes that
    are attached to a tracepoint, there needs to be a mechanism to provide
    priority of one probe over the others.

    This adds a prio field to struct tracepoint_func, which lets the probes be
    sorted by the priority set in the structure. If no priority is specified,
    then a priority of 10 is given (this is a macro, and perhaps may be changed
    in the future).

    Now probes may be added to affect other probes that are attached to a
    tracepoint with a guaranteed order.

    One use case would be to allow tracing of tracepoints to filter by pid.
    A special (higher priority) probe may be added to the sched_switch
    tracepoint to set the necessary flags of the other tracepoints, notifying
    them whether they should be traced or not. If another probe is also
    attached to the sched_switch tracepoint, the order of the two is then
    guaranteed rather than random.
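
    A sketch of the additions (the default priority macro value is 10, as
    noted above; details illustrative):

    struct tracepoint_func {
            void *func;
            void *data;
            int prio;   /* probes on a tracepoint are sorted by this */
    };

    /* Register with an explicit priority; the plain register function
     * uses the default priority. */
    int tracepoint_probe_register_prio(struct tracepoint *tp, void *probe,
                                       void *data, int prio);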

    Cc: Mathieu Desnoyers
    Signed-off-by: Steven Rostedt

    Steven Rostedt (Red Hat)
     

21 Oct, 2015

1 commit

  • Allow a trace events header file to disable compilation of its
    trace events by defining the preprocessor macro NOTRACE.

    This could be done, for example, according to a Kconfig option.
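
    A sketch of the pattern, with a hypothetical subsystem and Kconfig
    symbol: define NOTRACE at the top of the events header when the option
    is off.

    /* include/trace/events/my_subsys.h */
    #ifndef CONFIG_TRACING_EVENTS_MY_SUBSYS
    #define NOTRACE
    #endif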

    Link: http://lkml.kernel.org/r/1438432079-11704-3-git-send-email-tal.shorer@gmail.com

    Signed-off-by: Tal Shorer
    Signed-off-by: Steven Rostedt

    Tal Shorer
     

08 Apr, 2015

1 commit

  • Several tracepoints use the helper functions __print_symbolic() or
    __print_flags() and pass in enums that do the mapping between the
    binary data stored and the value to print. This works well for reading
    the ASCII trace files, but when the data is read via userspace tools
    such as perf and trace-cmd, the conversion of the binary value to a
    human string format is lost if an enum is used, as userspace does not
    have access to what the ENUM is.

    For example, the tracepoint trace_tlb_flush() has:

    __print_symbolic(REC->reason,
            { TLB_FLUSH_ON_TASK_SWITCH, "flush on task switch" },
            { TLB_REMOTE_SHOOTDOWN, "remote shootdown" },
            { TLB_LOCAL_SHOOTDOWN, "local shootdown" },
            { TLB_LOCAL_MM_SHOOTDOWN, "local mm shootdown" })

    Which maps the enum values to the strings they represent. But perf and
    trace-cmd do not know what value TLB_LOCAL_MM_SHOOTDOWN is, and would
    not be able to map it.

    With TRACE_DEFINE_ENUM(), developers can place these in the event header
    files and ftrace will convert the enums to their values:

    By adding:

    TRACE_DEFINE_ENUM(TLB_FLUSH_ON_TASK_SWITCH);
    TRACE_DEFINE_ENUM(TLB_REMOTE_SHOOTDOWN);
    TRACE_DEFINE_ENUM(TLB_LOCAL_SHOOTDOWN);
    TRACE_DEFINE_ENUM(TLB_LOCAL_MM_SHOOTDOWN);

    $ cat /sys/kernel/debug/tracing/events/tlb/tlb_flush/format
    [...]
    __print_symbolic(REC->reason,
            { 0, "flush on task switch" },
            { 1, "remote shootdown" },
            { 2, "local shootdown" },
            { 3, "local mm shootdown" })

    The above is what userspace expects to see, and tools do not need to
    be modified to parse them.

    Link: http://lkml.kernel.org/r/20150403013802.220157513@goodmis.org

    Cc: Guilherme Cox
    Cc: Tony Luck
    Cc: Xie XiuQi
    Acked-by: Namhyung Kim
    Reviewed-by: Masami Hiramatsu
    Tested-by: Masami Hiramatsu
    Signed-off-by: Steven Rostedt

    Steven Rostedt (Red Hat)
     

08 Feb, 2015

1 commit

  • The trace_tlb_flush() tracepoint can be called when a CPU is going offline.
    When a CPU is offline, RCU is no longer watching that CPU and since the
    tracepoint is protected by RCU, it must not be called. To prevent the
    tlb_flush tracepoint from being called when the CPU is offline, it was
    converted to a TRACE_EVENT_CONDITION where the condition checks if the
    CPU is online before calling the tracepoint.

    Unfortunately, this was not enough to stop lockdep from complaining about
    it. Even though the RCU protected code of the tracepoint will never be
    called, the condition is hidden within the tracepoint, and even though the
    condition prevents RCU code from being called, the lockdep checks are
    outside the tracepoint (this is to test tracepoints even when they are not
    enabled).

    Even though tracepoints should be checked to be RCU safe when they are not
    enabled, the condition should still be considered when checking RCU.
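
    A sketch of the fix: gate the lockdep-only RCU exercise on the
    tracepoint's condition (simplified from __DECLARE_TRACE()):

    if (IS_ENABLED(CONFIG_LOCKDEP) && (cond)) {
            rcu_read_lock_sched_notrace();
            rcu_dereference_sched(__tracepoint_##name.funcs);
            rcu_read_unlock_sched_notrace();
    }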

    Link: http://lkml.kernel.org/r/CA+icZUUGiGDoL5NU8RuxKzFjoLjEKRtUWx=JB8B9a0EQv-eGzQ@mail.gmail.com

    Fixes: 3a630178fd5f "tracing: generate RCU warnings even when tracepoints are disabled"
    Cc: stable@vger.kernel.org # 3.18+
    Acked-by: Dave Hansen
    Reported-by: Sedat Dilek
    Tested-by: Sedat Dilek
    Signed-off-by: Steven Rostedt

    Steven Rostedt (Red Hat)
     

10 Sep, 2014

1 commit

  • Dave Jones reported seeing a bug from one of my TLB tracepoints:

    http://lkml.kernel.org/r/20140806181801.GA4605@redhat.com

    I've been running these patches for months and never saw this.
    But, a big chunk of my testing, especially with all the debugging
    enabled, was in a vm where intel_idle doesn't work. On the
    systems where I was using intel_idle, I never had lockdep enabled
    and this tracepoint on at the same time.

    This patch ensures that whenever we have lockdep available, we do
    _some_ RCU activity at the site of the tracepoint, regardless of
    whether the tracepoint's condition matches or even whether the
    tracepoint itself is completely disabled. This is a bit of a
    hack, but it is pretty self-contained.

    I confirmed that with this patch plus lockdep I get the same
    splat as Dave Jones did, but without enabling the tracepoint
    explicitly.
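
    A simplified sketch of the hack, as it ends up in the trace_##name()
    stub generated by __DECLARE_TRACE():

    static inline void trace_##name(proto)
    {
            if (static_key_false(&__tracepoint_##name.key))
                    __DO_TRACE(/* tp, args, condition, ... */);
            if (IS_ENABLED(CONFIG_LOCKDEP)) {
                    rcu_read_lock_sched_notrace();
                    rcu_dereference_sched(__tracepoint_##name.funcs);
                    rcu_read_unlock_sched_notrace();
            }
    }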

    Link: http://lkml.kernel.org/p/20140807175204.C257CAC5@viggo.jf.intel.com

    Signed-off-by: Dave Hansen
    Cc: Dave Hansen
    Cc: Dave Jones
    Cc: paulmck@linux.vnet.ibm.com
    Cc: Ingo Molnar
    Signed-off-by: Steven Rostedt

    Dave Hansen
     

08 Aug, 2014

1 commit

  • When CONFIG_TRACING is not enabled, there's no reason to save the trace
    strings either by the linker or as a static variable that can be
    referenced later. Simply pass back the string that is given to
    tracepoint_string().

    Had to move the define to include/linux/tracepoint.h so that it is still
    visible when CONFIG_TRACING is not set.
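
    Roughly, the define becomes (sketch; the CONFIG_TRACING branch keeps the
    section-placed static string):

    #ifdef CONFIG_TRACING
    # define tracepoint_string(str)                                  \
            ({                                                       \
                    static const char *___tp_str                     \
                            __tracepoint_string = str;               \
                    ___tp_str;                                       \
            })
    #else
    /* No linker section, no static variable: just pass it back. */
    # define tracepoint_string(str) str
    #endif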

    Link: http://lkml.kernel.org/p/1406318733-26754-2-git-send-email-nicolas.pitre@linaro.org

    Suggested-by: Nicolas Pitre
    Signed-off-by: Steven Rostedt

    Steven Rostedt
     

08 May, 2014

1 commit

    There are some code paths in the kernel that need to do some preparations
    before they call a tracepoint. As that code is worthless overhead when
    the tracepoint is not enabled, it would be prudent to have that code
    only run when the tracepoint is active. To accomplish this, all tracepoints
    now get a static inline function called "trace_<tracepoint>_enabled()"
    which returns true when the tracepoint is enabled and false otherwise.

    As an added bonus, that function uses the static_key of the tracepoint
    such that no branch is needed.

    if (trace_mytracepoint_enabled()) {
            arg = process_tp_arg();
            trace_mytracepoint(arg);
    }

    Will keep the "process_tp_arg()" (which may be expensive to run) from
    being executed when the tracepoint isn't enabled.

    It's best to encapsulate the tracepoint itself in the if statement
    just to avoid races. For example, if you had:

    if (trace_mytracepoint_enabled())
            arg = process_tp_arg();
    trace_mytracepoint(arg);

    There's a chance that the tracepoint could be enabled just after the
    if statement, and arg will be undefined when calling the tracepoint.

    Link: http://lkml.kernel.org/r/20140506094407.507b6435@gandalf.local.home

    Acked-by: Mathieu Desnoyers
    Signed-off-by: Steven Rostedt

    Steven Rostedt (Red Hat)
     

09 Apr, 2014

3 commits

  • Fix the following sparse warnings:

    CHECK kernel/tracepoint.c
    kernel/tracepoint.c:184:18: warning: incorrect type in assignment (different address spaces)
    kernel/tracepoint.c:184:18: expected struct tracepoint_func *tp_funcs
    kernel/tracepoint.c:184:18: got struct tracepoint_func [noderef] *funcs
    kernel/tracepoint.c:216:18: warning: incorrect type in assignment (different address spaces)
    kernel/tracepoint.c:216:18: expected struct tracepoint_func *tp_funcs
    kernel/tracepoint.c:216:18: got struct tracepoint_func [noderef] *funcs
    kernel/tracepoint.c:392:24: error: return expression in void function
    CC kernel/tracepoint.o
    kernel/tracepoint.c: In function tracepoint_module_going:
    kernel/tracepoint.c:491:6: warning: symbol 'syscall_regfunc' was not declared. Should it be static?
    kernel/tracepoint.c:508:6: warning: symbol 'syscall_unregfunc' was not declared. Should it be static?

    Link: http://lkml.kernel.org/r/1397049883-28692-1-git-send-email-mathieu.desnoyers@efficios.com

    Signed-off-by: Mathieu Desnoyers
    Signed-off-by: Steven Rostedt

    Mathieu Desnoyers
     
    Instead of copying the num_tracepoints and tracepoints_ptrs from
    the module structure to the tp_mod structure, which only uses them to
    find the module associated with tracepoints of modules that are coming
    and going, simply copy the pointer to the module struct into the tracepoint
    tp_module structure.

    Also removed unneeded brackets around an if statement.

    Link: http://lkml.kernel.org/r/20140408201705.4dad2c4a@gandalf.local.home

    Acked-by: Mathieu Desnoyers
    Signed-off-by: Steven Rostedt

    Steven Rostedt (Red Hat)
     
  • Register/unregister tracepoint probes with struct tracepoint pointer
    rather than tracepoint name.

    This change, which vastly simplifies tracepoint.c, has been proposed by
    Steven Rostedt. It also removes 8.8kB (mostly text) from the vmlinux
    size.

    From this point on, the tracers need to pass a struct tracepoint pointer
    to probe register/unregister. A probe can now only be connected to a
    tracepoint that exists. Moreover, tracers are responsible for
    unregistering the probe before the module containing its associated
    tracepoint is unloaded.

    text data bss dec hex filename
    10443444 4282528 10391552 25117524 17f4354 vmlinux.orig
    10434930 4282848 10391552 25109330 17f2352 vmlinux
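
    The new-style API, in sketch form (probes are now keyed by the
    tracepoint struct plus the probe/data pair):

    int tracepoint_probe_register(struct tracepoint *tp,
                                  void *probe, void *data);
    int tracepoint_probe_unregister(struct tracepoint *tp,
                                    void *probe, void *data);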

    Link: http://lkml.kernel.org/r/1396992381-23785-2-git-send-email-mathieu.desnoyers@efficios.com

    CC: Ingo Molnar
    CC: Frederic Weisbecker
    CC: Andrew Morton
    CC: Frank Ch. Eigler
    CC: Johannes Berg
    Signed-off-by: Mathieu Desnoyers
    [ SDR - fixed return val in void func in tracepoint_module_going() ]
    Signed-off-by: Steven Rostedt

    Mathieu Desnoyers
     

04 Apr, 2014

1 commit

  • Pull tracing updates from Steven Rostedt:
    "Most of the changes were largely clean ups, and some documentation.
    But there were a few features that were added:

    Uprobes now work with event triggers and multi buffers and have
    support under ftrace and perf.

    The big feature is that the function tracer can now be used within the
    multi buffer instances. That is, you can now trace some functions in
    one buffer, others in another buffer, all functions in a third buffer
    and so on. They are basically agnostic from each other. This only
    works for the function tracer and not for the function graph trace,
    although you can have the function graph tracer running in the top
    level buffer (or any tracer for that matter) and have different
    function tracing going on in the sub buffers"

    * tag 'trace-3.15' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace: (45 commits)
    tracing: Add BUG_ON when stack end location is over written
    tracepoint: Remove unused API functions
    Revert "tracing: Move event storage for array from macro to standalone function"
    ftrace: Constify ftrace_text_reserved
    tracepoints: API doc update to tracepoint_probe_register() return value
    tracepoints: API doc update to data argument
    ftrace: Fix compilation warning about control_ops_free
    ftrace/x86: BUG when ftrace recovery fails
    ftrace: Warn on error when modifying ftrace function
    ftrace: Remove freelist from struct dyn_ftrace
    ftrace: Do not pass data to ftrace_dyn_arch_init
    ftrace: Pass retval through return in ftrace_dyn_arch_init()
    ftrace: Inline the code from ftrace_dyn_table_alloc()
    ftrace: Cleanup of global variables ftrace_new_pgs and ftrace_update_cnt
    tracing: Evaluate len expression only once in __dynamic_array macro
    tracing: Correctly expand len expressions from __dynamic_array macro
    tracing/module: Replace include of tracepoint.h with jump_label.h in module.h
    tracing: Fix event header migrate.h to include tracepoint.h
    tracing: Fix event header writeback.h to include tracepoint.h
    tracing: Warn if a tracepoint is not set via debugfs
    ...

    Linus Torvalds
     

22 Mar, 2014

1 commit

  • After the following commit:

    commit b75ef8b44b1cb95f5a26484b0e2fe37a63b12b44
    Author: Mathieu Desnoyers
    Date: Wed Aug 10 15:18:39 2011 -0400

    Tracepoint: Dissociate from module mutex

    The following functions became unnecessary:

    - tracepoint_probe_register_noupdate,
    - tracepoint_probe_unregister_noupdate,
    - tracepoint_probe_update_all.

    In fact, none of the in-kernel tracers, nor LTTng, nor SystemTAP use
    them. Remove those.

    Moreover, the functions:

    - tracepoint_iter_start,
    - tracepoint_iter_next,
    - tracepoint_iter_stop,
    - tracepoint_iter_reset.

    are unused by in-kernel tracers, LTTng and SystemTAP. Remove those too.

    Link: http://lkml.kernel.org/r/1395379142-2118-2-git-send-email-mathieu.desnoyers@efficios.com

    Signed-off-by: Mathieu Desnoyers
    Signed-off-by: Steven Rostedt

    Mathieu Desnoyers
     

04 Mar, 2014

1 commit

    If a module fails to add its tracepoints due to module tainting, do not
    create the module event infrastructure in the debugfs directory, as the events
    will not work. Worse yet, they will silently fail, making the user wonder
    why the events they enable do not display anything.

    Having a warning on module load and the events not visible to the users
    will make the cause of the problem much clearer.

    Link: http://lkml.kernel.org/r/20140227154923.265882695@goodmis.org

    Fixes: 6d723736e472 "tracing/events: add support for modules to TRACE_EVENT"
    Acked-by: Mathieu Desnoyers
    Cc: stable@vger.kernel.org # 2.6.31+
    Cc: Rusty Russell
    Signed-off-by: Steven Rostedt

    Steven Rostedt (Red Hat)
     

19 Nov, 2013

1 commit

  • Vince's perf-trinity fuzzer found yet another 'interesting' problem.

    When we sample the irq_work_exit tracepoint with period==1 (or
    PERF_SAMPLE_PERIOD) and we add an fasync SIGNAL handler we create an
    infinite event generation loop:

    ,->
    | irq_work_exit() ->
    | trace_irq_work_exit() ->
    | ...
    | __perf_event_overflow() -> (due to fasync)
    | irq_work_queue() -> (irq_work_list must be empty)
    '--------- arch_irq_work_raise()

    Similar things can happen due to regular poll() wakeups if we exceed
    the ring-buffer wakeup watermark, or have an event_limit.

    To avoid this, disallow sampling this particular tracepoint.

    In order to achieve this, create a special perf_perm function pointer
    for each event and call this (when set) on trying to create a
    tracepoint perf event.
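
    A sketch of the opt-out as applied to irq_work_exit (refuse to create a
    sampling perf event on this tracepoint):

    TRACE_EVENT_PERF_PERM(irq_work_exit,
            is_sampling_event(p_event) ? -EPERM : 0);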

    [ roasted: use expr... to allow for ',' in your expression ]

    Reported-by: Vince Weaver
    Tested-by: Vince Weaver
    Signed-off-by: Peter Zijlstra
    Cc: Steven Rostedt
    Cc: Dave Jones
    Cc: Frederic Weisbecker
    Link: http://lkml.kernel.org/r/20131114152304.GC5364@laptop.programming.kicks-ass.net
    Signed-off-by: Ingo Molnar

    Peter Zijlstra
     

21 Jun, 2013

1 commit

  • Each TRACE_EVENT() adds several helper functions. If two or more trace events
    share the same structure and print format, they can also share most of these
    helper functions and save a lot of space from duplicate code. This is why the
    DECLARE_EVENT_CLASS() and DEFINE_EVENT() were created.

    Some events require a trigger to be called at registering and unregistering of
    the event and to do so they use TRACE_EVENT_FN().

    If multiple events require a trigger, they currently have no choice but to use
    TRACE_EVENT_FN(), as there's no DEFINE_EVENT_FN() available. This unfortunately
    causes a lot of duplicate code to be created.

    By adding a DEFINE_EVENT_FN(), these events can still use a
    DECLARE_EVENT_CLASS() and then define their own triggers.
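
    A sketch with hypothetical names: two events share one class yet still
    hook register/unregister.

    DECLARE_EVENT_CLASS(my_class,
            TP_PROTO(int val), TP_ARGS(val),
            TP_STRUCT__entry(__field(int, val)),
            TP_fast_assign(__entry->val = val;),
            TP_printk("val=%d", __entry->val));

    DEFINE_EVENT_FN(my_class, my_event_a,
            TP_PROTO(int val), TP_ARGS(val), my_reg, my_unreg);
    DEFINE_EVENT_FN(my_class, my_event_b,
            TP_PROTO(int val), TP_ARGS(val), my_reg, my_unreg);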

    Signed-off-by: Steven Rostedt
    Link: http://lkml.kernel.org/r/51C3236C.8030508@hds.com
    Signed-off-by: Seiji Aguchi
    Signed-off-by: H. Peter Anvin

    Steven Rostedt
     

11 Jun, 2013

1 commit

  • __DECLARE_TRACE_RCU() currently creates an _rcuidle() tracepoint which
    may safely be invoked from what RCU considers to be an idle CPU.
    However, these _rcuidle() tracepoints may -not- be invoked from the
    handler of an irq taken from idle, because rcu_idle_enter() zeroes
    RCU's nesting-level counter, so that the rcu_irq_exit() returning to
    idle will trigger a WARN_ON_ONCE().

    This commit therefore substitutes rcu_irq_enter() for rcu_idle_exit()
    and rcu_irq_exit() for rcu_idle_enter() in order to make the _rcuidle()
    tracepoints usable from irq handlers as well as from process context.
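
    In sketch form, the prercu/postrcu hooks passed by the _rcuidle()
    variant become:

    __DO_TRACE(&__tracepoint_##name,
            TP_PROTO(data_proto), TP_ARGS(data_args),
            TP_CONDITION(cond),
            rcu_irq_enter(),        /* was: rcu_idle_exit()  */
            rcu_irq_exit());        /* was: rcu_idle_enter() */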

    Reported-by: Dave Jones
    Signed-off-by: Paul E. McKenney
    Cc: Steven Rostedt

    Paul E. McKenney
     

12 Sep, 2012

1 commit

  • Tracepoints declare a static inline trace_*_rcuidle variant of the trace
    function, to support safely generating trace events from the idle loop.
    Module code never actually uses that variant of trace functions, because
    modules don't run code that needs tracing with RCU idled. However, the
    declaration of those otherwise unused functions causes the module to
    reference rcu_idle_exit and rcu_idle_enter, which RCU does not export to
    modules.

    To avoid this, don't generate trace_*_rcuidle functions for tracepoints
    declared in module code.
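
    A sketch of the guard (simplified): built-in code keeps the _rcuidle
    variant, modules get an empty definition.

    #ifndef MODULE
    #define __DECLARE_TRACE_RCU(name, proto, args, cond)             \
            static inline void trace_##name##_rcuidle(proto)         \
            {                                                        \
                    if (static_key_false(&__tracepoint_##name.key))  \
                            __DO_TRACE(&__tracepoint_##name,         \
                                    TP_PROTO(proto), TP_ARGS(args),  \
                                    TP_CONDITION(cond),              \
                                    rcu_idle_exit(),                 \
                                    rcu_idle_enter());               \
            }
    #else
    /* Modules must not reference the unexported rcu_idle_*() API. */
    #define __DECLARE_TRACE_RCU(name, proto, args, cond)
    #endif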

    Link: http://lkml.kernel.org/r/20120905062306.GA14756@leaf

    Reported-by: Steven Rostedt
    Acked-by: Mathieu Desnoyers
    Acked-by: Paul E. McKenney
    Signed-off-by: Josh Triplett
    Signed-off-by: Steven Rostedt

    Josh Triplett
     

24 Feb, 2012

1 commit

  • …_key_slow_[inc|dec]()

    So here's a boot tested patch on top of Jason's series that does
    all the cleanups I talked about and turns jump labels into a
    more intuitive to use facility. It should also address the
    various misconceptions and confusions that surround jump labels.

    Typical usage scenarios:

    #include <linux/static_key.h>

    struct static_key key = STATIC_KEY_INIT_TRUE;

    if (static_key_false(&key))
    do unlikely code
    else
    do likely code

    Or:

    if (static_key_true(&key))
    do likely code
    else
    do unlikely code

    The static key is modified via:

    static_key_slow_inc(&key);
    ...
    static_key_slow_dec(&key);

    The 'slow' prefix makes it abundantly clear that this is an
    expensive operation.

    I've updated all in-kernel code to use this everywhere. Note
    that I (intentionally) have not pushed through the rename
    blindly through to the lowest levels: the actual jump-label
    patching arch facility should be named like that, so we want to
    decouple jump labels from the static-key facility a bit.

    On non-jump-label enabled architectures static keys default to
    likely()/unlikely() branches.

    Signed-off-by: Ingo Molnar <mingo@elte.hu>
    Acked-by: Jason Baron <jbaron@redhat.com>
    Acked-by: Steven Rostedt <rostedt@goodmis.org>
    Cc: a.p.zijlstra@chello.nl
    Cc: mathieu.desnoyers@efficios.com
    Cc: davem@davemloft.net
    Cc: ddaney.cavm@gmail.com
    Cc: Linus Torvalds <torvalds@linux-foundation.org>
    Link: http://lkml.kernel.org/r/20120222085809.GA26397@elte.hu
    Signed-off-by: Ingo Molnar <mingo@elte.hu>

    Ingo Molnar
     

13 Feb, 2012

1 commit

    This adds a new static inline function that lets *any* tracepoint be used
    inside a rcu_idle_exit() section. It also solves the problem where
    the same tracepoint may be used inside a rcu_idle_exit() section as well
    as outside of one.

    I added a new tracepoint function with a "_rcuidle" extension. All
    tracepoints can be used with either the normal "trace_foobar()"
    function, or the "trace_foobar_rcuidle()" function when inside a
    rcu_idle_exit() section.

    All tracepoints defined by TRACE_EVENT() or any of the derivatives
    will have a "_rcuidle()" function also defined. When a tracepoint is
    used within an rcu_idle_exit() section, the "_rcuidle()" version must
    be used. This denotes that the tracepoint is within rcu_idle_exit()
    and it allows the rcu read locks within the tracepoint to still
    be valid, as this version takes us out of rcu_idle_exit().

    Another nice aspect about this patch is that "static inline"s are not
    compiled into text when not used. So only the tracepoints that actually
    use the _rcuidle() version will have them defined in the actual text
    that is booted.
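
    Usage, sketched against a real idle-path event (exact call sites vary):

    /* In the idle loop, where RCU considers the CPU idle: */
    trace_cpu_idle_rcuidle(state, smp_processor_id());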

    Link: http://lkml.kernel.org/r/1328563113.2200.39.camel@gandalf.stny.rr.com

    Acked-by: Paul E. McKenney
    Reviewed-by: Josh Triplett
    Signed-off-by: Steven Rostedt

    Steven Rostedt
     

11 Aug, 2011

1 commit

  • Copy the information needed from struct module into a local module list
    held within tracepoint.c from within the module coming/going notifier.

    This vastly simplifies locking of tracepoint registration /
    unregistration, because we don't have to take the module mutex to
    register and unregister tracepoints anymore. Steven Rostedt ran into
    dependency problems related to modules mutex vs kprobes mutex vs ftrace
    mutex vs tracepoint mutex that seems to be hard to fix without removing
    this dependency between tracepoint and module mutex. (note: it should be
    investigated whether kprobes could benefit of being dissociated from the
    modules mutex too.)

    This also fixes module handling of tracepoint list iterators, because it
    was expecting the list to be sorted by pointer address. Given we have
    control on our own list now, it's OK to sort this list which has
    tracepoints as its only purpose. The reason why this sorting is required
    is to handle the fact that seq files (and any read() operation from
    user-space) cannot hold the tracepoint mutex across multiple calls, so
    list entries may vanish between calls. With sorting, the tracepoint
    iterator becomes usable even if the list doesn't contain the exact item
    pointed to by the iterator anymore.

    Signed-off-by: Mathieu Desnoyers
    Acked-by: Jason Baron
    CC: Ingo Molnar
    CC: Lai Jiangshan
    CC: Peter Zijlstra
    CC: Thomas Gleixner
    CC: Masami Hiramatsu
    Link: http://lkml.kernel.org/r/20110810191839.GC8525@Krystal
    Signed-off-by: Steven Rostedt

    Mathieu Desnoyers
     

05 Apr, 2011

1 commit

  • Introduce:

    static __always_inline bool static_branch(struct jump_label_key *key);

    instead of the old JUMP_LABEL(key, label) macro.

    In this way, jump labels become really easy to use:

    Define:

    struct jump_label_key jump_key;

    Can be used as:

    if (static_branch(&jump_key))
    do unlikely code

    enable/disable via:

    jump_label_inc(&jump_key);
    jump_label_dec(&jump_key);

    that's it!

    For the jump labels disabled case, the static_branch() becomes an
    atomic_read(), and jump_label_inc()/dec() are simply atomic_inc(),
    atomic_dec() operations. We show testing results for this change below.

    Thanks to H. Peter Anvin for suggesting the 'static_branch()' construct.

    Since we now require a 'struct jump_label_key *key', we can store a pointer into
    the jump table addresses. In this way, we can enable/disable jump labels, in
    basically constant time. This change allows us to completely remove the previous
    hashtable scheme. Thanks to Peter Zijlstra for this re-write.

    Testing:

    I ran a series of 'tbench 20' runs 5 times (with reboots) for 3
    configurations, where tracepoints were disabled.

    jump label configured in
    avg: 815.6

    jump label *not* configured in (using atomic reads)
    avg: 800.1

    jump label *not* configured in (regular reads)
    avg: 803.4

    Signed-off-by: Peter Zijlstra
    LKML-Reference:
    Signed-off-by: Jason Baron
    Suggested-by: H. Peter Anvin
    Tested-by: David Daney
    Acked-by: Ralf Baechle
    Acked-by: David S. Miller
    Acked-by: Mathieu Desnoyers
    Signed-off-by: Steven Rostedt

    Jason Baron
     

03 Feb, 2011

1 commit

  • Make the tracepoints more robust, making them solid enough to handle compiler
    changes by not relying on anything based on compiler-specific behavior with
    respect to structure alignment. Implement an approach proposed by David Miller:
    use an array of const pointers to refer to the individual structures, and export
    this pointer array through the linker script rather than the structures per se.
    It will consume 32 extra bytes per tracepoint (24 for structure padding and 8
    for the pointers), but is less likely to break due to compiler changes.

    History:

    commit 7e066fb8 tracepoints: add DECLARE_TRACE() and DEFINE_TRACE()
    added the aligned(32) type and variable attribute to the tracepoint structures
    to deal with gcc happily aligning statically defined structures on 32-byte
    multiples.

    One attempt was to use a 8-byte alignment for tracepoint structures by applying
    both the variable and type attribute to tracepoint structures definitions and
    declarations. It worked fine with gcc 4.5.1, but broke with gcc 4.4.4 and 4.4.5.

    The reason is that the "aligned" attribute only specifies the _minimum_
    alignment for a structure, leaving both the compiler and the linker free to
    align on larger multiples. Because tracepoint.c expects the structures to be
    placed as an array within each section, up-alignment causes NULL-pointer
    exceptions due to the extra unexpected padding.

    (this patch applies on top of -tip)
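
    The core of the approach, in sketch form: alongside each tracepoint
    structure, emit a const pointer into a dedicated section that
    tracepoint.c walks instead of the structures themselves.

    static struct tracepoint * const __tracepoint_ptr_##name __used  \
    __attribute__((section("__tracepoints_ptrs"))) =                 \
            &__tracepoint_##name;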

    Signed-off-by: Mathieu Desnoyers
    Acked-by: David S. Miller
    LKML-Reference:
    CC: Frederic Weisbecker
    CC: Ingo Molnar
    CC: Thomas Gleixner
    CC: Andrew Morton
    CC: Peter Zijlstra
    CC: Rusty Russell
    Signed-off-by: Steven Rostedt

    Mathieu Desnoyers
     

03 Dec, 2010

1 commit

    There are instances in the kernel where we only want to trace
    a tracepoint when a certain condition is set. But we do not
    want to test for that condition in the core kernel.
    If we test for that condition before calling the tracepoint, then
    we will be performing that test even when tracing is not enabled,
    which is the case 99.99% of the time.

    We currently can just filter out on that condition, but that happens
    after we write to the trace buffer. We just wasted time writing to
    the ring buffer for an event we never cared about.

    This patch adds:

    TRACE_EVENT_CONDITION() and DEFINE_EVENT_CONDITION()

    These have a new TP_CONDITION() argument that comes right after
    the TP_ARGS(). This condition can use the parameters of TP_ARGS()
    in the TRACE_EVENT() to determine if the tracepoint should be traced
    or not. The TP_CONDITION() will be placed in an "if (cond) trace;" statement.

    For example, for the tracepoint sched_wakeup, it is useless to
    trace a wakeup event where the caller never actually wakes
    anything up (where success == 0). So adding:

    TP_CONDITION(success),

    which uses the "success" parameter of the wakeup tracepoint
    will have it only trace when we have successfully woken up a
    task.
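
    A sketch of the wakeup example (entry fields hypothetical):

    TRACE_EVENT_CONDITION(sched_wakeup,
            TP_PROTO(struct task_struct *p, int success),
            TP_ARGS(p, success),
            TP_CONDITION(success),
            TP_STRUCT__entry(__field(pid_t, pid)),
            TP_fast_assign(__entry->pid = p->pid;),
            TP_printk("pid=%d", __entry->pid)
    );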

    Acked-by: Mathieu Desnoyers
    Acked-by: Frederic Weisbecker
    Cc: Arjan van de Ven
    Cc: Thomas Gleixner
    Signed-off-by: Steven Rostedt

    Steven Rostedt
     

18 Nov, 2010

1 commit

  • This introduces the new TRACE_EVENT_FLAGS() macro in order
    to set up initial event flags value.

    This macro must simply follow the definition of a trace event
    and take the event name and the flag value as parameters:

    TRACE_EVENT(my_event, .....
    ....
    );

    TRACE_EVENT_FLAGS(my_event, 1)

    This will set up 1 as the initial my_event->flags value.

    Signed-off-by: Frederic Weisbecker
    Cc: Ingo Molnar
    Cc: Peter Zijlstra
    Cc: Arnaldo Carvalho de Melo
    Cc: Thomas Gleixner
    Cc: Steven Rostedt
    Cc: Li Zefan
    Cc: Jason Baron

    Frederic Weisbecker
     

14 May, 2010

1 commit

  • This patch adds data to be passed to tracepoint callbacks.

    The created functions from DECLARE_TRACE() now need a mandatory data
    parameter. For example:

    DECLARE_TRACE(mytracepoint, int value, value)

    Will create the register function:

    int register_trace_mytracepoint((void(*)(void *data, int value))probe,
    void *data);

    As the first argument, all callbacks (probes) must take a (void *data)
    parameter. So a callback for the above tracepoint will look like:

    void myprobe(void *data, int value)
    {
    }

    The callback may choose to ignore the data parameter.

    This change allows callbacks to register a private data pointer along
    with the function probe.

    void mycallback(void *data, int value);

    register_trace_mytracepoint(mycallback, mydata);

    Then the mycallback() will receive the "mydata" as the first parameter
    before the args.

    A more detailed example:

    DECLARE_TRACE(mytracepoint, TP_PROTO(int status), TP_ARGS(status));

    /* In the C file */

    DEFINE_TRACE(mytracepoint, TP_PROTO(int status), TP_ARGS(status));

    [...]

    trace_mytracepoint(status);

    /* In a file registering this tracepoint */

    int my_callback(void *data, int status)
    {
            struct my_struct *my_data = data;
            [...]
    }

    [...]
    my_data = kmalloc(sizeof(*my_data), GFP_KERNEL);
    init_my_data(my_data);
    register_trace_mytracepoint(my_callback, my_data);

    The same callback can also be registered to the same tracepoint as long
    as the data registered is different. Note, the data must also be used
    to unregister the callback:

    unregister_trace_mytracepoint(my_callback, my_data);

    Because of the data parameter, tracepoints declared this way cannot be
    declared with no args. That is:

    DECLARE_TRACE(mytracepoint, TP_PROTO(void), TP_ARGS());

    will cause an error.

    If no arguments are needed, a new macro can be used instead:

    DECLARE_TRACE_NOARGS(mytracepoint);

    Since there are no arguments, the proto and args fields are left out.

    This is part of a series to make the tracepoint footprint smaller:

    text data bss dec hex filename
    4913961 1088356 861512 6863829 68bbd5 vmlinux.orig
    4914025 1088868 861512 6864405 68be15 vmlinux.class
    4918492 1084612 861512 6864616 68bee8 vmlinux.tracepoint

    Again, this patch also increases the size of the kernel, but
    lays the ground work for decreasing it.

    v5: Fixed net/core/drop_monitor.c to handle these updates.

    v4: Moved the DECLARE_TRACE() and DECLARE_TRACE_NOARGS() out of the
    #ifdef CONFIG_TRACEPOINTS, since the two are the same in both
    cases. The __DECLARE_TRACE() is what changes.
    Thanks to Frederic Weisbecker for pointing this out.

    v3: Made all register_* functions require data to be passed and
    all callbacks to take a void * parameter as its first argument.
    This makes the calling functions comply with C standards.

    Also added more comments to the modifications of DECLARE_TRACE().

    v2: Made the DECLARE_TRACE() have the ability to pass arguments
    and added a new DECLARE_TRACE_NOARGS() for tracepoints that
    do not need any arguments.

    Acked-by: Mathieu Desnoyers
    Acked-by: Masami Hiramatsu
    Acked-by: Frederic Weisbecker
    Cc: Neil Horman
    Cc: David S. Miller
    Signed-off-by: Steven Rostedt

    Steven Rostedt