
11 Aug, 2015

1 commit

  • commit e3eea1404f5ff7a2ceb7b5e7ba412a6fd94f2935 upstream.

    Commit 4104d326b670 ("ftrace: Remove global function list and call function
    directly") simplified the ftrace code by removing the global_ops list with a
    new design. But this cleanup also broke the filtering of PIDs that are added
    to the set_ftrace_pid file.

    Add back the proper hooks to have pid filtering working once again.

    Reported-by: Matt Fleming
    Reported-by: Richard Weinberger
    Tested-by: Matt Fleming
    Signed-off-by: Steven Rostedt
    Signed-off-by: Greg Kroah-Hartman

    Steven Rostedt (Red Hat)
     

04 Aug, 2015

4 commits

  • commit 6224beb12e190ff11f3c7d4bf50cb2922878f600 upstream.

    Fengguang Wu's tests triggered a bug in the branch tracer's start up
    test when CONFIG_DEBUG_PREEMPT is set. This was because that config
    adds some debug logic in the per cpu field, which calls back into
    the branch tracer.

    The branch tracer has its own recursive checks, but uses a per cpu
    variable to implement it. If retrieving the per cpu variable calls
    back into the branch tracer, you can see how things will break.

    Instead of using a per cpu variable, use the trace_recursion field
    of the current task struct. Simply set a bit when entering the
    branch tracing and clear it when leaving. If the bit is set on
    entry, just don't do the tracing.

    There's also the case with lockdep, as the local_irq_save() called
    before the recursion can also trigger code that can call back into
    the function. Changing that to a raw_local_irq_save() will protect
    that as well.

    This prevents the recursion and the inevitable crash that follows.
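The recursion guard described above can be sketched in plain C. This is a hypothetical userspace model, not the kernel code: the real flag lives in current->trace_recursion, and the bit name below is made up for illustration.

```c
#include <stdbool.h>

/* Hypothetical stand-in for current->trace_recursion. */
static unsigned int trace_recursion;

#define TRACE_BRANCH_BIT 0x1u

/* Returns true if we may enter the tracer; false if we are recursing. */
static bool branch_trace_enter(void)
{
        if (trace_recursion & TRACE_BRANCH_BIT)
                return false;             /* already inside: don't trace */
        trace_recursion |= TRACE_BRANCH_BIT;
        return true;
}

static void branch_trace_exit(void)
{
        trace_recursion &= ~TRACE_BRANCH_BIT;
}
```

Because the flag lives in the task struct rather than in per-cpu storage, testing it cannot itself call back into the branch tracer.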

    Link: http://lkml.kernel.org/r/20150630141803.GA28071@wfg-t540p.sh.intel.com

    Reported-by: Fengguang Wu
    Tested-by: Fengguang Wu
    Signed-off-by: Steven Rostedt
    Signed-off-by: Greg Kroah-Hartman

    Steven Rostedt (Red Hat)
     
  • commit cc9e4bde03f2b4cfba52406c021364cbd2a4a0f3 upstream.

    The trace.h header, when included without CONFIG_EVENT_TRACING enabled
    (seldom done), will not compile because of a typo in the prototype
    of trace_event_enum_update().

    Signed-off-by: Steven Rostedt
    Signed-off-by: Greg Kroah-Hartman

    Steven Rostedt (Red Hat)
     
  • commit 6b88f44e161b9ee2a803e5b2b1fbcf4e20e8b980 upstream.

    While debugging a WARN_ON() for filtering, I found that it is possible
    for the filter string to be referenced after its end. With the filter:

    # echo '>' > /sys/kernel/debug/events/ext4/ext4_truncate_exit/filter

    The filter_parse() function can call infix_get_op() which calls
    infix_advance() that updates the infix filter pointers for the cnt
    and tail without checking if the filter is already at the end, which
    will put the cnt to zero and the tail beyond the end. The loop then calls
    infix_next() that has

    ps->infix.cnt--;
    return ps->infix.string[ps->infix.tail++];

    The cnt will now be below zero, and the tail that is returned is
    already past the end of the filter string. So far the allocation
    of the filter string usually has some buffer that is zeroed out, but
    if the filter string is of the exact size of the allocated buffer
    there's no guarantee that the character after the nul terminating
    character will be zero.

    Luckily, only root can write to the filter.
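The shape of the fix is a bounds check before consuming a character. The field names below mirror the report (cnt, tail, string), but this is an illustrative userspace model, not the actual kernel patch:

```c
struct infix {
        const char *string;
        int cnt;     /* characters remaining */
        int tail;    /* next index to read */
};

/* Return the next character, or 0 once the string is exhausted,
 * instead of letting cnt go negative and tail run past the end. */
static char infix_next(struct infix *in)
{
        if (in->cnt <= 0)
                return 0;
        in->cnt--;
        return in->string[in->tail++];
}
```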

    Signed-off-by: Steven Rostedt
    Signed-off-by: Greg Kroah-Hartman

    Steven Rostedt (Red Hat)
     
  • commit b4875bbe7e68f139bd3383828ae8e994a0df6d28 upstream.

    When testing the fix for the trace filter, I could not come up with
    a scenario where the operand count goes below zero, so I added a
    WARN_ON_ONCE(cnt < 0) to the logic. But there is a legitimate case
    where it can happen (although the filter would be wrong).

    # echo '>' > /sys/kernel/debug/events/ext4/ext4_truncate_exit/filter

    That is, a single operation without any operands will hit the path
    where the WARN_ON_ONCE() can trigger. This is harmless, and the
    filter is reported as an error; but instead of spitting out
    a warning to the kernel dmesg, just fail nicely and report it via
    the proper channels.

    Link: http://lkml.kernel.org/r/558C6082.90608@oracle.com

    Reported-by: Vince Weaver
    Reported-by: Sasha Levin
    Signed-off-by: Steven Rostedt
    Signed-off-by: Greg Kroah-Hartman

    Steven Rostedt (Red Hat)
     

17 Jun, 2015

1 commit

  • When the following filter is used it causes a warning to trigger:

    # cd /sys/kernel/debug/tracing
    # echo "((dev==1)blocks==2)" > events/ext4/ext4_truncate_exit/filter
    -bash: echo: write error: Invalid argument
    # cat events/ext4/ext4_truncate_exit/filter
    ((dev==1)blocks==2)
    ^
    parse_error: No error

    ------------[ cut here ]------------
    WARNING: CPU: 2 PID: 1223 at kernel/trace/trace_events_filter.c:1640 replace_preds+0x3c5/0x990()
    Modules linked in: bnep lockd grace bluetooth ...
    CPU: 3 PID: 1223 Comm: bash Tainted: G W 4.1.0-rc3-test+ #450
    Hardware name: Hewlett-Packard HP Compaq Pro 6300 SFF/339A, BIOS K01 v02.05 05/07/2012
    0000000000000668 ffff8800c106bc98 ffffffff816ed4f9 ffff88011ead0cf0
    0000000000000000 ffff8800c106bcd8 ffffffff8107fb07 ffffffff8136b46c
    ffff8800c7d81d48 ffff8800d4c2bc00 ffff8800d4d4f920 00000000ffffffea
    Call Trace:
    [] dump_stack+0x4c/0x6e
    [] warn_slowpath_common+0x97/0xe0
    [] ? _kstrtoull+0x2c/0x80
    [] warn_slowpath_null+0x1a/0x20
    [] replace_preds+0x3c5/0x990
    [] create_filter+0x82/0xb0
    [] apply_event_filter+0xd4/0x180
    [] event_filter_write+0x8f/0x120
    [] __vfs_write+0x28/0xe0
    [] ? __sb_start_write+0x53/0xf0
    [] ? security_file_permission+0x30/0xc0
    [] vfs_write+0xb8/0x1b0
    [] SyS_write+0x4f/0xb0
    [] system_call_fastpath+0x12/0x6a
    ---[ end trace e11028bd95818dcd ]---

    Worse yet, reading the error message (the filter again) it says that
    there was no error, when there clearly was. The issue is that the
    code that checks the input does not verify that the ops are balanced;
    that is, that an op is present between a closed parenthesis and the
    next token.

    This would only cause a warning, and fail out before doing any real
    harm, but it should still not cause a warning, and the error reported
    should work:

    # cd /sys/kernel/debug/tracing
    # echo "((dev==1)blocks==2)" > events/ext4/ext4_truncate_exit/filter
    -bash: echo: write error: Invalid argument
    # cat events/ext4/ext4_truncate_exit/filter
    ((dev==1)blocks==2)
    ^
    parse_error: Meaningless filter expression

    And give no kernel warning.
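The missing validation amounts to rejecting an operand (or another opening parenthesis) that directly follows a closing parenthesis with no operator in between. A toy standalone check, not the kernel parser:

```c
#include <stdbool.h>
#include <ctype.h>

/* Scan a filter string; reject an operand or '(' appearing
 * immediately after ')' with no operator in between. */
static bool filter_ops_balanced(const char *s)
{
        bool after_close = false;

        for (; *s; s++) {
                if (isspace((unsigned char)*s))
                        continue;
                if (*s == ')') {
                        after_close = true;
                        continue;
                }
                if (after_close) {
                        /* Only an operator may follow ')'. */
                        if (isalnum((unsigned char)*s) || *s == '(')
                                return false;
                        after_close = false;
                }
        }
        return true;
}
```

With this kind of check, "((dev==1)blocks==2)" is rejected up front instead of reaching replace_preds() and tripping the WARN_ON().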

    Link: http://lkml.kernel.org/r/20150615175025.7e809215@gandalf.local.home

    Cc: Peter Zijlstra
    Cc: Ingo Molnar
    Cc: Arnaldo Carvalho de Melo
    Cc: stable@vger.kernel.org # 2.6.31+
    Reported-by: Vince Weaver
    Tested-by: Vince Weaver
    Signed-off-by: Steven Rostedt

    Steven Rostedt
     


07 May, 2015

1 commit

  • The only caller to this function (__print_array) was getting it wrong by
    passing the array length instead of buffer length. As the element size
    was already being passed for other reasons it seems reasonable to push
    the calculation of buffer length into the function.
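A minimal sketch of the idea: the caller passes the element count and element size, and the helper derives the byte length itself, so the old confusion between array length and buffer length cannot recur. The function name and hex formatting are illustrative, not the kernel implementation:

```c
#include <stdio.h>
#include <stddef.h>

/* Callers pass COUNT elements of EL_SIZE bytes each; the helper
 * computes the total buffer length once, internally. Formats each
 * byte as hex into 'out' and returns the derived buffer length. */
static size_t sketch_print_array(char *out, size_t outsz,
                                 const unsigned char *buf,
                                 size_t count, size_t el_size)
{
        size_t buf_len = count * el_size;   /* derived here, not by caller */
        size_t pos = 0;

        for (size_t i = 0; i < buf_len && pos + 3 < outsz; i++)
                pos += (size_t)snprintf(out + pos, outsz - pos,
                                        i ? ",%02x" : "%02x", buf[i]);
        return buf_len;
}
```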

    Link: http://lkml.kernel.org/r/1430320727-14582-1-git-send-email-alex.bennee@linaro.org

    Signed-off-by: Alex Bennée
    Signed-off-by: Steven Rostedt

    Alex Bennée
     

27 Apr, 2015

1 commit

  • Pull fourth vfs update from Al Viro:
    "d_inode() annotations from David Howells (sat in for-next since before
    the beginning of merge window) + four assorted fixes"

    * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
    RCU pathwalk breakage when running into a symlink overmounting something
    fix I_DIO_WAKEUP definition
    direct-io: only inc/dec inode->i_dio_count for file systems
    fs/9p: fix readdir()
    VFS: assorted d_backing_inode() annotations
    VFS: fs/inode.c helpers: d_inode() annotations
    VFS: fs/cachefiles: d_backing_inode() annotations
    VFS: fs library helpers: d_inode() annotations
    VFS: assorted weird filesystems: d_inode() annotations
    VFS: normal filesystems (and lustre): d_inode() annotations
    VFS: security/: d_inode() annotations
    VFS: security/: d_backing_inode() annotations
    VFS: net/: d_inode() annotations
    VFS: net/unix: d_backing_inode() annotations
    VFS: kernel/: d_inode() annotations
    VFS: audit: d_backing_inode() annotations
    VFS: Fix up some ->d_inode accesses in the chelsio driver
    VFS: Cachefiles should perform fs modifications on the top layer only
    VFS: AF_UNIX sockets should call mknod on the top layer only

    Linus Torvalds
     

23 Apr, 2015

1 commit

  • Pull tracing fixes from Steven Rostedt:
    "This adds three fixes for the tracing code.

    The first is a bug when ftrace_dump_on_oops is triggered in atomic
    context and function graph tracer is the tracer that is being
    reported.

    The second fix is bad parsing of the trace_events from the kernel
    command line, where it would ignore specific events if the system name
    is used when defining the event (it enables all events within the
    system).

    The last one is a fix to the TRACE_DEFINE_ENUM(), where a check was
    missing to see if the ptr was incremented to the end of the string,
    but the loop increments it again and can miss the nul delimiter to
    stop processing"

    * tag 'trace-v4.1-2' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace:
    tracing: Fix possible out of bounds memory access when parsing enums
    tracing: Fix incorrect enabling of trace events by boot cmdline
    tracing: Handle ftrace_dump() atomic context in graph_trace_open()

    Linus Torvalds
     

17 Apr, 2015

1 commit

    The code that replaces the enum names with the enum values in the
    tracepoints' format files could possibly miss the end of string nul
    character. This was caused by processing things like backslashes, quotes
    and other tokens. After processing the tokens, a check for the nul
    character needed to be done before continuing the loop, because the loop
    incremented the pointer before doing the check, which could bypass the nul
    character.
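The pattern of the fix, as a generic sketch (not the actual kernel loop): after consuming an escape token, check for the terminating nul before the loop advances the pointer again.

```c
#include <stdbool.h>

/* Walk a format string, skipping backslash escapes; return false if
 * the scan would run past the terminating nul, true otherwise. */
static bool scan_format(const char *ptr)
{
        for (; *ptr; ptr++) {
                if (*ptr == '\\') {
                        ptr++;             /* skip the escaped character */
                        if (!*ptr)         /* the check the fix adds:    */
                                return false; /* '\\' was the last char  */
                        continue;
                }
        }
        return true;
}
```

Without the inner nul check, the for-loop's ptr++ would step past the terminator and read out of bounds, which is what KASan caught.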

    Link: http://lkml.kernel.org/r/552E661D.5060502@oracle.com

    Reported-by: Sasha Levin # via KASan
    Tested-by: Andrey Ryabinin
    Fixes: 0c564a538aa9 "tracing: Add TRACE_DEFINE_ENUM() macro to map enums to their values"
    Signed-off-by: Steven Rostedt

    Steven Rostedt (Red Hat)
     

16 Apr, 2015

4 commits

  • There is a problem that trace events are not properly enabled with
    boot cmdline. The problem is that if we pass "trace_event=kmem:mm_page_alloc"
    to the boot cmdline, it enables all kmem trace events, and not just
    the page_alloc event.

    This is caused by the parsing mechanism. When we parse the cmdline, the buffer
    contents are modified due to tokenization. And, if we use this buffer
    again, we will get the wrong result.

    Unfortunately, this buffer is accessed three times to set trace events
    properly at boot time. So, we need to handle this situation.

    There is already code handling ",", but we need another for ":".
    This patch adds it.
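The underlying hazard can be shown in a userspace sketch: strtok()-style tokenization writes nul bytes into the buffer it parses, so a buffer that must be parsed more than once has to be copied first. The function and parameter names below are hypothetical:

```c
#include <stdlib.h>
#include <string.h>

/* strtok() overwrites delimiters with '\0', destroying the buffer.
 * Parse a copy so the original "system:event" string survives and
 * can be parsed again later. */
static int parse_trace_event(const char *arg, char *system, char *event)
{
        char *copy = malloc(strlen(arg) + 1);
        char *tok, *rest;
        int ret = -1;

        if (!copy)
                return -1;
        strcpy(copy, arg);
        system[0] = event[0] = '\0';

        tok = strtok(copy, ":");
        rest = tok ? strtok(NULL, "") : NULL;
        if (tok && rest) {              /* "system:event" form */
                strcpy(system, tok);
                strcpy(event, rest);
                ret = 0;
        } else if (tok) {               /* bare event name */
                strcpy(event, tok);
                ret = 0;
        }
        free(copy);
        return ret;
}
```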

    Link: http://lkml.kernel.org/r/1429159484-22977-1-git-send-email-iamjoonsoo.kim@lge.com

    Cc: stable@vger.kernel.org # 3.19+
    Signed-off-by: Joonsoo Kim
    [ added missing return ret; ]
    Signed-off-by: Steven Rostedt

    Joonsoo Kim
     
  • graph_trace_open() can be called in atomic context from ftrace_dump().
    Use GFP_ATOMIC for the memory allocations when that's the case, in order
    to avoid the following splat.

    BUG: sleeping function called from invalid context at mm/slab.c:2849
    in_atomic(): 1, irqs_disabled(): 128, pid: 0, name: swapper/0
    Backtrace:
    ..
    [] (__might_sleep) from [] (kmem_cache_alloc_trace+0x160/0x238)
    r7:87800040 r6:000080d0 r5:810d16e8 r4:000080d0
    [] (kmem_cache_alloc_trace) from [] (graph_trace_open+0x30/0xd0)
    r10:00000100 r9:809171a8 r8:00008e28 r7:810d16f0 r6:00000001 r5:810d16e8
    r4:810d16f0
    [] (graph_trace_open) from [] (trace_init_global_iter+0x50/0x9c)
    r8:00008e28 r7:808c853c r6:00000001 r5:810d16e8 r4:810d16f0 r3:800cbd30
    [] (trace_init_global_iter) from [] (ftrace_dump+0x90/0x2ec)
    r4:810d2580 r3:00000000
    [] (ftrace_dump) from [] (sysrq_ftrace_dump+0x1c/0x20)
    r10:00000100 r9:809171a8 r8:808f6e7c r7:00000001 r6:00000007 r5:0000007a
    r4:808d5394
    [] (sysrq_ftrace_dump) from [] (return_to_handler+0x0/0x18)
    [] (__handle_sysrq) from [] (return_to_handler+0x0/0x18)
    r8:808c8100 r7:808c8444 r6:00000101 r5:00000010 r4:84eb3210
    [] (handle_sysrq) from [] (return_to_handler+0x0/0x18)
    [] (pl011_int) from [] (return_to_handler+0x0/0x18)
    r10:809171bc r9:809171a8 r8:00000001 r7:00000026 r6:808c6000 r5:84f01e60
    r4:8454fe00
    [] (handle_irq_event_percpu) from [] (handle_irq_event+0x4c/0x6c)
    r10:808c7ef0 r9:87283e00 r8:00000001 r7:00000000 r6:8454fe00 r5:84f01e60
    r4:84f01e00
    [] (handle_irq_event) from [] (handle_fasteoi_irq+0xf0/0x1ac)
    r6:808f52a4 r5:84f01e60 r4:84f01e00 r3:00000000
    [] (handle_fasteoi_irq) from [] (generic_handle_irq+0x3c/0x4c)
    r6:00000026 r5:00000000 r4:00000026 r3:8007a938
    [] (generic_handle_irq) from [] (__handle_domain_irq+0x8c/0xfc)
    r4:808c1e38 r3:0000002e
    [] (__handle_domain_irq) from [] (gic_handle_irq+0x34/0x6c)
    r10:80917748 r9:00000001 r8:88802100 r7:808c7ef0 r6:808c8fb0 r5:00000015
    r4:8880210c r3:808c7ef0
    [] (gic_handle_irq) from [] (__irq_svc+0x44/0x7c)
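The fix boils down to choosing the allocation flag from the calling context. A trivial sketch with stand-in flag values (in the kernel, GFP_KERNEL may sleep while GFP_ATOMIC may not):

```c
#include <stdbool.h>

/* Hypothetical stand-ins for the kernel's gfp flags. */
#define GFP_KERNEL 0x0
#define GFP_ATOMIC 0x1

/* graph_trace_open() callers that run in atomic context, such as
 * ftrace_dump(), must not use an allocation flag that can sleep. */
static int alloc_flags(bool in_atomic_ctx)
{
        return in_atomic_ctx ? GFP_ATOMIC : GFP_KERNEL;
}
```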

    Link: http://lkml.kernel.org/r/1428953721-31349-1-git-send-email-rabin@rab.in
    Link: http://lkml.kernel.org/r/1428957012-2319-1-git-send-email-rabin@rab.in

    Cc: stable@vger.kernel.org # 3.13+
    Signed-off-by: Rabin Vincent
    Signed-off-by: Steven Rostedt

    Rabin Vincent
     
  • The seq_printf return value, because it's frequently misused,
    will eventually be converted to void.

    See: commit 1f33c41c03da ("seq_file: Rename seq_overflow() to
    seq_has_overflowed() and make public")

    Miscellanea:

    o Remove unused return value from trace_lookup_stack

    Signed-off-by: Joe Perches
    Acked-by: Steven Rostedt
    Cc: Al Viro
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Joe Perches
     
  • relayfs and tracefs are dealing with inodes of their own;
    those two act as filesystem drivers

    Signed-off-by: David Howells
    Signed-off-by: Al Viro

    David Howells
     

15 Apr, 2015

3 commits

  • Pull perf changes from Ingo Molnar:
    "Core kernel changes:

    - One of the more interesting features in this cycle is the ability
    to attach eBPF programs (user-defined, sandboxed bytecode executed
    by the kernel) to kprobes.

    This allows user-defined instrumentation on a live kernel image
    that can never crash, hang or interfere with the kernel negatively.
    (Right now it's limited to root-only, but in the future we might
    allow unprivileged use as well.)

    (Alexei Starovoitov)

    - Another non-trivial feature is per event clockid support: this
    allows, amongst other things, the selection of different clock
    sources for event timestamps traced via perf.

    This feature is sought by people who'd like to merge perf generated
    events with external events that were measured with different
    clocks:

    - cluster wide profiling

    - for system wide tracing with user-space events,

    - JIT profiling events

    etc. Matching perf tooling support is added as well, available via
    the -k, --clockid parameter to perf record et al.

    (Peter Zijlstra)

    Hardware enablement kernel changes:

    - x86 Intel Processor Trace (PT) support: which is a hardware tracer
    on steroids, available on Broadwell CPUs.

    The hardware trace stream is directly output into the user-space
    ring-buffer, using the 'AUX' data format extension that was added
    to the perf core to support hardware constraints such as the
    necessity to have the tracing buffer physically contiguous.

    This patch-set was developed for two years and this is the result.
    A simple way to make use of this is to use BTS tracing, the PT
    driver emulates BTS output - available via the 'intel_bts' PMU.
    More explicit PT specific tooling support is in the works as well -
    will probably be ready by 4.2.

    (Alexander Shishkin, Peter Zijlstra)

    - x86 Intel Cache QoS Monitoring (CQM) support: this is a hardware
    feature of Intel Xeon CPUs that allows the measurement and
    allocation/partitioning of caches to individual workloads.

    These kernel changes expose the measurement side as a new PMU
    driver, which exposes various QoS related PMU events. (The
    partitioning change is work in progress and is planned to be merged
    as a cgroup extension.)

    (Matt Fleming, Peter Zijlstra; CPU feature detection by Peter P
    Waskiewicz Jr)

    - x86 Intel Haswell LBR call stack support: this is a new Haswell
    feature that allows the hardware recording of call chains, plus
    tooling support. To activate this feature you have to enable it
    via the new 'lbr' call-graph recording option:

    perf record --call-graph lbr
    perf report

    or:

    perf top --call-graph lbr

    This hardware feature is a lot faster than stack walk or dwarf
    based unwinding, but has some limitations:

    - It reuses the current LBR facility, so LBR call stack and
    branch record can not be enabled at the same time.

    - It is only available for user-space callchains.

    (Yan, Zheng)

    - x86 Intel Broadwell CPU support and various event constraints and
    event table fixes for earlier models.

    (Andi Kleen)

    - x86 Intel HT CPUs event scheduling workarounds. This is a complex
    CPU bug affecting the SNB,IVB,HSW families that results in counter
    value corruption. The mitigation code is automatically enabled and
    is transparent.

    (Maria Dimakopoulou, Stephane Eranian)

    The perf tooling side had a ton of changes in this cycle as well, so
    I'm only able to list the user visible changes here, in addition to
    the tooling changes outlined above:

    User visible changes affecting all tools:

    - Improve support of compressed kernel modules (Jiri Olsa)
    - Save DSO loading errno to better report errors (Arnaldo Carvalho de Melo)
    - Bash completion for subcommands (Yunlong Song)
    - Add 'I' event modifier for perf_event_attr.exclude_idle bit (Jiri Olsa)
    - Support missing -f to override perf.data file ownership. (Yunlong Song)
    - Show the first event with an invalid filter (David Ahern, Arnaldo Carvalho de Melo)

    User visible changes in individual tools:

    'perf data':

    New tool for converting perf.data to other formats, initially
    for the CTF (Common Trace Format) from LTTng (Jiri Olsa,
    Sebastian Siewior)

    'perf diff':

    Add --kallsyms option (David Ahern)

    'perf list':

    Allow listing events with 'tracepoint' prefix (Yunlong Song)

    Sort the output of the command (Yunlong Song)

    'perf kmem':

    Respect -i option (Jiri Olsa)

    Print big numbers using thousands' group (Namhyung Kim)

    Allow -v option (Namhyung Kim)

    Fix alignment of slab result table (Namhyung Kim)

    'perf probe':

    Support multiple probes on different binaries on the same command line (Masami Hiramatsu)

    Support unnamed union/structure members data collection. (Masami Hiramatsu)

    Check kprobes blacklist when adding new events. (Masami Hiramatsu)

    'perf record':

    Teach 'perf record' about perf_event_attr.clockid (Peter Zijlstra)

    Support recording running/enabled time (Andi Kleen)

    'perf sched':

    Improve the performance of 'perf sched replay' on high CPU core count machines (Yunlong Song)

    'perf report' and 'perf top':

    Allow annotating entries in callchains in the hists browser (Arnaldo Carvalho de Melo)

    Indicate which callchain entries are annotated in the
    TUI hists browser (Arnaldo Carvalho de Melo)

    Add pid/tid filtering to 'report' and 'script' commands (David Ahern)

    Consider PERF_RECORD_ events with cpumode == 0 in 'perf top', removing one
    cause of long term memory usage buildup, i.e. not processing PERF_RECORD_EXIT
    events (Arnaldo Carvalho de Melo)

    'perf stat':

    Report unsupported events properly (Suzuki K. Poulose)

    Output running time and run/enabled ratio in CSV mode (Andi Kleen)

    'perf trace':

    Handle legacy syscalls tracepoints (David Ahern, Arnaldo Carvalho de Melo)

    Only insert blank duration bracket when tracing syscalls (Arnaldo Carvalho de Melo)

    Filter out the trace pid when no threads are specified (Arnaldo Carvalho de Melo)

    Dump stack on segfaults (Arnaldo Carvalho de Melo)

    No need to explicitly enable evsels for workload started from perf, let it
    be enabled via perf_event_attr.enable_on_exec, removing some events that take
    place in the 'perf trace' before a workload is really started by it.
    (Arnaldo Carvalho de Melo)

    Allow mixing with tracepoints and suppressing plain syscalls. (Arnaldo Carvalho de Melo)

    There's also been a ton of infrastructure work done, such as the
    split-out of perf's build system into tools/build/ and other changes -
    see the shortlog and changelog for details"

    * 'perf-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (358 commits)
    perf/x86/intel/pt: Clean up the control flow in pt_pmu_hw_init()
    perf evlist: Fix type for references to data_head/tail
    perf probe: Check the orphaned -x option
    perf probe: Support multiple probes on different binaries
    perf buildid-list: Fix segfault when show DSOs with hits
    perf tools: Fix cross-endian analysis
    perf tools: Fix error path to do closedir() when synthesizing threads
    perf tools: Fix synthesizing fork_event.ppid for non-main thread
    perf tools: Add 'I' event modifier for exclude_idle bit
    perf report: Don't call map__kmap if map is NULL.
    perf tests: Fix attr tests
    perf probe: Fix ARM 32 building error
    perf tools: Merge all perf_event_attr print functions
    perf record: Add clockid parameter
    perf sched replay: Use replay_repeat to calculate the runavg of cpu usage instead of the default value 10
    perf sched replay: Support using -f to override perf.data file ownership
    perf sched replay: Fix the EMFILE error caused by the limitation of the maximum open files
    perf sched replay: Handle the dead halt of sem_wait when create_tasks() fails for any task
    perf sched replay: Fix the segmentation fault problem caused by pr_err in threads
    perf sched replay: Realloc the memory of pid_to_task stepwise to adapt to the different pid_max configurations
    ...

    Linus Torvalds
     
  • Pull tracing updates from Steven Rostedt:
    "Some clean ups and small fixes, but the biggest change is the addition
    of the TRACE_DEFINE_ENUM() macro that can be used by tracepoints.

    Tracepoints have helper functions for the TP_printk() called
    __print_symbolic() and __print_flags() that let a numeric value be
    displayed as human comprehensible text. What is placed in the
    TP_printk() is also shown in the tracepoint format file such that user
    space tools like perf and trace-cmd can parse the binary data and
    express the values too. Unfortunately, the way the TRACE_EVENT()
    macro works, anything placed in the TP_printk() will be shown pretty
    much exactly as is. The problem arises when enums are used. That's
    because unlike macros, enums will not be changed into their values by
    the C pre-processor. Thus, the enum string is exported to the format
    file, and this makes it useless for user space tools.

    The TRACE_DEFINE_ENUM() solves this by converting the enum strings in
    the TP_printk() format into their number, and that is what is shown to
    user space. For example, the tracepoint tlb_flush currently has this
    in its format file:

    __print_symbolic(REC->reason,
    { TLB_FLUSH_ON_TASK_SWITCH, "flush on task switch" },
    { TLB_REMOTE_SHOOTDOWN, "remote shootdown" },
    { TLB_LOCAL_SHOOTDOWN, "local shootdown" },
    { TLB_LOCAL_MM_SHOOTDOWN, "local mm shootdown" })

    After adding:

    TRACE_DEFINE_ENUM(TLB_FLUSH_ON_TASK_SWITCH);
    TRACE_DEFINE_ENUM(TLB_REMOTE_SHOOTDOWN);
    TRACE_DEFINE_ENUM(TLB_LOCAL_SHOOTDOWN);
    TRACE_DEFINE_ENUM(TLB_LOCAL_MM_SHOOTDOWN);

    Its format file will contain this:

    __print_symbolic(REC->reason,
    { 0, "flush on task switch" },
    { 1, "remote shootdown" },
    { 2, "local shootdown" },
    { 3, "local mm shootdown" })"

    * tag 'trace-v4.1' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace: (27 commits)
    tracing: Add enum_map file to show enums that have been mapped
    writeback: Export enums used by tracepoint to user space
    v4l: Export enums used by tracepoints to user space
    SUNRPC: Export enums in tracepoints to user space
    mm: tracing: Export enums in tracepoints to user space
    irq/tracing: Export enums in tracepoints to user space
    f2fs: Export the enums in the tracepoints to userspace
    net/9p/tracing: Export enums in tracepoints to userspace
    x86/tlb/trace: Export enums in used by tlb_flush tracepoint
    tracing/samples: Update the trace-event-sample.h with TRACE_DEFINE_ENUM()
    tracing: Allow for modules to convert their enums to values
    tracing: Add TRACE_DEFINE_ENUM() macro to map enums to their values
    tracing: Update trace-event-sample with TRACE_SYSTEM_VAR documentation
    tracing: Give system name a pointer
    brcmsmac: Move each system tracepoints to their own header
    iwlwifi: Move each system tracepoints to their own header
    mac80211: Move message tracepoints to their own header
    tracing: Add TRACE_SYSTEM_VAR to xhci-hcd
    tracing: Add TRACE_SYSTEM_VAR to kvm-s390
    tracing: Add TRACE_SYSTEM_VAR to intel-sst
    ...

    Linus Torvalds
     
  • Pull tracefs from Steven Rostedt:
    "This adds the new tracefs file system.

    This has been in linux-next for more than one release, as I had it
    ready for the 4.0 merge window, but a last minute thing that needed to
    go into Linux first had to be done. That was that perf hard coded the
    file system number when reading /sys/kernel/debugfs/tracing directory
    making sure that the path had the debugfs mount # before it would
    parse the tracing file. This broke other use cases of perf, and the
    check is removed.

    Now when mounting /sys/kernel/debug, tracefs is automatically mounted
    in /sys/kernel/debug/tracing such that old tools will still see that
    path as expected. But now system admins can mount tracefs directly
    and not need to mount debugfs, which can expose security issues. A
    new directory is created when tracefs is configured such that system
    admins can now mount it separately (/sys/kernel/tracing)"

    * tag 'trace-4.1-tracefs' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace:
    tracing: Have mkdir and rmdir be part of tracefs
    tracefs: Add directory /sys/kernel/tracing
    tracing: Automatically mount tracefs on debugfs/tracing
    tracing: Convert the tracing facility over to use tracefs
    tracefs: Add new tracefs file system
    tracing: Create cmdline tracer options on tracing fs init
    tracing: Only create tracer options files if directory exists
    debugfs: Provide a file creation function that also takes an initial size

    Linus Torvalds
     

08 Apr, 2015

3 commits

    Add an enum_map file in the tracing directory to see what enums have been
    saved to convert in the print fmt files.

    As this requires the enum mapping to be persistent in memory, it is only
    created if the new config option CONFIG_TRACE_ENUM_MAP_FILE is enabled.
    This is for debugging and will increase the persistent memory footprint
    of the kernel.

    Link: http://lkml.kernel.org/r/20150403013802.220157513@goodmis.org

    Reviewed-by: Masami Hiramatsu
    Tested-by: Masami Hiramatsu
    Signed-off-by: Steven Rostedt

    Steven Rostedt (Red Hat)
     
  • Update the infrastructure such that modules that declare TRACE_DEFINE_ENUM()
    will have those enums converted into their values in the tracepoint
    print fmt strings.

    Link: http://lkml.kernel.org/r/87vbhjp74q.fsf@rustcorp.com.au

    Acked-by: Rusty Russell
    Reviewed-by: Masami Hiramatsu
    Tested-by: Masami Hiramatsu
    Signed-off-by: Steven Rostedt

    Steven Rostedt (Red Hat)
     
  • Several tracepoints use the helper functions __print_symbolic() or
    __print_flags() and pass in enums that do the mapping between the
    binary data stored and the value to print. This works well for reading
    the ASCII trace files, but when the data is read via userspace tools
    such as perf and trace-cmd, the conversion of the binary value to a
    human string format is lost if an enum is used, as userspace does not
    have access to what the ENUM is.

    For example, the tracepoint trace_tlb_flush() has:

    __print_symbolic(REC->reason,
    { TLB_FLUSH_ON_TASK_SWITCH, "flush on task switch" },
    { TLB_REMOTE_SHOOTDOWN, "remote shootdown" },
    { TLB_LOCAL_SHOOTDOWN, "local shootdown" },
    { TLB_LOCAL_MM_SHOOTDOWN, "local mm shootdown" })

    Which maps the enum values to the strings they represent. But perf and
    trace-cmd do not know what value TLB_LOCAL_MM_SHOOTDOWN is, and would
    not be able to map it.

    With TRACE_DEFINE_ENUM(), developers can place these in the event header
    files and ftrace will convert the enums to their values:

    By adding:

    TRACE_DEFINE_ENUM(TLB_FLUSH_ON_TASK_SWITCH);
    TRACE_DEFINE_ENUM(TLB_REMOTE_SHOOTDOWN);
    TRACE_DEFINE_ENUM(TLB_LOCAL_SHOOTDOWN);
    TRACE_DEFINE_ENUM(TLB_LOCAL_MM_SHOOTDOWN);

    $ cat /sys/kernel/debug/tracing/events/tlb/tlb_flush/format
    [...]
    __print_symbolic(REC->reason,
    { 0, "flush on task switch" },
    { 1, "remote shootdown" },
    { 2, "local shootdown" },
    { 3, "local mm shootdown" })

    The above is what userspace expects to see, and tools do not need to
    be modified to parse them.

    Link: http://lkml.kernel.org/r/20150403013802.220157513@goodmis.org

    Cc: Guilherme Cox
    Cc: Tony Luck
    Cc: Xie XiuQi
    Acked-by: Namhyung Kim
    Reviewed-by: Masami Hiramatsu
    Tested-by: Masami Hiramatsu
    Signed-off-by: Steven Rostedt

    Steven Rostedt (Red Hat)
     

03 Apr, 2015

1 commit

  • Dynamically allocated trampolines call ftrace_ops_get_func to get the
    function which they should call. For dynamic fops (FTRACE_OPS_FL_DYNAMIC
    flag is set) ftrace_ops_list_func is always returned. This is reasonable
    for static trampolines but goes against the main advantage of dynamic
    ones, that is avoidance of going through the list of all registered
    callbacks for functions that are only being traced by a single callback.

    We can fix it by returning ops->func (or recursion safe version) from
    ftrace_ops_get_func whenever it is possible for dynamic trampolines.

    Note that dynamic trampolines are not allowed for dynamic fops if
    CONFIG_PREEMPT=y.
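The decision the commit changes can be modeled with plain function pointers. The flag names echo the commit (FTRACE_OPS_FL_DYNAMIC, dynamic trampolines), but this is a simplified sketch, not the kernel logic, which also accounts for recursion-safe wrappers and per-cpu filtering:

```c
typedef void (*ftrace_func)(unsigned long ip);

#define FL_DYNAMIC     0x1u
#define FL_ALLOC_TRAMP 0x2u   /* ops has its own dynamic trampoline */

struct ops {
        unsigned int flags;
        ftrace_func func;
};

static void list_func(unsigned long ip) { (void)ip; }
static void my_func(unsigned long ip)   { (void)ip; }

/* Before the fix: any DYNAMIC ops got the list function, forcing a
 * walk of every registered callback. After: a dynamic trampoline may
 * call ops->func directly when it is safe to do so. */
static ftrace_func ops_get_func(struct ops *ops)
{
        if ((ops->flags & FL_DYNAMIC) && !(ops->flags & FL_ALLOC_TRAMP))
                return list_func;
        return ops->func;
}
```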

    Link: http://lkml.kernel.org/r/alpine.LNX.2.00.1501291023000.25445@pobox.suse.cz
    Link: http://lkml.kernel.org/r/1424357773-13536-1-git-send-email-mbenes@suse.cz

    Reported-by: Miroslav Benes
    Signed-off-by: Steven Rostedt

    Steven Rostedt (Red Hat)
     

02 Apr, 2015

5 commits

    So bpf_tracing.o depends on CONFIG_BPF_SYSCALL - but that's not its only
    dependency: it also depends on the tracing infrastructure and on kprobes,
    without which it will fail to build with:

    In file included from kernel/trace/bpf_trace.c:14:0:
    kernel/trace/trace.h: In function ‘trace_test_and_set_recursion’:
    kernel/trace/trace.h:491:28: error: ‘struct task_struct’ has no member named ‘trace_recursion’
    unsigned int val = current->trace_recursion;
    [...]

    It took quite some time to trigger this build failure, because right now
    BPF_SYSCALL is very obscure, hidden behind CONFIG_EXPERT. So also make
    BPF_SYSCALL more widely configurable, not just under CONFIG_EXPERT.

    If BPF_SYSCALL, tracing and kprobes are enabled then enable the bpf_tracing
    gateway as well.

    We might want to make this an interactive option later on, although
    I'd not complicate it unnecessarily: enabling BPF_SYSCALL is enough of
    an indicator that the user wants BPF support.

    Cc: Alexei Starovoitov
    Cc: Andrew Morton
    Cc: Arnaldo Carvalho de Melo
    Cc: Daniel Borkmann
    Cc: David S. Miller
    Cc: Jiri Olsa
    Cc: Linus Torvalds
    Cc: Masami Hiramatsu
    Cc: Namhyung Kim
    Cc: Peter Zijlstra
    Cc: Steven Rostedt
    Cc: linux-kernel@vger.kernel.org
    Signed-off-by: Ingo Molnar

    Ingo Molnar
     
  • Debugging of BPF programs needs some form of printk from the
    program, so let programs call limited trace_printk() with %d %u
    %x %p modifiers only.

    Similar to kernel modules, during program load the verifier checks
    whether the program calls bpf_trace_printk(), and if so, the kernel
    allocates trace_printk buffers and emits a big 'this is debug
    only' banner.
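    A rough userspace sketch of that style of format check follows; the
    real verifier logic in kernel/trace/bpf_trace.c is more involved, and
    fmt_allowed() is purely illustrative:

```c
#include <stdbool.h>
#include <string.h>

/* Accept a format string only if every '%' conversion is one of the
 * permitted d, u, x, p modifiers. Illustrative only. */
static bool fmt_allowed(const char *fmt)
{
	const char *p;

	for (p = fmt; *p; p++) {
		if (*p != '%')
			continue;
		p++;
		if (*p == '\0')
			return false;   /* dangling '%' */
		if (*p == '%')
			continue;       /* literal percent */
		if (!strchr("duxp", *p))
			return false;
	}
	return true;
}
```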

    Signed-off-by: Alexei Starovoitov
    Reviewed-by: Steven Rostedt
    Cc: Andrew Morton
    Cc: Arnaldo Carvalho de Melo
    Cc: Daniel Borkmann
    Cc: David S. Miller
    Cc: Jiri Olsa
    Cc: Linus Torvalds
    Cc: Masami Hiramatsu
    Cc: Namhyung Kim
    Cc: Peter Zijlstra
    Link: http://lkml.kernel.org/r/1427312966-8434-6-git-send-email-ast@plumgrid.com
    Signed-off-by: Ingo Molnar

    Alexei Starovoitov
     
    bpf_ktime_get_ns() is used by programs to compute the time delta
    between events or as a timestamp.

    Signed-off-by: Alexei Starovoitov
    Reviewed-by: Steven Rostedt
    Cc: Andrew Morton
    Cc: Arnaldo Carvalho de Melo
    Cc: Daniel Borkmann
    Cc: David S. Miller
    Cc: Jiri Olsa
    Cc: Linus Torvalds
    Cc: Masami Hiramatsu
    Cc: Namhyung Kim
    Cc: Peter Zijlstra
    Link: http://lkml.kernel.org/r/1427312966-8434-5-git-send-email-ast@plumgrid.com
    Signed-off-by: Ingo Molnar

    Alexei Starovoitov
     
  • BPF programs, attached to kprobes, provide a safe way to execute
    user-defined BPF byte-code programs without being able to crash or
    hang the kernel in any way. The BPF engine makes sure that such
    programs have a finite execution time and that they cannot break
    out of their sandbox.

    The user interface is to attach to a kprobe via the perf syscall:

    struct perf_event_attr attr = {
        .type = PERF_TYPE_TRACEPOINT,
        .config = event_id,
        ...
    };

    event_fd = perf_event_open(&attr, ...);
    ioctl(event_fd, PERF_EVENT_IOC_SET_BPF, prog_fd);

    'prog_fd' is a file descriptor associated with a previously loaded
    BPF program.

    'event_id' is the ID of the created kprobe.

    Closing 'event_fd':

    close(event_fd);

    ... automatically detaches BPF program from it.

    BPF programs can call in-kernel helper functions to:

    - lookup/update/delete elements in maps

    - probe_read - a wrapper of probe_kernel_read() used to access any
    kernel data structures

    BPF programs receive 'struct pt_regs *' as an input ('struct pt_regs' is
    architecture dependent) and return 0 to ignore the event and 1 to store
    kprobe event into the ring buffer.

    Note, kprobes are fundamentally _not_ a stable kernel ABI,
    so BPF programs attached to kprobes must be recompiled for
    every kernel version, and the user must supply the correct
    LINUX_VERSION_CODE in attr.kern_version during the bpf_prog_load() call.
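    A program body for this interface might look like the following
    minimal sketch. The program name and (absent) filter logic are
    hypothetical; only the pt_regs argument and the 0/1 return convention
    come from the description above:

```c
/* Sketch of a kprobe-attached BPF program body: it receives the
 * architecture-dependent pt_regs and returns 0 to ignore the event
 * or 1 to store it in the ring buffer. */
struct pt_regs; /* opaque here; the real layout is per-arch */

static int sample_kprobe_prog(struct pt_regs *ctx)
{
	(void)ctx;  /* a real program would filter on register state */
	return 1;   /* record this event */
}
```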

    Signed-off-by: Alexei Starovoitov
    Reviewed-by: Steven Rostedt
    Reviewed-by: Masami Hiramatsu
    Cc: Andrew Morton
    Cc: Arnaldo Carvalho de Melo
    Cc: Daniel Borkmann
    Cc: David S. Miller
    Cc: Jiri Olsa
    Cc: Linus Torvalds
    Cc: Namhyung Kim
    Cc: Peter Zijlstra
    Link: http://lkml.kernel.org/r/1427312966-8434-4-git-send-email-ast@plumgrid.com
    Signed-off-by: Ingo Molnar

    Alexei Starovoitov
     
    Add a TRACE_EVENT_FL_KPROBE flag to differentiate the kprobe type of
    tracepoints, since BPF programs can only be attached to the kprobe
    type of PERF_TYPE_TRACEPOINT perf events.

    Signed-off-by: Alexei Starovoitov
    Reviewed-by: Steven Rostedt
    Reviewed-by: Masami Hiramatsu
    Cc: Andrew Morton
    Cc: Arnaldo Carvalho de Melo
    Cc: Daniel Borkmann
    Cc: David S. Miller
    Cc: Jiri Olsa
    Cc: Linus Torvalds
    Cc: Namhyung Kim
    Cc: Peter Zijlstra
    Link: http://lkml.kernel.org/r/1427312966-8434-3-git-send-email-ast@plumgrid.com
    Signed-off-by: Ingo Molnar

    Alexei Starovoitov
     

31 Mar, 2015

1 commit

  • A clean up of the recursive protection code changed

    val = this_cpu_read(current_context);
    val--;
    val &= this_cpu_read(current_context);

    to

    val = this_cpu_read(current_context);
    val &= val & (val - 1);

    Which has a duplicate use of '&' as the above is the same as

    val = val & (val - 1);

    Actually, it would be best to remove that line altogether and
    just add it to where it is used.

    And Christoph even mentioned that it can be further compacted to
    just a single line:

    __this_cpu_and(current_context, __this_cpu_read(current_context) - 1);
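    The single-line version relies on the standard bit trick that
    val & (val - 1) clears the lowest set bit, which here pops the most
    recently entered context from the mask. Sketched in plain C:

```c
/* val & (val - 1) clears the lowest set bit of val -- the idiom the
 * compacted one-liner above depends on. */
static unsigned int clear_lowest_bit(unsigned int val)
{
	return val & (val - 1);
}
```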

    Link: http://lkml.kernel.org/alpine.DEB.2.11.1503271423580.23114@gentwo.org

    Suggested-by: Christoph Lameter
    Signed-off-by: Steven Rostedt

    Steven Rostedt (Red Hat)
     

27 Mar, 2015

1 commit


25 Mar, 2015

4 commits

  • The commit that added a check for this to checkpatch says:

    "Using weak declarations can have unintended link defects. The __weak on
    the declaration causes non-weak definitions to become weak."

    In this case, when a PowerPC kernel is built with CONFIG_KPROBE_EVENT
    but not CONFIG_UPROBE_EVENT, it generates the following warning:

    WARNING: 1 bad relocations
    c0000000014f2190 R_PPC64_ADDR64 uprobes_fetch_type_table

    This is fixed by passing the fetch_table arrays to
    traceprobe_parse_probe_arg() which also means that they can never be NULL.

    Link: http://lkml.kernel.org/r/20150312165834.4482cb48@canb.auug.org.au

    Acked-by: Masami Hiramatsu
    Signed-off-by: Stephen Rothwell
    Signed-off-by: Steven Rostedt

    Stephen Rothwell
     
    The TRACE_EVENT_FL_USE_CALL_FILTER flag in the ftrace:function event can
    be removed. This flag was first introduced in commit
    f306cc82a93d ("tracing: Update event filters for multibuffer").

    Now, the only place that uses this flag is ftrace:function, but the
    filter of ftrace:function takes a different code path from
    events/syscalls and events/tracepoints. It uses ftrace_filter_write()
    and perf's ftrace_profile_set_filter() to set the filter; the
    functionality of the file 'tracing/events/ftrace/function/filter' is
    bypassed in init_pred(), in which case neither call->filter nor
    file->filter is used.

    So we can safely remove TRACE_EVENT_FL_USE_CALL_FILTER flag from
    ftrace:function events.

    Link: http://lkml.kernel.org/r/1425367294-27852-1-git-send-email-hekuang@huawei.com

    Signed-off-by: He Kuang
    Signed-off-by: Steven Rostedt

    He Kuang
     
  • Use %pS for actual addresses, otherwise you'll get bad output
    on arches like ppc64 where %pF expects a function descriptor.

    Link: http://lkml.kernel.org/r/1426130037-17956-22-git-send-email-scottwood@freescale.com

    Signed-off-by: Scott Wood
    Signed-off-by: Steven Rostedt

    Scott Wood
     
  • It has come to my attention that this_cpu_read/write are horrible on
    architectures other than x86. Worse yet, they actually disable
    preemption or interrupts! This caused some unexpected tracing results
    on ARM.

    101.356868: preempt_count_add

    Reported-by: Uwe Kleine-Koenig
    Tested-by: Uwe Kleine-Koenig
    Signed-off-by: Steven Rostedt

    Steven Rostedt
     

23 Mar, 2015

1 commit

  • The only reason CQM had to use a hard-coded pmu type was so it could use
    cqm_target in hw_perf_event.

    Do away with the {tp,bp,cqm}_target pointers and provide a non type
    specific one.

    This allows us to do away with that silly pmu type as well.

    Signed-off-by: Peter Zijlstra (Intel)
    Cc: Vince Weaver
    Cc: acme@kernel.org
    Cc: acme@redhat.com
    Cc: hpa@zytor.com
    Cc: jolsa@redhat.com
    Cc: kanaka.d.juvva@intel.com
    Cc: matt.fleming@intel.com
    Cc: tglx@linutronix.de
    Cc: torvalds@linux-foundation.org
    Cc: vikas.shivappa@linux.intel.com
    Link: http://lkml.kernel.org/r/20150305211019.GU21418@twins.programming.kicks-ass.net
    Signed-off-by: Ingo Molnar

    Peter Zijlstra
     

09 Mar, 2015

1 commit

    Some archs (specifically PowerPC) are sensitive to the ordering of
    enabling the calls to function tracing and setting the
    function to be traced.

    That is, update_ftrace_function() sets which function the ftrace_caller
    trampoline should call. Some archs require this to be set before
    calling ftrace_run_update_code().

    Another bug was discovered: ftrace_startup_sysctl() called
    ftrace_run_update_code() directly. If the function the ftrace_caller
    trampoline calls changes, then it will not be updated. Instead,
    ftrace_startup_enable() should be called, because it tests whether
    the callback changed since the code was disabled, and tells
    the arch to update appropriately. Most archs do not need this
    notification, but PowerPC does.

    The problem could be seen by the following commands:

    # echo 0 > /proc/sys/kernel/ftrace_enabled
    # echo function > /sys/kernel/debug/tracing/current_tracer
    # echo 1 > /proc/sys/kernel/ftrace_enabled
    # cat /sys/kernel/debug/tracing/trace

    The trace will show that function tracing was not active.

    Cc: stable@vger.kernel.org # 2.6.27+
    Signed-off-by: Steven Rostedt

    Steven Rostedt (Red Hat)