22 Jan, 2020

6 commits


16 Jan, 2020

1 commit

  • trace_printk() is used to debug the kernel, including the tracing
    infrastructure itself. But because it writes to the ring buffer, as does
    much of the tracing infrastructure, the ring buffer's recursion detection
    will drop any write that happens in the same context as a write already in
    progress (it allows interrupts to write while normal context is writing,
    but won't let normal context write while normal context is writing).

    This can cause confusion, leading the developer to think that the code
    where the trace_printk() exists is not hit. To solve this, up the recursion
    nesting of the ring buffer when trace_printk() is called, before it writes
    to the buffer itself.

    Note, this does make it dangerous to use trace_printk() in the ring buffer
    code itself, because this basically disables the recursion protection of
    trace_printk() buffer writes. But trace_printk() is only used for
    debugging, and if this does occur, the developer will see the cause very
    quickly (the recursion blows up the stack) and can deal with it. Having
    trace_printk() silently ignored is a much bigger problem, and disabling
    recursion protection is a small price to pay to fix it.

    Signed-off-by: Steven Rostedt (VMware)

    Steven Rostedt (VMware)
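
The recursion behaviour described in the commit above can be illustrated with a small model. This is only a sketch under simplifying assumptions: the class and function names are hypothetical, and the real ring buffer tracks contexts with per-CPU bit flags rather than a Python set.

```python
class RecursionGuard:
    """Toy model of per-context recursion detection with a nesting level."""

    def __init__(self):
        self.active = set()  # (nest_level, context) pairs currently writing
        self.nest = 0        # raised by nest_start(), lowered by nest_end()

    def lock(self, context):
        """Return True if a write may proceed, False if it must be dropped."""
        key = (self.nest, context)
        if key in self.active:
            return False     # same context is already writing at this level
        self.active.add(key)
        return True

    def unlock(self, context):
        self.active.discard((self.nest, context))

    def nest_start(self):
        self.nest += 1

    def nest_end(self):
        self.nest -= 1


def nested_write_allowed(use_nesting):
    """Simulate a write issued while a normal-context write is in progress."""
    guard = RecursionGuard()
    guard.lock("normal")             # outer write in progress
    if use_nesting:
        guard.nest_start()           # what the commit does for trace_printk()
    allowed = guard.lock("normal")   # inner write from the same context
    if allowed:
        guard.unlock("normal")
    if use_nesting:
        guard.nest_end()
    guard.unlock("normal")
    return allowed
```

Without the extra nesting level the inner write is dropped silently; with it, the write goes through, at the cost of losing the protection for that write.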
     

15 Jan, 2020

2 commits


14 Jan, 2020

26 commits

  • With CONFIG_PROVE_RCU_LIST, I had many suspicious RCU warnings
    when I ran ftracetest trigger testcases.

    -----
    # dmesg -c > /dev/null
    # ./ftracetest test.d/trigger
    ...
    # dmesg | grep "RCU-list traversed" | cut -f 2 -d ] | cut -f 2 -d " "
    kernel/trace/trace_events_hist.c:6070
    kernel/trace/trace_events_hist.c:1760
    kernel/trace/trace_events_hist.c:5911
    kernel/trace/trace_events_trigger.c:504
    kernel/trace/trace_events_hist.c:1810
    kernel/trace/trace_events_hist.c:3158
    kernel/trace/trace_events_hist.c:3105
    kernel/trace/trace_events_hist.c:5518
    kernel/trace/trace_events_hist.c:5998
    kernel/trace/trace_events_hist.c:6019
    kernel/trace/trace_events_hist.c:6044
    kernel/trace/trace_events_trigger.c:1500
    kernel/trace/trace_events_trigger.c:1540
    kernel/trace/trace_events_trigger.c:539
    kernel/trace/trace_events_trigger.c:584
    -----

    I investigated those warnings and found that the RCU-list
    traversals in event trigger and hist didn't need to use the
    RCU version because they were called only under event_mutex.

    I also checked other RCU-list traversals related to the event
    trigger list, and found that, except for a few cases, they were
    called from event_hist_trigger_func(), hist_unregister_trigger()
    or the register/unregister functions.

    Replace these unneeded RCU-list traversals with the normal list
    traversal macro, and add lockdep_assert_held() to check that
    event_mutex is held.

    Link: http://lkml.kernel.org/r/157680910305.11685.15110237954275915782.stgit@devnote2

    Reviewed-by: Tom Zanussi
    Signed-off-by: Masami Hiramatsu
    Signed-off-by: Steven Rostedt (VMware)

    Masami Hiramatsu
     
  • Add documentation about the boot-time tracing options in
    boot config.

    Link: http://lkml.kernel.org/r/157867246028.17873.8047384554383977870.stgit@devnote2

    Signed-off-by: Masami Hiramatsu
    Signed-off-by: Steven Rostedt (VMware)

    Masami Hiramatsu
     
  • Add the function-tracer filter options below to boot-time tracing.

    - ftrace.[instance.INSTANCE.]ftrace.filters
    This takes an array of function filter rules for functions to trace.

    - ftrace.[instance.INSTANCE.]ftrace.notraces
    This takes an array of function filter rules for functions NOT to trace.

    Link: http://lkml.kernel.org/r/157867244841.17873.10933616628243103561.stgit@devnote2

    Signed-off-by: Masami Hiramatsu
    Signed-off-by: Steven Rostedt (VMware)

    Masami Hiramatsu
     
  • Add ftrace.cpumask option support to boot-time tracing.
    This sets the cpumask for each instance.

    - ftrace.[instance.INSTANCE.]cpumask = CPUMASK;
    Set the trace cpumask. Note that CPUMASK should be a string
    that the tracing_cpumask file in tracefs can accept.

    Link: http://lkml.kernel.org/r/157867243625.17873.13613922641273149372.stgit@devnote2

    Signed-off-by: Masami Hiramatsu
    Signed-off-by: Steven Rostedt (VMware)

    Masami Hiramatsu
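
As an illustration of the CPUMASK string in the commit above, the following sketch builds a hexadecimal mask from a list of CPU numbers. It assumes the usual kernel bitmap text format (hex digits, with comma-separated 32-bit groups for larger masks); the helper names are hypothetical.

```python
def cpus_to_mask(cpus):
    """Return a hex cpumask string for the given CPU numbers."""
    mask = 0
    for cpu in cpus:
        if cpu < 0:
            raise ValueError("CPU numbers must be non-negative")
        mask |= 1 << cpu
    digits = format(mask, "x")
    width = -(-len(digits) // 8) * 8          # round up to 32-bit groups
    padded = digits.zfill(width)
    groups = [padded[i:i + 8] for i in range(0, width, 8)]
    groups[0] = groups[0].lstrip("0") or "0"  # no leading zeros on first group
    return ",".join(groups)


def cpumask_option(cpus, instance=None):
    """Render the boot config line for the global or a named instance."""
    prefix = f"ftrace.instance.{instance}." if instance else "ftrace."
    return f'{prefix}cpumask = "{cpus_to_mask(cpus)}"'
```

For example, `cpumask_option([0, 1])` yields `ftrace.cpumask = "3"`.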
     
  • Add instance node support to boot-time tracing. Users can set
    some options and event nodes under an instance node.

    - ftrace.instance.INSTANCE[...]
    Add a new instance named INSTANCE. Some options and event nodes
    are accepted under an instance node.

    Link: http://lkml.kernel.org/r/157867242413.17873.9814204526141500278.stgit@devnote2

    Signed-off-by: Masami Hiramatsu
    Signed-off-by: Steven Rostedt (VMware)

    Masami Hiramatsu
     
  • Add synthetic event node support to boot-time tracing.
    The synthetic event is a kind of event node, but the group
    name is "synthetic".

    - ftrace.event.synthetic.EVENT.fields = FIELD[, FIELD2...]
    Defines a new synthetic event with FIELDs. Each field should be
    "type varname".

    The synthetic node requires a "fields" string array, which defines
    the fields in the same way as the tracing/synth_events interface.

    Link: http://lkml.kernel.org/r/157867241236.17873.12411615143321557709.stgit@devnote2

    Signed-off-by: Masami Hiramatsu
    Signed-off-by: Steven Rostedt (VMware)

    Masami Hiramatsu
     
  • Add kprobe event support on event nodes to boot-time tracing.
    If the group name of an event is "kprobes", boot-time tracing
    defines a new probe event according to the "probes" values.

    - ftrace.event.kprobes.EVENT.probes = PROBE[, PROBE2...]
    Defines a new kprobe event based on PROBEs. It is possible to define
    multiple probes on one event, but they must have the same type of
    arguments.

    For example,

    ftrace.event.kprobes.myevent {
            probes = "vfs_read $arg1 $arg2";
            enable;
    }

    This will add kprobes:myevent on vfs_read with the 1st and the 2nd
    arguments.

    Link: http://lkml.kernel.org/r/157867240104.17873.9712052065426433111.stgit@devnote2

    Signed-off-by: Masami Hiramatsu
    Signed-off-by: Steven Rostedt (VMware)

    Masami Hiramatsu
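
To make the "probes" values above concrete, this sketch shows the probe definitions they roughly correspond to in the kprobe_events syntax. It assumes entry probes (the "p" prefix) only; the function name is hypothetical and this is not the kernel's implementation.

```python
def kprobe_commands(event, probes, group="kprobes"):
    """Build one kprobe_events-style definition per PROBE string."""
    return [f"p:{group}/{event} {probe}" for probe in probes]
```

With the example from the commit, `kprobe_commands("myevent", ["vfs_read $arg1 $arg2"])` gives a single definition, `p:kprobes/myevent vfs_read $arg1 $arg2`.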
     
  • Add per-event settings for boot-time tracing. Users can set the
    filter, actions and enable state of each event on boot. The event
    entries are under the ftrace.event.GROUP.EVENT node (note that the
    option key includes the event's group name and event name). This
    supports the configs below.

    - ftrace.event.GROUP.EVENT.enable
    Enables GROUP:EVENT tracing.

    - ftrace.event.GROUP.EVENT.filter = FILTER
    Set the FILTER rule on GROUP:EVENT.

    - ftrace.event.GROUP.EVENT.actions = ACTION[, ACTION2...]
    Set ACTIONs on GROUP:EVENT.

    For example,

    ftrace.event.sched.sched_process_exec {
            filter = "pid < 128"
            enable
    }

    This will enable tracing of the "sched:sched_process_exec" event
    with the "pid < 128" filter.

    Link: http://lkml.kernel.org/r/157867238942.17873.11177628789184546198.stgit@devnote2

    Signed-off-by: Masami Hiramatsu
    Signed-off-by: Steven Rostedt (VMware)

    Masami Hiramatsu
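
The per-event settings above have manual equivalents in tracefs. The sketch below lists the file writes one would perform by hand for the same effect; it assumes the standard events/GROUP/EVENT/{filter,trigger,enable} layout relative to the tracefs mount point, and the function name is hypothetical. The kernel applies boot config settings internally, not through these files.

```python
def event_writes(group, event, settings):
    """Return (relative tracefs path, value) pairs for one event node."""
    base = f"events/{group}/{event}"
    writes = []
    if "filter" in settings:
        writes.append((f"{base}/filter", settings["filter"]))
    for action in settings.get("actions", []):
        writes.append((f"{base}/trigger", action))
    if settings.get("enable"):
        # Enable last so the filter and actions are in place first.
        writes.append((f"{base}/enable", "1"))
    return writes
```

For the sched_process_exec example, this produces a write of `pid < 128` to the filter file followed by a write of `1` to the enable file.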
     
  • Set up tracing options via extra boot config in addition to the
    kernel command line.

    This adds support for the following commands. These are applied to
    the global trace instance.

    - ftrace.options = OPT1[,OPT2...]
    Enable given ftrace options.

    - ftrace.trace_clock = CLOCK
    Set given CLOCK to ftrace's trace_clock.

    - ftrace.buffer_size = SIZE
    Configure ftrace buffer size to SIZE. You can use "KB" or "MB"
    for that SIZE.

    - ftrace.events = EVENT[, EVENT2...]
    Enable given events on boot. You can use a wild card in EVENT.

    - ftrace.tracer = TRACER
    Set TRACER to current tracer on boot. (e.g. function)

    Note that this does NOT replace the kernel parameters, because
    the boot config based settings are applied later than those. If you
    want to trace earlier boot events, you still need kernel parameters.

    Link: http://lkml.kernel.org/r/157867237723.17873.17494943526320587488.stgit@devnote2

    Signed-off-by: Masami Hiramatsu
    Signed-off-by: Steven Rostedt (VMware)

    Masami Hiramatsu
     
  • Add a NULL trace-array check in print_synth_event(), because
    if we enable the tp_printk option, iter->tr can be NULL.

    Link: http://lkml.kernel.org/r/157867236536.17873.12529350542460184019.stgit@devnote2

    Signed-off-by: Masami Hiramatsu
    Signed-off-by: Steven Rostedt (VMware)

    Masami Hiramatsu
     
  • Make synthetic events accept a field of a different type to record.
    However, the size and the signed flag must be the same.

    Link: http://lkml.kernel.org/r/157867235358.17873.61732996461602171.stgit@devnote2

    Signed-off-by: Masami Hiramatsu
    Signed-off-by: Steven Rostedt (VMware)

    Masami Hiramatsu
     
  • Register kprobe events to dynevent at the subsys_initcall level.
    This allows the kernel to register new kprobe events at the
    fs_initcall level via trace_run_command.

    Link: http://lkml.kernel.org/r/157867234213.17873.18039000024374948737.stgit@devnote2

    Signed-off-by: Masami Hiramatsu
    Signed-off-by: Steven Rostedt (VMware)

    Masami Hiramatsu
     
  • Since kprobe events use event_trigger_unlock_commit_regs() directly,
    those events don't show up in the printk buffer if "tp_printk" is set.

    Use trace_event_buffer_commit() in kprobe events so that it can
    invoke output_printk() in the same way as other trace events.

    Link: http://lkml.kernel.org/r/157867233085.17873.5210928676787339604.stgit@devnote2

    Signed-off-by: Masami Hiramatsu
    [ Adjusted data var declaration placement in __kretprobe_trace_func() ]
    Signed-off-by: Steven Rostedt (VMware)

    Masami Hiramatsu
     
  • Apply the soft-disabled state and the filter rules of trace events to
    the printk output of tracepoints (a.k.a. the tp_printk kernel parameter),
    in the same way as for trace buffer output.

    Link: http://lkml.kernel.org/r/157867231876.17873.15825819592284704068.stgit@devnote2

    Signed-off-by: Masami Hiramatsu
    Signed-off-by: Steven Rostedt (VMware)

    Masami Hiramatsu
     
  • Add documentation for the extended boot config under
    admin-guide, since it includes the syntax of boot config.

    Link: http://lkml.kernel.org/r/157867230658.17873.9309879174829924324.stgit@devnote2

    Signed-off-by: Masami Hiramatsu
    Signed-off-by: Steven Rostedt (VMware)

    Masami Hiramatsu
     
  • Since the current kernel command line is too short to describe
    the many long options for init (e.g. systemd command line options),
    allow the admin to use boot config for the init command line.

    All init command-line options under the "init." key will be passed
    to init.

    For example,

    init.systemd {
            unified_cgroup_hierarchy = 1
            debug_shell
            default_timeout_start_sec = 60
    }

    Link: http://lkml.kernel.org/r/157867229521.17873.654222294326542349.stgit@devnote2

    Signed-off-by: Masami Hiramatsu
    Signed-off-by: Steven Rostedt (VMware)

    Masami Hiramatsu
     
  • Since the current kernel command line is too short to describe
    the many options supported by the kernel, allow users to use boot
    config to set up (add) command line options.

    All kernel parameters under the "kernel." key will be used
    for setting up the extra kernel command line.

    For example,

    kernel {
            audit = on
            audit_backlog_limit = 256
    }

    Note that you cannot specify some early parameters
    (like console etc.) by this method, since the boot config is
    loaded after early parameters are parsed.

    Link: http://lkml.kernel.org/r/157867228333.17873.11962796367032622466.stgit@devnote2

    Signed-off-by: Masami Hiramatsu
    Signed-off-by: Steven Rostedt (VMware)

    Masami Hiramatsu
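
As a rough picture of how keys under "kernel" could turn into extra command-line text, here is a minimal sketch. It assumes a flat mapping of parameter names to values, with None standing for a key that has no value; the exact quoting and ordering the kernel uses are not shown in the commit, so treat the output format as an assumption.

```python
def to_cmdline(options):
    """Join key/value pairs into a space-separated command-line string."""
    parts = []
    for key, value in options.items():
        if value is None:
            parts.append(key)             # bare flag, no "=value" part
        else:
            parts.append(f"{key}={value}")
    return " ".join(parts)
```

For the example in the commit, `to_cmdline({"audit": "on", "audit_backlog_limit": 256})` returns `audit=on audit_backlog_limit=256`.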
     
  • Since initcall_command_line is used as a temporary buffer,
    it can be freed after use. Allocate it in do_initcall()
    and free it once it is no longer needed.

    Link: http://lkml.kernel.org/r/157867227145.17873.17513760552008505454.stgit@devnote2

    Signed-off-by: Masami Hiramatsu
    Signed-off-by: Steven Rostedt (VMware)

    Masami Hiramatsu
     
  • Add /proc/bootconfig, which shows the list of key-value pairs
    in boot config. Since all boot config data and its tree are
    removed after boot, this interface just keeps a copy of the
    key-value pairs as text.

    Link: http://lkml.kernel.org/r/157867225967.17873.12155805787236073787.stgit@devnote2

    Signed-off-by: Masami Hiramatsu
    Signed-off-by: Steven Rostedt (VMware)

    Masami Hiramatsu
     
  • Add a bootconfig test script to ensure the tool and
    boot config parser are working correctly.

    Link: http://lkml.kernel.org/r/157867224728.17873.18114241801246589416.stgit@devnote2

    Signed-off-by: Masami Hiramatsu
    Signed-off-by: Steven Rostedt (VMware)

    Masami Hiramatsu
     
  • Add a "bootconfig" command which operates on the boot config
    data in an initrd image.

    Users can add/delete/verify the boot config on an initrd
    image using this command.

    e.g.
    Add a boot config to initrd image
    # bootconfig -a myboot.conf /boot/initrd.img

    Remove it.
    # bootconfig -d /boot/initrd.img

    Or verify (and show) it.
    # bootconfig /boot/initrd.img

    Link: http://lkml.kernel.org/r/157867223582.17873.14342161849213219982.stgit@devnote2

    Signed-off-by: Masami Hiramatsu
    [ Removed extra blank line at end of bootconfig.c ]
    Signed-off-by: Steven Rostedt (VMware)

    Masami Hiramatsu
     
  • Load the extended boot config data from the tail of the initrd
    image. If there is SKC data there, it has a
    [(u32)size][(u32)checksum] header (in reality, this is a
    footer) at the end of the initrd. If the checksum (a simple sum
    of bytes) matches, parsing starts from there.

    Link: http://lkml.kernel.org/r/157867222435.17873.9936667353335606867.stgit@devnote2

    Signed-off-by: Masami Hiramatsu
    Signed-off-by: Steven Rostedt (VMware)

    Masami Hiramatsu
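
The footer layout described in the commit above can be sketched as follows. The [size][checksum] order and the byte-sum checksum come from the commit message; the little-endian byte order and the function names are assumptions made for this illustration, and this is not the bootconfig tool's actual code.

```python
import struct

FOOTER = struct.Struct("<II")  # (u32 size, u32 checksum); byte order assumed


def checksum(data):
    """Simple sum of bytes, truncated to 32 bits."""
    return sum(data) & 0xFFFFFFFF


def append_bootconfig(initrd, config):
    """Return the initrd image with config data and footer appended."""
    return initrd + config + FOOTER.pack(len(config), checksum(config))


def extract_bootconfig(image):
    """Return the config bytes if a valid footer is found, else None."""
    if len(image) < FOOTER.size:
        return None
    size, csum = FOOTER.unpack(image[-FOOTER.size:])
    end = len(image) - FOOTER.size
    start = end - size
    if start < 0:
        return None                  # size field points before the image
    data = image[start:end]
    if checksum(data) != csum:
        return None                  # not boot config data, or corrupted
    return data
```

Reading the footer first lets the loader find the start of the config data without knowing where the original initrd ends.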
     
  • Extra Boot Config (XBC) allows the admin to pass a tree-structured
    boot configuration file when booting the kernel. This extends
    the kernel command line in an efficient way.

    Boot config contains key-value commands, e.g.

    key.word = value1
    another.key.word = value2

    Keys with a common prefix can be folded with braces, and you can
    also write array data. For example,

    key {
            word1 {
                    setting1 = data
                    setting2
            }
            word2.array = "val1", "val2"
    }

    Users can access these key-value pairs and the tree structure via
    SKC APIs.

    Link: http://lkml.kernel.org/r/157867221257.17873.1775090991929862549.stgit@devnote2

    Signed-off-by: Masami Hiramatsu
    Signed-off-by: Steven Rostedt (VMware)

    Masami Hiramatsu
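
To show how the brace form above relates to the dotted form, the sketch below flattens a nested structure into dotted keys. It models the tree as plain Python dicts (with None for a key that has no value) instead of parsing the boot config text; the function name is hypothetical.

```python
def flatten(tree, prefix=""):
    """Turn a nested dict into a list of (dotted key, value) pairs."""
    pairs = []
    for word, value in tree.items():
        key = f"{prefix}.{word}" if prefix else word
        if isinstance(value, dict):
            pairs.extend(flatten(value, key))  # descend into a { ... } block
        else:
            pairs.append((key, value))
    return pairs


# The brace example from the commit message, written as nested dicts.
config = {
    "key": {
        "word1": {"setting1": "data", "setting2": None},
        "word2.array": ["val1", "val2"],
    }
}
```

Calling `flatten(config)` yields the keys `key.word1.setting1`, `key.word1.setting2` and `key.word2.array`, with the array kept as a list of values.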
     
  • As there are two struct ring_buffers in the kernel, it causes some
    confusion, the other one being the perf ring buffer. It was agreed that,
    as neither of the ring buffers is generic enough to be used globally,
    they should be renamed as:

    perf's ring_buffer -> perf_buffer
    ftrace's ring_buffer -> trace_buffer

    This implements the changes to the ring buffer that ftrace uses.

    Link: https://lore.kernel.org/r/20191213140531.116b3200@gandalf.local.home

    Signed-off-by: Steven Rostedt (VMware)

    Steven Rostedt (VMware)
     
  • As we are working to remove the generic "ring_buffer" name that is used by
    both tracing and perf, the ring_buffer name for tracing will be renamed to
    trace_buffer, and perf's ring buffer will be renamed to perf_buffer.

    As there already exists a trace_buffer that is used by the trace_arrays, it
    needs to be first renamed to array_buffer.

    Link: https://lore.kernel.org/r/20191213153553.GE20583@krava

    Signed-off-by: Steven Rostedt (VMware)

    Steven Rostedt (VMware)
     
  • eBPF needs to know the size of the perf ring buffer structure.
    But it unfortunately has the same name as the generic ring buffer used by
    tracing and oprofile. To make it less ambiguous, rename the perf ring buffer
    structure to "perf_buffer".

    As other parts of the ring buffer code have "perf_" as the prefix, it only
    makes sense to give the ring buffer the "perf_" prefix as well.

    Link: https://lore.kernel.org/r/20191213153553.GE20583@krava
    Acked-by: Peter Zijlstra
    Suggested-by: Alexei Starovoitov
    Signed-off-by: Steven Rostedt (VMware)

    Steven Rostedt (VMware)
     

13 Jan, 2020

5 commits

  • Linus Torvalds
     
  • Pull RISC-V fixes from Paul Walmsley:
    "Two fixes for RISC-V:

    - Clear FP registers during boot when FP support is present, rather
    than when it isn't present

    - Move the header files associated with the SiFive L2 cache
    controller to drivers/soc (where the code was recently moved)"

    * tag 'riscv/for-v5.5-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux:
    riscv: Fixup obvious bug for fp-regs reset
    riscv: move sifive_l2_cache.h to include/soc

    Linus Torvalds
     
  • CSR_MISA is defined in the Privileged Architecture spec: 3.1.1 Machine
    ISA Register misa. Every bit set to 1 indicates a feature, so we should
    beqz to reset_done when neither the F nor the D bit is set in the
    csr_misa register.

    Signed-off-by: Guo Ren
    [paul.walmsley@sifive.com: fix typo in commit message]
    Fixes: 9e80635619b51 ("riscv: clear the instruction cache and all registers when booting")
    Signed-off-by: Paul Walmsley

    Guo Ren
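
The misa check in the commit above can be written out as a small sketch. It relies on the misa convention that extension letter 'A' maps to bit 0, so 'D' is bit 3 and 'F' is bit 5; the function name is hypothetical and the real code is RISC-V assembly.

```python
MISA_D = 1 << (ord("D") - ord("A"))  # bit 3: double-precision floating point
MISA_F = 1 << (ord("F") - ord("A"))  # bit 5: single-precision floating point


def needs_fp_reset(misa):
    """True if either F or D is present, so FP registers should be cleared."""
    return (misa & (MISA_F | MISA_D)) != 0
```

The fix amounts to skipping the FP register reset only when this check is false, rather than when it is true.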
     
  • The commit 9209fb51896f ("riscv: move sifive_l2_cache.c to drivers/soc")
    moved the SiFive L2 cache driver to drivers/soc. It did not move the
    header file along with the driver. Therefore this patch moves the header
    file to drivers/soc.

    Signed-off-by: Yash Shah
    Reviewed-by: Anup Patel
    [paul.walmsley@sifive.com: updated to fix the include guard]
    Fixes: 9209fb51896f ("riscv: move sifive_l2_cache.c to drivers/soc")
    Signed-off-by: Paul Walmsley

    Yash Shah
     
  • Pull iommu fixes from Joerg Roedel:

    - Two fixes for VT-d and generic IOMMU code to fix teardown on error
    handling code paths.

    - Patch for the Intel VT-d driver to fix handling of non-PCI devices

    - Fix W=1 compile warning in dma-iommu code

    * tag 'iommu-fixes-v5.5-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu:
    iommu/dma: fix variable 'cookie' set but not used
    iommu/vt-d: Unlink device if failed to add to group
    iommu: Remove device link to group on failure
    iommu/vt-d: Fix adding non-PCI devices to Intel IOMMU

    Linus Torvalds