20 Jan, 2021

1 commit

  • commit 7bb83f6fc4ee84e95d0ac0d14452c2619fb3fe70 upstream.

    Enable the notrace function check on architectures which don't support
    kprobes on ftrace but do support dynamic ftrace. This notrace function
    check is not only for kprobes on ftrace but also for
    software-breakpoint-based kprobes.
    Thus there is no reason to limit this check to arches which
    support kprobes on ftrace.

    This also changes the Kconfig dependency: because kprobe events use
    the function tracer's address list to identify notrace functions,
    the check cannot tell whether the target function is notrace when
    CONFIG_DYNAMIC_FTRACE=n.
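    The dependency change described above might look like this in the tracing
    Kconfig (a hedged sketch; the exact symbol spelling and surrounding
    options are assumptions, not a copy of the patch):

```kconfig
# Hypothetical fragment: tie the notrace check to dynamic ftrace,
# since the function tracer's address list only exists with
# CONFIG_DYNAMIC_FTRACE.
config KPROBE_EVENTS_ON_NOTRACE
	def_bool y
	depends on KPROBE_EVENTS
	depends on DYNAMIC_FTRACE
```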

    Link: https://lkml.kernel.org/r/20210105065730.2634785-1-naveen.n.rao@linux.vnet.ibm.com
    Link: https://lkml.kernel.org/r/161007957862.114704.4512260007555399463.stgit@devnote2

    Cc: stable@vger.kernel.org
    Fixes: 45408c4f92506 ("tracing: kprobes: Prohibit probing on notrace function")
    Acked-by: Naveen N. Rao
    Signed-off-by: Masami Hiramatsu
    Signed-off-by: Steven Rostedt (VMware)
    Signed-off-by: Greg Kroah-Hartman

    Masami Hiramatsu
     

30 Dec, 2020

3 commits

  • commit adab66b71abfe206a020f11e561f4df41f0b2aba upstream.

    It was believed that metag was the only architecture that required the ring
    buffer to keep 8 byte words aligned on 8 byte architectures, and with its
    removal, it was assumed that the ring buffer code did not need to handle
    this case. It appears that sparc64 also requires this.

    The following was reported on a sparc64 boot up:

    kernel: futex hash table entries: 65536 (order: 9, 4194304 bytes, linear)
    kernel: Running postponed tracer tests:
    kernel: Testing tracer function:
    kernel: Kernel unaligned access at TPC[552a20] trace_function+0x40/0x140
    kernel: Kernel unaligned access at TPC[552a24] trace_function+0x44/0x140
    kernel: Kernel unaligned access at TPC[552a20] trace_function+0x40/0x140
    kernel: Kernel unaligned access at TPC[552a24] trace_function+0x44/0x140
    kernel: Kernel unaligned access at TPC[552a20] trace_function+0x40/0x140
    kernel: PASSED

    The 64BIT aligned code needs to be put back into the ring buffer.

    Link: https://lore.kernel.org/r/CADxRZqzXQRYgKc=y-KV=S_yHL+Y8Ay2mh5ezeZUnpRvg+syWKw@mail.gmail.com

    Cc: stable@vger.kernel.org
    Fixes: 86b3de60a0b6 ("ring-buffer: Remove HAVE_64BIT_ALIGNED_ACCESS")
    Reported-by: Anatoly Pugachev
    Signed-off-by: Steven Rostedt (VMware)
    Signed-off-by: Greg Kroah-Hartman

    Steven Rostedt (VMware)
     
  • commit 60efe21e5976d3d4170a8190ca76a271d6419754 upstream.

    Disable the ftrace selftests when any tracer (kernel command line options
    like ftrace=, trace_events=, kprobe_events=, and boot-time tracing)
    starts running, because the selftests can disturb it.

    Currently ftrace= and trace_events= are checked, but kprobe_events
    uses a different flag, and boot-time tracing was not checked at all.
    This unifies the disabled flag so that all of those boot-time tracing
    features set it.

    This also fixes warnings in the kprobe-event selftest
    (CONFIG_FTRACE_STARTUP_TEST=y and CONFIG_KPROBE_EVENTS=y) with boot-time
    tracing (ftrace.event.kprobes.EVENT.probes) like the one below:

    [ 59.803496] trace_kprobe: Testing kprobe tracing:
    [ 59.804258] ------------[ cut here ]------------
    [ 59.805682] WARNING: CPU: 3 PID: 1 at kernel/trace/trace_kprobe.c:1987 kprobe_trace_self_tests_ib
    [ 59.806944] Modules linked in:
    [ 59.807335] CPU: 3 PID: 1 Comm: swapper/0 Not tainted 5.10.0-rc7+ #172
    [ 59.808029] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.13.0-1ubuntu1 04/01/204
    [ 59.808999] RIP: 0010:kprobe_trace_self_tests_init+0x5f/0x42b
    [ 59.809696] Code: e8 03 00 00 48 c7 c7 30 8e 07 82 e8 6d 3c 46 ff 48 c7 c6 00 b2 1a 81 48 c7 c7 7
    [ 59.812439] RSP: 0018:ffffc90000013e78 EFLAGS: 00010282
    [ 59.813038] RAX: 00000000ffffffef RBX: 0000000000000000 RCX: 0000000000049443
    [ 59.813780] RDX: 0000000000049403 RSI: 0000000000049403 RDI: 000000000002deb0
    [ 59.814589] RBP: ffffc90000013e90 R08: 0000000000000001 R09: 0000000000000001
    [ 59.815349] R10: 0000000000000001 R11: 0000000000000000 R12: 00000000ffffffef
    [ 59.816138] R13: ffff888004613d80 R14: ffffffff82696940 R15: ffff888004429138
    [ 59.816877] FS: 0000000000000000(0000) GS:ffff88807dcc0000(0000) knlGS:0000000000000000
    [ 59.817772] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
    [ 59.818395] CR2: 0000000001a8dd38 CR3: 0000000002222000 CR4: 00000000000006a0
    [ 59.819144] Call Trace:
    [ 59.819469] ? init_kprobe_trace+0x6b/0x6b
    [ 59.819948] do_one_initcall+0x5f/0x300
    [ 59.820392] ? rcu_read_lock_sched_held+0x4f/0x80
    [ 59.820916] kernel_init_freeable+0x22a/0x271
    [ 59.821416] ? rest_init+0x241/0x241
    [ 59.821841] kernel_init+0xe/0x10f
    [ 59.822251] ret_from_fork+0x22/0x30
    [ 59.822683] irq event stamp: 16403349
    [ 59.823121] hardirqs last enabled at (16403359): [] console_unlock+0x48e/0x580
    [ 59.824074] hardirqs last disabled at (16403368): [] console_unlock+0x3f6/0x580
    [ 59.825036] softirqs last enabled at (16403200): [] __do_softirq+0x33a/0x484
    [ 59.825982] softirqs last disabled at (16403087): [] asm_call_irq_on_stack+0x10
    [ 59.827034] ---[ end trace 200c544775cdfeb3 ]---
    [ 59.827635] trace_kprobe: error on probing function entry.

    Link: https://lkml.kernel.org/r/160741764955.3448999.3347769358299456915.stgit@devnote2

    Fixes: 4d655281eb1b ("tracing/boot Add kprobe event support")
    Cc: Ingo Molnar
    Cc: stable@vger.kernel.org
    Signed-off-by: Masami Hiramatsu
    Signed-off-by: Steven Rostedt (VMware)
    Signed-off-by: Greg Kroah-Hartman

    Masami Hiramatsu
     
  • [ Upstream commit 12cc126df82c96c89706aa207ad27c56f219047c ]

    __module_address() needs to be called with preemption disabled or with
    module_mutex taken. preempt_disable() is enough for read-only uses, which
    is what this fix does. Also, module_put() checks for NULL internally, so
    the explicit check is dropped as well.

    Fixes: a38d1107f937 ("bpf: support raw tracepoints in modules")
    Signed-off-by: Andrii Nakryiko
    Signed-off-by: Alexei Starovoitov
    Acked-by: Martin KaFai Lau
    Link: https://lore.kernel.org/bpf/20201203204634.1325171-2-andrii@kernel.org
    Signed-off-by: Sasha Levin

    Andrii Nakryiko
     

12 Dec, 2020

1 commit

    Remove the bpf_ prefix, which causes these helpers to be reported in the
    verifier dump as bpf_bpf_this_cpu_ptr() and bpf_bpf_per_cpu_ptr(),
    respectively. Let's fix it while it is still possible, before the UAPI
    freezes on these helpers.

    Fixes: eaa6bcb71ef6 ("bpf: Introduce bpf_per_cpu_ptr()")
    Signed-off-by: Andrii Nakryiko
    Signed-off-by: Alexei Starovoitov
    Signed-off-by: Daniel Borkmann
    Signed-off-by: Linus Torvalds

    Andrii Nakryiko
     

08 Dec, 2020

1 commit

  • Pull tracing fix from Steven Rostedt:
    "Fix userstacktrace option for instances

    While writing an application that requires user stack trace option to
    work in instances, I found that the instance option has a bug that
    makes it a nop. The check for performing the user stack trace in an
    instance, checks the top level options (not the instance options) to
    determine if a user stack trace should be performed or not.

    This is not only incorrect, but also confusing for users. It confused
    me for a bit!"

    * tag 'trace-v5.10-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace:
    tracing: Fix userstacktrace option for instances

    Linus Torvalds
     

05 Dec, 2020

1 commit

    When the instances were given the ability to use their own options, the
    userstacktrace option was left hardcoded to the top level. This made the
    instance userstacktrace option basically a nop, which will confuse users
    who set it only to find that nothing happens (I was confused when it
    happened to me!)

    Cc: stable@vger.kernel.org
    Fixes: 16270145ce6b ("tracing: Add trace options for core options to instances")
    Signed-off-by: Steven Rostedt (VMware)

    Steven Rostedt (VMware)
     

02 Dec, 2020

1 commit

  • Pull tracing fixes from Steven Rostedt:

    - Use correct timestamp variable for ring buffer write stamp update

    - Fix up before stamp and write stamp when crossing ring buffer sub
    buffers

    - Keep a zero delta in ring buffer in slow path if cmpxchg fails

    - Fix trace_printk static buffer for archs that care

    - Fix ftrace record accounting for ftrace ops with trampolines

    - Fix DYNAMIC_FTRACE_WITH_DIRECT_CALLS dependency

    - Remove WARN_ON in hwlat tracer that triggers on something that is OK

    - Make "my_tramp" trampoline in ftrace direct sample code global

    - Fixes in the bootconfig tool for better alignment management

    * tag 'trace-v5.10-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace:
    ring-buffer: Always check to put back before stamp when crossing pages
    ftrace: Fix DYNAMIC_FTRACE_WITH_DIRECT_CALLS dependency
    ftrace: Fix updating FTRACE_FL_TRAMP
    tracing: Fix alignment of static buffer
    tracing: Remove WARN_ON in start_thread()
    samples/ftrace: Mark my_tramp[12]? global
    ring-buffer: Set the right timestamp in the slow path of __rb_reserve_next()
    ring-buffer: Update write stamp with the correct ts
    docs: bootconfig: Update file format on initrd image
    tools/bootconfig: Align the bootconfig applied initrd image size to 4
    tools/bootconfig: Fix to check the write failure correctly
    tools/bootconfig: Fix errno reference after printf()

    Linus Torvalds
     

01 Dec, 2020

7 commits

  • The current ring buffer logic checks to see if the updating of the event
    buffer was interrupted, and if it is, it will try to fix up the before stamp
    with the write stamp to make them equal again. This logic is flawed, because
    if it is not interrupted, the two are guaranteed to be different, as the
    current event just updated the before stamp before allocation. This
    guarantees that the next event (this one or another interrupting one) will
    think it interrupted the time updates of a previous event and inject an
    absolute time stamp to compensate.

    The correct logic is to always update the timestamps when traversing to a
    new sub buffer.

    Cc: stable@vger.kernel.org
    Fixes: a389d86f7fd09 ("ring-buffer: Have nested events still record running time stamp")
    Signed-off-by: Steven Rostedt (VMware)

    Steven Rostedt (VMware)
     
  • DYNAMIC_FTRACE_WITH_DIRECT_CALLS should depend on
    DYNAMIC_FTRACE_WITH_REGS since we need ftrace_regs_caller().
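    In Kconfig terms, the fix amounts to something like the following (a
    sketch; the HAVE_* guard is an assumption about the surrounding options):

```kconfig
# Direct calls are installed through ftrace_regs_caller(), which only
# exists when the arch supports saving registers from ftrace.
config DYNAMIC_FTRACE_WITH_DIRECT_CALLS
	def_bool y
	depends on DYNAMIC_FTRACE_WITH_REGS
	depends on HAVE_DYNAMIC_FTRACE_WITH_DIRECT_CALLS
```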

    Link: https://lkml.kernel.org/r/fc4b257ea8689a36f086d2389a9ed989496ca63a.1606412433.git.naveen.n.rao@linux.vnet.ibm.com

    Cc: stable@vger.kernel.org
    Fixes: 763e34e74bb7d5c ("ftrace: Add register_ftrace_direct()")
    Signed-off-by: Naveen N. Rao
    Signed-off-by: Steven Rostedt (VMware)

    Naveen N. Rao
     
  • On powerpc, kprobe-direct.tc triggered FTRACE_WARN_ON() in
    ftrace_get_addr_new() followed by the below message:
    Bad trampoline accounting at: 000000004222522f (wake_up_process+0xc/0x20) (f0000001)

    The set of steps leading to this involved:
    - modprobe ftrace-direct-too
    - enable_probe
    - modprobe ftrace-direct
    - rmmod ftrace-direct
    Signed-off-by: Steven Rostedt (VMware)

    Naveen N. Rao
     
    With the 5.9 kernel on ARM64, I found that ftrace_dump output was broken,
    while the normal output from "cat /sys/kernel/debug/tracing/trace" had no
    problem.

    On investigation, it seems that copying the data into a temporary buffer
    breaks the alignment that binary printf expects when the static buffer is
    not 4-byte aligned. IIUC, get_arg() in bstr_printf() expects the args to
    already be correctly aligned for decoding, and seq_buf_bprintf() says
    ``the arguments are saved in a 32bit word array that is defined by the
    format string constraints``. So if the alignment is not preserved when
    copying into the temporary buffer, the output is corrupted by the
    resulting byte shift.

    This patch fixes it.

    Link: https://lkml.kernel.org/r/20201125225654.1618966-1-minchan@kernel.org

    Cc:
    Fixes: 8e99cf91b99bb ("tracing: Do not allocate buffer in trace_find_next_entry() in atomic")
    Signed-off-by: Namhyung Kim
    Signed-off-by: Minchan Kim
    Signed-off-by: Steven Rostedt (VMware)

    Minchan Kim
     
    This patch reverts commit 978defee11a5 ("tracing: Do a WARN_ON()
    if start_thread() in hwlat is called when thread exists").

    The .start hook can legally be called several times if the
    corresponding tracer is stopped:

    screen window 1
    [root@localhost ~]# echo 1 > /sys/kernel/tracing/events/kmem/kfree/enable
    [root@localhost ~]# echo 1 > /sys/kernel/tracing/options/pause-on-trace
    [root@localhost ~]# less -F /sys/kernel/tracing/trace

    screen window 2
    [root@localhost ~]# cat /sys/kernel/debug/tracing/tracing_on
    0
    [root@localhost ~]# echo hwlat > /sys/kernel/debug/tracing/current_tracer
    [root@localhost ~]# echo 1 > /sys/kernel/debug/tracing/tracing_on
    [root@localhost ~]# cat /sys/kernel/debug/tracing/tracing_on
    0
    [root@localhost ~]# echo 2 > /sys/kernel/debug/tracing/tracing_on

    This triggers a warning in dmesg:
    WARNING: CPU: 3 PID: 1403 at kernel/trace/trace_hwlat.c:371 hwlat_tracer_start+0xc9/0xd0

    Link: https://lkml.kernel.org/r/bd4d3e70-400d-9c82-7b73-a2d695e86b58@virtuozzo.com

    Cc: Ingo Molnar
    Cc: stable@vger.kernel.org
    Fixes: 978defee11a5 ("tracing: Do a WARN_ON() if start_thread() in hwlat is called when thread exists")
    Signed-off-by: Vasily Averin
    Signed-off-by: Steven Rostedt (VMware)

    Vasily Averin
     
  • In the slow path of __rb_reserve_next() a nested event(s) can happen
    between evaluating the timestamp delta of the current event and updating
    write_stamp via local_cmpxchg(); in this case the delta is not valid
    anymore and it should be set to 0 (same timestamp as the interrupting
    event), since the event that we are currently processing is not the last
    event in the buffer.

    Link: https://lkml.kernel.org/r/X8IVJcp1gRE+FJCJ@xps-13-7390

    Cc: Ingo Molnar
    Cc: Masami Hiramatsu
    Cc: stable@vger.kernel.org
    Link: https://lwn.net/Articles/831207
    Fixes: a389d86f7fd0 ("ring-buffer: Have nested events still record running time stamp")
    Signed-off-by: Andrea Righi
    Signed-off-by: Steven Rostedt (VMware)

    Andrea Righi
     
  • The write stamp, used to calculate deltas between events, was updated with
    the stale "ts" value in the "info" structure, and not with the updated "ts"
    variable. This caused the deltas between events to be inaccurate, and when
    crossing into a new sub buffer, had time go backwards.

    Link: https://lkml.kernel.org/r/20201124223917.795844-1-elavila@google.com

    Cc: stable@vger.kernel.org
    Fixes: a389d86f7fd09 ("ring-buffer: Have nested events still record running time stamp")
    Reported-by: "J. Avila"
    Tested-by: Daniel Mentz
    Tested-by: Will McVicker
    Signed-off-by: Steven Rostedt (VMware)

    Steven Rostedt (VMware)
     

20 Nov, 2020

1 commit

  • do_strncpy_from_user() may copy some extra bytes after the NUL
    terminator into the destination buffer. This usually does not matter for
    normal string operations. However, when BPF programs key BPF maps with
    strings, this matters a lot.

    A BPF program may read strings from user memory by calling the
    bpf_probe_read_user_str() helper which eventually calls
    do_strncpy_from_user(). The program can then key a map with the
    destination buffer. BPF map keys are fixed-width and string-agnostic,
    meaning that map keys are treated as a set of bytes.

    The issue is when do_strncpy_from_user() overcopies bytes after the NUL
    terminator, it can result in seemingly identical strings occupying
    multiple slots in a BPF map. This behavior is subtle and totally
    unexpected by the user.

    This commit masks out the bytes following the NUL while preserving
    long-sized stride in the fast path.

    Fixes: 6ae08ae3dea2 ("bpf: Add probe_read_{user, kernel} and probe_read_{user, kernel}_str helpers")
    Signed-off-by: Daniel Xu
    Signed-off-by: Alexei Starovoitov
    Link: https://lore.kernel.org/bpf/21efc982b3e9f2f7b0379eed642294caaa0c27a7.1605642949.git.dxu@dxuuu.xyz

    Daniel Xu
     

10 Nov, 2020

1 commit

    There is a bug where zero can be passed to PTR_ERR() and returned.
    Fix the smatch error.

    Fixes: c4d0bfb45068 ("bpf: Add bpf_snprintf_btf helper")
    Signed-off-by: Wang Qing
    Signed-off-by: Daniel Borkmann
    Acked-by: Yonghong Song
    Acked-by: John Fastabend
    Link: https://lore.kernel.org/bpf/1604735144-686-1-git-send-email-wangqing@vivo.com

    Wang Qing
     

03 Nov, 2020

3 commits

  • parse_synth_field() returns a pointer and requires that errors get
    surrounded by ERR_PTR(). The ret variable is initialized to zero, but should
    never be used as zero, and if it is, it could cause a false return code and
    produce a NULL pointer dereference. It makes no sense to set ret to zero.

    Set ret to -ENOMEM (the most common error case), and have any other errors
    set it to something else. This removes the need to initialize ret on *every*
    error branch.

    Fixes: 761a8c58db6b ("tracing, synthetic events: Replace buggy strcat() with seq_buf operations")
    Reported-by: Dan Carpenter
    Signed-off-by: Steven Rostedt (VMware)

    Steven Rostedt (VMware)
     
    The recursion protection of the ring buffer depends on preempt_count()
    being correct. But it is possible that the ring buffer gets called after
    an interrupt comes in but before it updates the preempt_count(). This
    will trigger a false positive in the recursion code.

    Use the same trick as the ftrace function callback recursion code, which
    uses a "transition" bit that gets set to allow a single recursion while
    handling transitions between contexts.

    Cc: stable@vger.kernel.org
    Fixes: 567cd4da54ff4 ("ring-buffer: User context bit recursion checking")
    Signed-off-by: Steven Rostedt (VMware)

    Steven Rostedt (VMware)
     
  • The array size is FTRACE_KSTACK_NESTING, so the index FTRACE_KSTACK_NESTING
    is illegal too. And fix two typos by the way.

    Link: https://lkml.kernel.org/r/20201031085714.2147-1-hqjagain@gmail.com

    Signed-off-by: Qiujun Huang
    Signed-off-by: Steven Rostedt (VMware)

    Qiujun Huang
     

02 Nov, 2020

3 commits

    When an interrupt or NMI comes in and switches the context, there's a
    delay before preempt_count() shows the update. As preempt_count() is used
    to detect recursion, each context has its own bit that gets set when
    tracing starts, and if that bit is already set, it is considered a
    recursion and the function exits. But if this happens in the window where
    the context has changed but preempt_count() has not yet been updated, it
    will be incorrectly flagged as a recursion.

    To handle this case, create another bit called TRANSITION and test it if
    the current context bit is already set. Flag the call as a recursion if
    the TRANSITION bit is already set; if not, set it and continue. The
    TRANSITION bit will be cleared normally on the return of the function
    that set it, or, if the current context bit is clear, that bit is set and
    the TRANSITION bit cleared to allow for another transition between the
    current context and an even higher one.

    Cc: stable@vger.kernel.org
    Fixes: edc15cafcbfa3 ("tracing: Avoid unnecessary multiple recursion checks")
    Signed-off-by: Steven Rostedt (VMware)

    Steven Rostedt (VMware)
     
    The code that checks recursion will do the recursion check only once
    when checks are nested. The top one will do the check, and the nested
    checks will see that recursion was already checked and return zero as
    their "bit". On the return side, nothing will be done if the "bit" is
    zero.

    The problem is that zero is also returned as the "good" bit when in NMI
    context. This will set the bit for NMIs, making it look like *all* NMI
    tracing is recursing, and prevent tracing of anything in NMI context!

    The simple fix is to return "bit + 1" and subtract one at the end to
    get the real bit.

    Cc: stable@vger.kernel.org
    Fixes: edc15cafcbfa3 ("tracing: Avoid unnecessary multiple recursion checks")
    Signed-off-by: Steven Rostedt (VMware)

    Steven Rostedt (VMware)
     
    The nesting count of trace_printk() allows for 4 levels of nesting. The
    nesting counter starts at zero and is incremented before being used to
    retrieve the current context's buffer. But the index into the buffer must
    use the counter's original value, not the value after the increment.

    Link: https://lkml.kernel.org/r/20201029161905.4269-1-hqjagain@gmail.com

    Cc: stable@vger.kernel.org
    Fixes: 3d9622c12c887 ("tracing: Add barrier to trace_printk() buffer nesting modification")
    Signed-off-by: Qiujun Huang
    Signed-off-by: Steven Rostedt (VMware)

    Qiujun Huang
     

29 Oct, 2020

1 commit

  • Pull tracing fix from Steven Rostedt:
    "Fix synthetic event "strcat" overrun

    New synthetic event code used strcat() and miscalculated the ending,
    causing the concatenation to write beyond the allocated memory.

    Instead of using strncat(), the code is switched over to seq_buf which
    has all the mechanisms in place to protect against writing more than
    what is allocated, and cleans up the code a bit"

    * tag 'trace-v5.10-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace:
    tracing, synthetic events: Replace buggy strcat() with seq_buf operations

    Linus Torvalds
     

27 Oct, 2020

1 commit

  • There was a memory corruption bug happening while running the synthetic
    event selftests:

    kmemleak: Cannot insert 0xffff8c196fa2afe5 into the object search tree (overlaps existing)
    CPU: 5 PID: 6866 Comm: ftracetest Tainted: G W 5.9.0-rc5-test+ #577
    Hardware name: Hewlett-Packard HP Compaq Pro 6300 SFF/339A, BIOS K01 v03.03 07/14/2016
    Call Trace:
    dump_stack+0x8d/0xc0
    create_object.cold+0x3b/0x60
    slab_post_alloc_hook+0x57/0x510
    ? tracing_map_init+0x178/0x340
    __kmalloc+0x1b1/0x390
    tracing_map_init+0x178/0x340
    event_hist_trigger_func+0x523/0xa40
    trigger_process_regex+0xc5/0x110
    event_trigger_write+0x71/0xd0
    vfs_write+0xca/0x210
    ksys_write+0x70/0xf0
    do_syscall_64+0x33/0x40
    entry_SYSCALL_64_after_hwframe+0x44/0xa9
    RIP: 0033:0x7fef0a63a487
    Code: 64 89 02 48 c7 c0 ff ff ff ff eb bb 0f 1f 80 00 00 00 00 f3 0f 1e fa 64 8b 04 25 18 00 00 00 85 c0 75 10 b8 01 00 00 00 0f 05 3d 00 f0 ff ff 77 51 c3 48 83 ec 28 48 89 54 24 18 48 89 74 24
    RSP: 002b:00007fff76f18398 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
    RAX: ffffffffffffffda RBX: 0000000000000039 RCX: 00007fef0a63a487
    RDX: 0000000000000039 RSI: 000055eb3b26d690 RDI: 0000000000000001
    RBP: 000055eb3b26d690 R08: 000000000000000a R09: 0000000000000038
    R10: 000055eb3b2cdb80 R11: 0000000000000246 R12: 0000000000000039
    R13: 00007fef0a70b500 R14: 0000000000000039 R15: 00007fef0a70b700
    kmemleak: Kernel memory leak detector disabled
    kmemleak: Object 0xffff8c196fa2afe0 (size 8):
    kmemleak: comm "ftracetest", pid 6866, jiffies 4295082531
    kmemleak: min_count = 1
    kmemleak: count = 0
    kmemleak: flags = 0x1
    kmemleak: checksum = 0
    kmemleak: backtrace:
    __kmalloc+0x1b1/0x390
    tracing_map_init+0x1be/0x340
    event_hist_trigger_func+0x523/0xa40
    trigger_process_regex+0xc5/0x110
    event_trigger_write+0x71/0xd0
    vfs_write+0xca/0x210
    ksys_write+0x70/0xf0
    do_syscall_64+0x33/0x40
    entry_SYSCALL_64_after_hwframe+0x44/0xa9

    The cause came down to a use of strcat() appending a string that had
    been shortened, which the length calculation did not take into account.

    strcat() is extremely dangerous as it does not care how big the buffer
    is. Replace it with seq_buf operations, which prevent the buffer from
    being overwritten when what is being written is bigger than the buffer.

    Fixes: 10819e25799a ("tracing: Handle synthetic event array field type checking correctly")
    Reviewed-by: Tom Zanussi
    Tested-by: Tom Zanussi
    Signed-off-by: Steven Rostedt (VMware)

    Steven Rostedt (VMware)
     

26 Oct, 2020

1 commit

  • Use a more generic form for __section that requires quotes to avoid
    complications with clang and gcc differences.

    Remove the quote operator # from compiler_attributes.h __section macro.

    Convert all unquoted __section(foo) uses to quoted __section("foo").
    Also convert __attribute__((section("foo"))) uses to __section("foo")
    even if the __attribute__ has multiple list entry forms.

    Conversion done using the script at:

    https://lore.kernel.org/lkml/75393e5ddc272dc7403de74d645e6c6e0f4e70eb.camel@perches.com/2-convert_section.pl

    Signed-off-by: Joe Perches
    Reviewed-by: Nick Desaulniers
    Reviewed-by: Miguel Ojeda
    Signed-off-by: Linus Torvalds

    Joe Perches
     

24 Oct, 2020

1 commit


22 Oct, 2020

2 commits

  • The function changed at some point, but the description was not
    updated.

    Link: https://lkml.kernel.org/r/20201017095246.5170-1-hqjagain@gmail.com

    Signed-off-by: Qiujun Huang
    Signed-off-by: Steven Rostedt (VMware)

    Qiujun Huang
     
  • We don't need to check the new buffer size, and the return value
    had confused resize_buffer_duplicate_size().
    ...
    ret = ring_buffer_resize(trace_buf->buffer,
                    per_cpu_ptr(size_buf->data, cpu_id)->entries, cpu_id);
    if (ret == 0)
            per_cpu_ptr(trace_buf->data, cpu_id)->entries =
                    per_cpu_ptr(size_buf->data, cpu_id)->entries;
    ...

    Link: https://lkml.kernel.org/r/20201019142242.11560-1-hqjagain@gmail.com

    Cc: stable@vger.kernel.org
    Fixes: d60da506cbeb3 ("tracing: Add a resize function to make one buffer equivalent to another buffer")
    Signed-off-by: Qiujun Huang
    Signed-off-by: Steven Rostedt (VMware)

    Qiujun Huang
     

17 Oct, 2020

1 commit


16 Oct, 2020

10 commits

  • The commit 720dee53ad8d ("tracing/boot: Initialize per-instance event
    list in early boot") removes __init from __trace_early_add_events()
    but __trace_early_add_new_event() still has __init and will cause a
    section mismatch.

    Remove __init from __trace_early_add_new_event(), just as was done for
    __trace_early_add_events().

    Link: https://lore.kernel.org/lkml/CAHk-=wjU86UhovK4XuwvCqTOfc+nvtpAuaN2PJBz15z=w=u0Xg@mail.gmail.com/

    Reported-by: Linus Torvalds
    Signed-off-by: Masami Hiramatsu
    Signed-off-by: Steven Rostedt (VMware)

    Masami Hiramatsu
     
  • Pull networking updates from Jakub Kicinski:

    - Add redirect_neigh() BPF packet redirect helper, allowing to limit
    stack traversal in common container configs and improving TCP
    back-pressure.

    Daniel reports ~10Gbps => ~15Gbps single stream TCP performance gain.

    - Expand netlink policy support and improve policy export to user
    space. (Ge)netlink core performs request validation according to
    declared policies. Expand the expressiveness of those policies
    (min/max length and bitmasks). Allow dumping policies for particular
    commands. This is used for feature discovery by user space (instead
    of kernel version parsing or trial and error).

    - Support IGMPv3/MLDv2 multicast listener discovery protocols in
    bridge.

    - Allow more than 255 IPv4 multicast interfaces.

    - Add support for Type of Service (ToS) reflection in SYN/SYN-ACK
    packets of TCPv6.

    - In Multi-path TCP (MPTCP) support concurrent transmission of data on
    multiple subflows in a load balancing scenario. Enhance advertising
    addresses via the RM_ADDR/ADD_ADDR options.

    - Support SMC-Dv2 version of SMC, which enables multi-subnet
    deployments.

    - Allow more calls to same peer in RxRPC.

    - Support two new Controller Area Network (CAN) protocols - CAN-FD and
    ISO 15765-2:2016.

    - Add xfrm/IPsec compat layer, solving the 32bit user space on 64bit
    kernel problem.

    - Add TC actions for implementing MPLS L2 VPNs.

    - Improve nexthop code - e.g. handle various corner cases when nexthop
    objects are removed from groups better, skip unnecessary
    notifications and make it easier to offload nexthops into HW by
    converting to a blocking notifier.

    - Support adding and consuming TCP header options by BPF programs,
    opening the doors for easy experimental and deployment-specific TCP
    option use.

    - Reorganize TCP congestion control (CC) initialization to simplify
    life of TCP CC implemented in BPF.

    - Add support for shipping BPF programs with the kernel and loading
    them early on boot via the User Mode Driver mechanism, hence reusing
    all the user space infra we have.

    - Support sleepable BPF programs, initially targeting LSM and tracing.

    - Add bpf_d_path() helper for returning full path for given 'struct
    path'.

    - Make bpf_tail_call compatible with bpf-to-bpf calls.

    - Allow BPF programs to call map_update_elem on sockmaps.

    - Add BPF Type Format (BTF) support for type and enum discovery, as
    well as support for using BTF within the kernel itself (current use
    is for pretty printing structures).

    - Support listing and getting information about bpf_links via the bpf
    syscall.

    - Enhance kernel interfaces around NIC firmware update. Allow
    specifying overwrite mask to control if settings etc. are reset
    during update; report expected max time operation may take to users;
    support firmware activation without machine reboot incl. limits of
    how much impact reset may have (e.g. dropping link or not).

    - Extend ethtool configuration interface to report IEEE-standard
    counters, to limit the need for per-vendor logic in user space.

    - Adopt or extend devlink use for debug, monitoring, fw update in many
    drivers (dsa loop, ice, ionic, sja1105, qed, mlxsw, mv88e6xxx,
    dpaa2-eth).

    - In mlxsw expose critical and emergency SFP module temperature alarms.
    Refactor port buffer handling to make the defaults more suitable and
    support setting these values explicitly via the DCBNL interface.

    - Add XDP support for Intel's igb driver.

    - Support offloading TC flower classification and filtering rules to
    mscc_ocelot switches.

    - Add PTP support for Marvell Octeontx2 and PP2.2 hardware, as well as
    fixed interval period pulse generator and one-step timestamping in
    dpaa-eth.

    - Add support for various auth offloads in WiFi APs, e.g. SAE (WPA3)
    offload.

    - Add Lynx PHY/PCS MDIO module, and convert various drivers which have
    this HW to use it. Convert mvpp2 to split PCS.

    - Support Marvell Prestera 98DX3255 24-port switch ASICs, as well as
    7-port Mediatek MT7531 IP.

    - Add initial support for QCA6390 and IPQ6018 in ath11k WiFi driver,
    and wcn3680 support in wcn36xx.

    - Improve performance for packets which don't require much offloads on
    recent Mellanox NICs by 20% by making multiple packets share a
    descriptor entry.

    - Move chelsio inline crypto drivers (for TLS and IPsec) from the
    crypto subtree to drivers/net. Move MDIO drivers out of the phy
    directory.

    - Clean up a lot of W=1 warnings, reportedly the actively developed
    subsections of networking drivers should now build W=1 warning free.

    - Make sure drivers don't use in_interrupt() to dynamically adapt their
    code. Convert tasklets to use new tasklet_setup API (sadly this
    conversion is not yet complete).

    * tag 'net-next-5.10' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next: (2583 commits)
    Revert "bpfilter: Fix build error with CONFIG_BPFILTER_UMH"
    net, sockmap: Don't call bpf_prog_put() on NULL pointer
    bpf, selftest: Fix flaky tcp_hdr_options test when adding addr to lo
    bpf, sockmap: Add locking annotations to iterator
    netfilter: nftables: allow re-computing sctp CRC-32C in 'payload' statements
    net: fix pos incrementment in ipv6_route_seq_next
    net/smc: fix invalid return code in smcd_new_buf_create()
    net/smc: fix valid DMBE buffer sizes
    net/smc: fix use-after-free of delayed events
    bpfilter: Fix build error with CONFIG_BPFILTER_UMH
    cxgb4/ch_ipsec: Replace the module name to ch_ipsec from chcr
    net: sched: Fix suspicious RCU usage while accessing tcf_tunnel_info
    bpf: Fix register equivalence tracking.
    rxrpc: Fix loss of final ack on shutdown
    rxrpc: Fix bundle counting for exclusive connections
    netfilter: restore NF_INET_NUMHOOKS
    ibmveth: Identify ingress large send packets.
    ibmveth: Switch order of ibmveth_helper calls.
    cxgb4: handle 4-tuple PEDIT to NAT mode translation
    selftests: Add VRF route leaking tests
    ...

    Linus Torvalds
     
  • Pull tracing updates from Steven Rostedt:
    "Updates for tracing and bootconfig:

    - Add support for "bool" type in synthetic events

    - Add per instance tracing for bootconfig

    - Support perf-style return probe ("SYMBOL%return") in kprobes and
    uprobes

    - Allow for kprobes to be enabled earlier in boot up

    - Added tracepoint helper function to allow testing if tracepoints
    are enabled in headers

    - Synthetic events can now have dynamic strings (variable length)

    - Various fixes and cleanups"

    * tag 'trace-v5.10' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace: (58 commits)
    tracing: support "bool" type in synthetic trace events
    selftests/ftrace: Add test case for synthetic event syntax errors
    tracing: Handle synthetic event array field type checking correctly
    selftests/ftrace: Change synthetic event name for inter-event-combined test
    tracing: Add synthetic event error logging
    tracing: Check that the synthetic event and field names are legal
    tracing: Move is_good_name() from trace_probe.h to trace.h
    tracing: Don't show dynamic string internals in synthetic event description
    tracing: Fix some typos in comments
    tracing/boot: Add ftrace.instance.*.alloc_snapshot option
    tracing: Fix race in trace_open and buffer resize call
    tracing: Check return value of __create_val_fields() before using its result
    tracing: Fix synthetic print fmt check for use of __get_str()
    tracing: Remove a pointless assignment
    ftrace: ftrace_global_list is renamed to ftrace_ops_list
    ftrace: Format variable declarations of ftrace_allocate_records
    ftrace: Simplify the calculation of page number for ftrace_page->records
    ftrace: Simplify the dyn_ftrace->flags macro
    ftrace: Simplify the hash calculation
    ftrace: Use fls() to get the bits for dup_hash()
    ...

    Linus Torvalds
     
  • Pull char/misc driver updates from Greg KH:
    "Here is the big set of char, misc, and other assorted driver subsystem
    patches for 5.10-rc1.

    There's a lot of different things in here, all over the drivers/
    directory. Some summaries:

    - soundwire driver updates

    - habanalabs driver updates

    - extcon driver updates

    - nitro_enclaves new driver

    - fsl-mc driver and core updates

    - mhi core and bus updates

    - nvmem driver updates

    - eeprom driver updates

    - binder driver updates and fixes

    - vbox minor bugfixes

    - fsi driver updates

    - w1 driver updates

    - coresight driver updates

    - interconnect driver updates

    - misc driver updates

    - other minor driver updates

    All of these have been in linux-next for a while with no reported
    issues"

    * tag 'char-misc-5.10-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc: (396 commits)
    binder: fix UAF when releasing todo list
    docs: w1: w1_therm: Fix broken xref, mistakes, clarify text
    misc: Kconfig: fix a HISI_HIKEY_USB dependency
    LSM: Fix type of id parameter in kernel_post_load_data prototype
    misc: Kconfig: add a new dependency for HISI_HIKEY_USB
    firmware_loader: fix a kernel-doc markup
    w1: w1_therm: make w1_poll_completion static
    binder: simplify the return expression of binder_mmap
    test_firmware: Test partial read support
    firmware: Add request_partial_firmware_into_buf()
    firmware: Store opt_flags in fw_priv
    fs/kernel_file_read: Add "offset" arg for partial reads
    IMA: Add support for file reads without contents
    LSM: Add "contents" flag to kernel_read_file hook
    module: Call security_kernel_post_load_data()
    firmware_loader: Use security_post_load_data()
    LSM: Introduce kernel_post_load_data() hook
    fs/kernel_read_file: Add file_size output argument
    fs/kernel_read_file: Switch buffer size arg to size_t
    fs/kernel_read_file: Remove redundant size argument
    ...

    Linus Torvalds
     
  • It's common [1] to define tracepoint fields as "bool" when they contain
    a true / false value. Currently, defining a synthetic event with a
    "bool" field yields EINVAL. It's possible to work around this by using
    e.g. u8 (assuming sizeof(bool) is 1 and bool is unsigned; if either of
    these properties doesn't match, you get EINVAL [2]).

    Supporting "bool" explicitly makes hooking this up easier and more
    portable for userspace.

    [1]: grep -r "bool" include/trace/events/
    [2]: check_synth_field() in kernel/trace/trace_events_hist.c
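
    A usage sketch (hypothetical event and field names; assumes a kernel
    with this patch and tracefs mounted at /sys/kernel/tracing):

    ```shell
    # Define a synthetic event mixing a "bool" field with other types;
    # before this patch the write below would fail with EINVAL.
    echo 'wakeup_lat u64 lat; bool aborted' >> /sys/kernel/tracing/synthetic_events

    # Confirm the event was created with the bool field intact.
    cat /sys/kernel/tracing/synthetic_events
    ```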

    Link: https://lkml.kernel.org/r/20201009220524.485102-2-axelrasmussen@google.com

    Acked-by: Michel Lespinasse
    Acked-by: David Rientjes
    Signed-off-by: Axel Rasmussen
    Signed-off-by: Steven Rostedt (VMware)

    Axel Rasmussen
     
  • Since synthetic event array types are derived from the field name,
    there may be a semicolon at the end of the type which should be
    stripped off.

    If there are more characters following that, normal type string
    checking will result in an invalid type.

    Without this patch, you can end up with an invalid field type string
    that gets displayed in both the synthetic event description and the
    event format:

    Before:

    # echo 'myevent char str[16]; int v' >> synthetic_events
    # cat synthetic_events
    myevent char[16]; str; int v

    name: myevent
    ID: 1936
    format:
    field:unsigned short common_type; offset:0; size:2; signed:0;
    field:unsigned char common_flags; offset:2; size:1; signed:0;
    field:unsigned char common_preempt_count; offset:3; size:1; signed:0;
    field:int common_pid; offset:4; size:4; signed:1;

    field:char str[16];; offset:8; size:16; signed:1;
    field:int v; offset:40; size:4; signed:1;

    print fmt: "str=%s, v=%d", REC->str, REC->v

    After:

    # echo 'myevent char str[16]; int v' >> synthetic_events
    # cat synthetic_events
    myevent char[16] str; int v

    # cat events/synthetic/myevent/format
    name: myevent
    ID: 1936
    format:
    field:unsigned short common_type; offset:0; size:2; signed:0;
    field:unsigned char common_flags; offset:2; size:1; signed:0;
    field:unsigned char common_preempt_count; offset:3; size:1; signed:0;
    field:int common_pid; offset:4; size:4; signed:1;

    field:char str[16]; offset:8; size:16; signed:1;
    field:int v; offset:40; size:4; signed:1;

    print fmt: "str=%s, v=%d", REC->str, REC->v

    Link: https://lkml.kernel.org/r/6587663b56c2d45ab9d8c8472a2110713cdec97d.1602598160.git.zanussi@kernel.org

    [ : wrote parse_synth_field() snippet. ]
    Fixes: 4b147936fa50 ("tracing: Add support for 'synthetic' events")
    Reported-by: Masami Hiramatsu
    Tested-by: Masami Hiramatsu
    Signed-off-by: Tom Zanussi
    Signed-off-by: Steven Rostedt (VMware)

    Tom Zanussi
     
  • Add support for synthetic event error logging, which entails adding a
    logging function for it, a way to save the synthetic event command,
    and a set of specific synthetic event parse error strings and
    handling.

    Link: https://lkml.kernel.org/r/ed099c66df13b40cfc633aaeb17f66c37a923066.1602598160.git.zanussi@kernel.org

    [ : wrote save_cmdstr() seq_buf implementation. ]
    Tested-by: Masami Hiramatsu
    Signed-off-by: Tom Zanussi
    Signed-off-by: Steven Rostedt (VMware)

    Tom Zanussi
     
  • Call the is_good_name() function used by probe events to make sure
    synthetic event and field names don't contain illegal characters and
    cause unexpected parsing of synthetic event commands.
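
    As a sketch of the check being applied here (modeled on the kernel's
    is_good_name() helper, not the verbatim code): a name is accepted only
    if it looks like a C identifier.

    ```c
    #include <ctype.h>
    #include <stdbool.h>

    /* Sketch of an is_good_name()-style check: a name is legal if it is
     * a non-empty C identifier -- a letter or '_' followed by letters,
     * digits, or '_'. Anything else (spaces, ';', '-', ...) could be
     * parsed as synthetic event command syntax, so it is rejected. */
    static bool is_good_name(const char *name)
    {
            if (!isalpha((unsigned char)*name) && *name != '_')
                    return false;
            while (*++name != '\0') {
                    if (!isalpha((unsigned char)*name) &&
                        !isdigit((unsigned char)*name) && *name != '_')
                            return false;
            }
            return true;
    }
    ```

    An empty string fails the first test, so it is rejected as well.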

    Link: https://lkml.kernel.org/r/c4d4bb59d3ac39bcbd70fba0cf837d6b1cedb015.1602598160.git.zanussi@kernel.org

    Fixes: 4b147936fa50 ("tracing: Add support for 'synthetic' events")
    Reported-by: Masami Hiramatsu
    Reviewed-by: Masami Hiramatsu
    Tested-by: Masami Hiramatsu
    Signed-off-by: Tom Zanussi
    Signed-off-by: Steven Rostedt (VMware)

    Tom Zanussi
     
  • is_good_name() is useful for other trace infrastructure, such as
    synthetic events, so make it available via trace.h.

    Link: https://lkml.kernel.org/r/cc6d6a2d7da6957fcbe1e2922e76d18d2bb459b4.1602598160.git.zanussi@kernel.org

    Acked-by: Masami Hiramatsu
    Tested-by: Masami Hiramatsu
    Signed-off-by: Tom Zanussi
    Signed-off-by: Steven Rostedt (VMware)

    Tom Zanussi
     
  • For synthetic event dynamic fields, the type contains "__data_loc",
    an internal implementation detail that is only meant to appear in the
    format file, not in the event description itself. Showing it in the
    description is confusing: users cannot use __data_loc on the
    command line to define an event field, even though printing it there
    would lead them to believe they can.

    So filter it out from the description, while leaving it in the type.

    Link: https://lkml.kernel.org/r/b3b7baf7813298a5ede4ff02e2e837b91c05a724.1602598160.git.zanussi@kernel.org

    Reported-by: Masami Hiramatsu
    Tested-by: Masami Hiramatsu
    Signed-off-by: Tom Zanussi
    Signed-off-by: Steven Rostedt (VMware)

    Tom Zanussi