04 Nov, 2020

1 commit

  • The kprobe handlers have protection that prohibits other handlers from
    executing in other contexts (for example, if an NMI comes in while
    processing a kprobe and hits the same kprobe, it will fail with a "busy"
    return), but lockdep is unaware of this protection. Use lockdep's nesting
    API to differentiate between locks taken in INT3 context and other
    contexts, to suppress the false warnings.
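
    The protection described above can be sketched as a small userspace
    simulation (all names here are illustrative, not the kernel API): a
    current-kprobe slot rejects a nested hit instead of recursing into it.

```python
# Userspace sketch of the kprobe "busy" protection described above.
# Names (current_kprobe, hit) are illustrative, not the kernel API.

current_kprobe = None      # stands in for the per-CPU current-kprobe slot

def hit(probe, handler):
    """Run handler for probe; fail with "busy" if a kprobe is already active."""
    global current_kprobe
    if current_kprobe is not None:
        return "busy"              # nested hit (e.g. from NMI) is rejected
    current_kprobe = probe
    try:
        return handler()
    finally:
        current_kprobe = None      # guard released when the handler returns
```

    A nested hit of the same kprobe from inside a handler then returns
    "busy" rather than deadlocking or recursing, which is exactly the
    property lockdep cannot see on its own.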

    Link: https://lore.kernel.org/r/20201102160234.fa0ae70915ad9e2b21c08b85@kernel.org

    Cc: Peter Zijlstra
    Acked-by: Masami Hiramatsu
    Signed-off-by: Steven Rostedt (VMware)

    Steven Rostedt (VMware)
     

16 Oct, 2020

1 commit

  • Pull tracing updates from Steven Rostedt:
    "Updates for tracing and bootconfig:

    - Add support for "bool" type in synthetic events

    - Add per instance tracing for bootconfig

    - Support perf-style return probe ("SYMBOL%return") in kprobes and
    uprobes

    - Allow for kprobes to be enabled earlier in boot up

    - Added tracepoint helper function to allow testing if tracepoints
    are enabled in headers

    - Synthetic events can now have dynamic strings (variable length)

    - Various fixes and cleanups"

    * tag 'trace-v5.10' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace: (58 commits)
    tracing: support "bool" type in synthetic trace events
    selftests/ftrace: Add test case for synthetic event syntax errors
    tracing: Handle synthetic event array field type checking correctly
    selftests/ftrace: Change synthetic event name for inter-event-combined test
    tracing: Add synthetic event error logging
    tracing: Check that the synthetic event and field names are legal
    tracing: Move is_good_name() from trace_probe.h to trace.h
    tracing: Don't show dynamic string internals in synthetic event description
    tracing: Fix some typos in comments
    tracing/boot: Add ftrace.instance.*.alloc_snapshot option
    tracing: Fix race in trace_open and buffer resize call
    tracing: Check return value of __create_val_fields() before using its result
    tracing: Fix synthetic print fmt check for use of __get_str()
    tracing: Remove a pointless assignment
    ftrace: ftrace_global_list is renamed to ftrace_ops_list
    ftrace: Format variable declarations of ftrace_allocate_records
    ftrace: Simplify the calculation of page number for ftrace_page->records
    ftrace: Simplify the dyn_ftrace->flags macro
    ftrace: Simplify the hash calculation
    ftrace: Use fls() to get the bits for dup_hash()
    ...

    Linus Torvalds
     

13 Oct, 2020

2 commits

  • Pull perf/kprobes updates from Ingo Molnar:
    "This prepares to unify the kretprobe trampoline handler and make
    kretprobe lockless (those patches are still work in progress)"

    * tag 'perf-kprobes-2020-10-12' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
    kprobes: Fix to check probe enabled before disarm_kprobe_ftrace()
    kprobes: Make local functions static
    kprobes: Free kretprobe_instance with RCU callback
    kprobes: Remove NMI context check
    sparc: kprobes: Use generic kretprobe trampoline handler
    sh: kprobes: Use generic kretprobe trampoline handler
    s390: kprobes: Use generic kretprobe trampoline handler
    powerpc: kprobes: Use generic kretprobe trampoline handler
    parisc: kprobes: Use generic kretprobe trampoline handler
    mips: kprobes: Use generic kretprobe trampoline handler
    ia64: kprobes: Use generic kretprobe trampoline handler
    csky: kprobes: Use generic kretprobe trampoline handler
    arc: kprobes: Use generic kretprobe trampoline handler
    arm64: kprobes: Use generic kretprobe trampoline handler
    arm: kprobes: Use generic kretprobe trampoline handler
    x86/kprobes: Use generic kretprobe trampoline handler
    kprobes: Add generic kretprobe trampoline handler

    Linus Torvalds
     
  • Pull static call support from Ingo Molnar:
    "This introduces static_call(), which is the idea of static_branch()
    applied to indirect function calls. Remove a data load (indirection)
    by modifying the text.

    They give the flexibility of function pointers, but with better
    performance. (This is especially important for cases where retpolines
    would otherwise be used, as retpolines can be pretty slow.)

    API overview:

    DECLARE_STATIC_CALL(name, func);
    DEFINE_STATIC_CALL(name, func);
    DEFINE_STATIC_CALL_NULL(name, typename);

    static_call(name)(args...);
    static_call_cond(name)(args...);
    static_call_update(name, func);

    x86 is supported via text patching, otherwise basic indirect calls are
    used, with function pointers.

    There's a second variant using inline code patching, inspired by
    jump-labels, implemented on x86 as well.

    The new APIs are utilized in the x86 perf code, a heavy user of
    function pointers, where static calls speed up the PMU handler by
    4.2% (!).

    The generic implementation is not really exercised on other
    architectures, outside of the trivial test_static_call_init()
    self-test"
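
    As the message notes, architectures without text patching fall back to
    plain indirect calls through function pointers. A rough simulation of
    those semantics (not the real implementation; the class and method names
    are invented) looks like:

```python
# Rough simulation of the generic (non-patching) static_call fallback:
# static_call() is just an indirect call through a slot that
# static_call_update() rewrites. Names here are illustrative only.

class StaticCall:
    def __init__(self, func=None):
        self.func = func               # DEFINE_STATIC_CALL(name, func)

    def __call__(self, *args):         # static_call(name)(args...)
        return self.func(*args)

    def cond(self, *args):             # static_call_cond(name)(args...)
        if self.func is not None:      # a NULL target becomes a no-op
            return self.func(*args)

    def update(self, func):            # static_call_update(name, func)
        self.func = func

sc = StaticCall(lambda x: x + 1)
```

    The x86 variants get the same semantics but patch the call site itself,
    removing the data load that this fallback still performs.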

    * tag 'core-static_call-2020-10-12' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (21 commits)
    static_call: Fix return type of static_call_init
    tracepoint: Fix out of sync data passing by static caller
    tracepoint: Fix overly long tracepoint names
    x86/perf, static_call: Optimize x86_pmu methods
    tracepoint: Optimize using static_call()
    static_call: Allow early init
    static_call: Add some validation
    static_call: Handle tail-calls
    static_call: Add static_call_cond()
    x86/alternatives: Teach text_poke_bp() to emulate RET
    static_call: Add simple self-test for static calls
    x86/static_call: Add inline static call implementation for x86-64
    x86/static_call: Add out-of-line static call implementation
    static_call: Avoid kprobes on inline static_call()s
    static_call: Add inline static call infrastructure
    static_call: Add basic static call infrastructure
    compiler.h: Make __ADDRESSABLE() symbol truly unique
    jump_label,module: Fix module lifetime for __jump_label_mod_text_reserved()
    module: Properly propagate MODULE_STATE_COMING failure
    module: Fix up module_notifier return values
    ...

    Linus Torvalds
     

23 Sep, 2020

1 commit

  • Pull tracing fixes from Steven Rostedt:

    - Check kprobe is enabled before unregistering from ftrace as it isn't
    registered when disabled.

    - Remove kprobes that were enabled via the command line and placed on
    init text, once that text is freed.

    - Add missing RCU synchronization for ftrace trampoline symbols removed
    from kallsyms.

    - Free trampoline on error path if ftrace_startup() fails.

    - Give more space for the longer PID numbers in trace output.

    - Fix a possible double free in the histogram code.

    - A couple of fixes that were discovered by sparse.

    * tag 'trace-v5.9-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace:
    bootconfig: init: make xbc_namebuf static
    kprobes: tracing/kprobes: Fix to kill kprobes on initmem after boot
    tracing: fix double free
    ftrace: Let ftrace_enable_sysctl take a kernel pointer buffer
    tracing: Make the space reserved for the pid wider
    ftrace: Fix missing synchronize_rcu() removing trampoline from kallsyms
    ftrace: Free the trampoline when ftrace_startup() fails
    kprobes: Fix to check probe enabled before disarm_kprobe_ftrace()

    Linus Torvalds
     

22 Sep, 2020

1 commit

  • Initialize the kprobes feature in early_initcall, the same way jump_label
    and dynamic_debug do, so that kprobe events can be used at an earlier
    boot stage.

    Link: https://lkml.kernel.org/r/159974151897.478751.8342374158615496628.stgit@devnote2

    Signed-off-by: Masami Hiramatsu
    Signed-off-by: Steven Rostedt (VMware)

    Masami Hiramatsu
     

20 Sep, 2020

1 commit

  • If a kprobe is marked as gone, we should not kill it again. Otherwise, we
    can disarm the kprobe more than once. In that case, the
    kprobe_ftrace_enabled count can become unbalanced, which can cause the
    kprobe to stop working.
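
    A minimal sketch of the invariant being fixed, as a userspace simulation
    (names are illustrative): the gone flag makes the kill path idempotent,
    so the shared armed count stays balanced.

```python
# Sketch of the fix: kill_kprobe() must not disarm a probe already marked
# gone, or the shared kprobe_ftrace_enabled count goes out of balance.
# Hypothetical names; not the kernel code.

kprobe_ftrace_enabled = 0          # shared count of armed ftrace-based kprobes

class Kprobe:
    gone = False

def arm(p):
    global kprobe_ftrace_enabled
    kprobe_ftrace_enabled += 1

def kill_kprobe(p):
    global kprobe_ftrace_enabled
    if p.gone:                     # the fix: never kill/disarm twice
        return
    p.gone = True
    kprobe_ftrace_enabled -= 1     # disarm exactly once

p = Kprobe()
arm(p)
kill_kprobe(p)
kill_kprobe(p)                     # second kill is now a no-op
```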

    Fixes: e8386a0cb22f ("kprobes: support probing module __exit function")
    Co-developed-by: Chengming Zhou
    Signed-off-by: Muchun Song
    Signed-off-by: Chengming Zhou
    Signed-off-by: Andrew Morton
    Acked-by: Masami Hiramatsu
    Cc: "Naveen N . Rao"
    Cc: Anil S Keshavamurthy
    Cc: David S. Miller
    Cc: Song Liu
    Cc: Steven Rostedt
    Cc:
    Link: https://lkml.kernel.org/r/20200822030055.32383-1-songmuchun@bytedance.com
    Signed-off-by: Linus Torvalds

    Muchun Song
     

19 Sep, 2020

1 commit

  • Since the kprobe_event= cmdline option allows users to put kprobes on
    functions in initmem, kprobes has to mark such probes as gone after boot.
    Currently the probes on init functions in modules are handled by the
    module callback, but kernel init text isn't.
    Without this, kprobes may access a non-existent text area when disabling
    or removing such a probe.

    Link: https://lkml.kernel.org/r/159972810544.428528.1839307531600646955.stgit@devnote2

    Fixes: 970988e19eb0 ("tracing/kprobe: Add kprobe_event= boot parameter")
    Cc: Jonathan Corbet
    Cc: Shuah Khan
    Cc: Randy Dunlap
    Cc: Ingo Molnar
    Cc: stable@vger.kernel.org
    Signed-off-by: Masami Hiramatsu
    Signed-off-by: Steven Rostedt (VMware)

    Masami Hiramatsu
     

18 Sep, 2020

1 commit

  • Commit 0cb2f1372baa ("kprobes: Fix NULL pointer dereference at
    kprobe_ftrace_handler") fixed one bug but did not completely fix the
    underlying problem. If we run the kprobe_module.tc test of ftracetest,
    the kernel shows a warning as below.

    # ./ftracetest test.d/kprobe/kprobe_module.tc
    === Ftrace unit tests ===
    [1] Kprobe dynamic event - probing module
    ...
    [ 22.400215] ------------[ cut here ]------------
    [ 22.400962] Failed to disarm kprobe-ftrace at trace_printk_irq_work+0x0/0x7e [trace_printk] (-2)
    [ 22.402139] WARNING: CPU: 7 PID: 200 at kernel/kprobes.c:1091 __disarm_kprobe_ftrace.isra.0+0x7e/0xa0
    [ 22.403358] Modules linked in: trace_printk(-)
    [ 22.404028] CPU: 7 PID: 200 Comm: rmmod Not tainted 5.9.0-rc2+ #66
    [ 22.404870] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.13.0-1ubuntu1 04/01/2014
    [ 22.406139] RIP: 0010:__disarm_kprobe_ftrace.isra.0+0x7e/0xa0
    [ 22.406947] Code: 30 8b 03 eb c9 80 3d e5 09 1f 01 00 75 dc 49 8b 34 24 89 c2 48 c7 c7 a0 c2 05 82 89 45 e4 c6 05 cc 09 1f 01 01 e8 a9 c7 f0 ff 0b 8b 45 e4 eb b9 89 c6 48 c7 c7 70 c2 05 82 89 45 e4 e8 91 c7
    [ 22.409544] RSP: 0018:ffffc90000237df0 EFLAGS: 00010286
    [ 22.410385] RAX: 0000000000000000 RBX: ffffffff83066024 RCX: 0000000000000000
    [ 22.411434] RDX: 0000000000000001 RSI: ffffffff810de8d3 RDI: ffffffff810de8d3
    [ 22.412687] RBP: ffffc90000237e10 R08: 0000000000000001 R09: 0000000000000001
    [ 22.413762] R10: 0000000000000000 R11: 0000000000000001 R12: ffff88807c478640
    [ 22.414852] R13: ffffffff8235ebc0 R14: ffffffffa00060c0 R15: 0000000000000000
    [ 22.415941] FS: 00000000019d48c0(0000) GS:ffff88807d7c0000(0000) knlGS:0000000000000000
    [ 22.417264] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
    [ 22.418176] CR2: 00000000005bb7e3 CR3: 0000000078f7a000 CR4: 00000000000006a0
    [ 22.419309] Call Trace:
    [ 22.419990] kill_kprobe+0x94/0x160
    [ 22.420652] kprobes_module_callback+0x64/0x230
    [ 22.421470] notifier_call_chain+0x4f/0x70
    [ 22.422184] blocking_notifier_call_chain+0x49/0x70
    [ 22.422979] __x64_sys_delete_module+0x1ac/0x240
    [ 22.423733] do_syscall_64+0x38/0x50
    [ 22.424366] entry_SYSCALL_64_after_hwframe+0x44/0xa9
    [ 22.425176] RIP: 0033:0x4bb81d
    [ 22.425741] Code: 00 c3 66 2e 0f 1f 84 00 00 00 00 00 90 f3 0f 1e fa 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 3d 01 f0 ff ff 73 01 c3 48 c7 c1 e0 ff ff ff f7 d8 64 89 01 48
    [ 22.428726] RSP: 002b:00007ffc70fef008 EFLAGS: 00000246 ORIG_RAX: 00000000000000b0
    [ 22.430169] RAX: ffffffffffffffda RBX: 00000000019d48a0 RCX: 00000000004bb81d
    [ 22.431375] RDX: 0000000000000000 RSI: 0000000000000880 RDI: 00007ffc70fef028
    [ 22.432543] RBP: 0000000000000880 R08: 00000000ffffffff R09: 00007ffc70fef320
    [ 22.433692] R10: 0000000000656300 R11: 0000000000000246 R12: 00007ffc70fef028
    [ 22.434635] R13: 0000000000000000 R14: 0000000000000002 R15: 0000000000000000
    [ 22.435682] irq event stamp: 1169
    [ 22.436240] hardirqs last enabled at (1179): [] console_unlock+0x422/0x580
    [ 22.437466] hardirqs last disabled at (1188): [] console_unlock+0x7b/0x580
    [ 22.438608] softirqs last enabled at (866): [] __do_softirq+0x38e/0x490
    [ 22.439637] softirqs last disabled at (859): [] asm_call_on_stack+0x12/0x20
    [ 22.440690] ---[ end trace 1e7ce7e1e4567276 ]---
    [ 22.472832] trace_kprobe: This probe might be able to register after target module is loaded. Continue.

    This is because kill_kprobe() calls disarm_kprobe_ftrace() even
    if the given probe is not enabled. In that case, ftrace_set_filter_ip()
    fails because the given probe point is not registered with ftrace.

    Fix this by checking that the given (going) probe is enabled before
    invoking disarm_kprobe_ftrace().
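
    The shape of the fix can be sketched as a small simulation (hypothetical
    names, not the kernel code): disarming consults the enabled state first,
    so ftrace is never asked to remove an ip it does not have.

```python
# Sketch of the fix: only call disarm (which removes the ip from the
# ftrace filter) when the probe was actually enabled and armed.
# Names and the -2 return value are illustrative of the log above.

ftrace_filter = set()              # ips currently registered with ftrace

def arm(ip):
    ftrace_filter.add(ip)

def disarm_kprobe_ftrace(ip):
    if ip not in ftrace_filter:
        return -2                  # the "Failed to disarm ... (-2)" case
    ftrace_filter.remove(ip)
    return 0

def kill_kprobe(ip, enabled):
    if not enabled:                # the fix: skip disarm for disabled probes
        return 0
    return disarm_kprobe_ftrace(ip)
```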

    Link: https://lkml.kernel.org/r/159888672694.1411785.5987998076694782591.stgit@devnote2

    Fixes: 0cb2f1372baa ("kprobes: Fix NULL pointer dereference at kprobe_ftrace_handler")
    Cc: Ingo Molnar
    Cc: "Naveen N . Rao"
    Cc: Anil S Keshavamurthy
    Cc: David Miller
    Cc: Muchun Song
    Cc: Chengming Zhou
    Cc: stable@vger.kernel.org
    Signed-off-by: Masami Hiramatsu
    Signed-off-by: Steven Rostedt (VMware)

    Masami Hiramatsu
     

14 Sep, 2020

1 commit

  • Commit:

    0cb2f1372baa ("kprobes: Fix NULL pointer dereference at kprobe_ftrace_handler")

    fixed one bug, but the underlying bugs are not completely fixed yet.

    If we run a kprobe_module.tc of ftracetest, a warning triggers:

    # ./ftracetest test.d/kprobe/kprobe_module.tc
    === Ftrace unit tests ===
    [1] Kprobe dynamic event - probing module
    ...
    ------------[ cut here ]------------
    Failed to disarm kprobe-ftrace at trace_printk_irq_work+0x0/0x7e [trace_printk] (-2)
    WARNING: CPU: 7 PID: 200 at kernel/kprobes.c:1091 __disarm_kprobe_ftrace.isra.0+0x7e/0xa0

    This is because kill_kprobe() calls disarm_kprobe_ftrace() even
    if the given probe is not enabled. In that case, ftrace_set_filter_ip()
    fails because the given probe point is not registered with ftrace.

    Fix this by checking that the given (going) probe is enabled before
    invoking disarm_kprobe_ftrace().

    Fixes: 0cb2f1372baa ("kprobes: Fix NULL pointer dereference at kprobe_ftrace_handler")
    Signed-off-by: Masami Hiramatsu
    Signed-off-by: Ingo Molnar
    Cc: stable@vger.kernel.org
    Link: https://lore.kernel.org/r/159888672694.1411785.5987998076694782591.stgit@devnote2

    Masami Hiramatsu
     

08 Sep, 2020

4 commits

  • Since we unified the kretprobe trampoline handler from arch/* code,
    some functions and objects do not need to be exported anymore.

    Signed-off-by: Masami Hiramatsu
    Signed-off-by: Ingo Molnar
    Link: https://lore.kernel.org/r/159870618256.1229682.8692046612635810882.stgit@devnote2

    Masami Hiramatsu
     
  • Free kretprobe_instance with RCU callback instead of directly
    freeing the object in the kretprobe handler context.

    This will make kretprobe run safer in NMI context.

    Signed-off-by: Masami Hiramatsu
    Signed-off-by: Ingo Molnar
    Link: https://lore.kernel.org/r/159870616685.1229682.11978742048709542226.stgit@devnote2

    Masami Hiramatsu
     
  • The in_nmi() check in pre_handler_kretprobe() is meant to avoid
    recursion, and blindly assumes that anything NMI is recursive.

    However, since commit:

    9b38cc704e84 ("kretprobe: Prevent triggering kretprobe from within kprobe_flush_task")

    there is a better way to detect and avoid actual recursion.

    By setting a dummy kprobe, any actual exceptions will terminate early
    (by trying to handle the dummy kprobe), and recursion will not happen.

    Employ this to avoid the kretprobe_table_lock() recursion, replacing
    the over-eager in_nmi() check.

    Signed-off-by: Masami Hiramatsu
    Signed-off-by: Peter Zijlstra (Intel)
    Signed-off-by: Ingo Molnar
    Link: https://lkml.kernel.org/r/159870615628.1229682.6087311596892125907.stgit@devnote2

    Masami Hiramatsu
     
  • Add a generic kretprobe trampoline handler, unifying all the
    cloned arch/* kretprobe trampoline handlers.

    The generic kretprobe trampoline handler is based on the
    x86 implementation, because it is the latest implementation.
    It has frame pointer checking, kprobe_busy_begin/end and
    return address fixup for user handlers.
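
    The frame-pointer checking mentioned above can be modelled roughly like
    this (a simplified simulation, not the kernel code): each probed entry
    records (frame pointer, real return address), and the trampoline handler
    pops instances until the frame pointer matches.

```python
# Simplified model of the generic trampoline handler: function entry pushes
# (frame_pointer, real_return_address); the trampoline pops entries up to
# the matching frame pointer and restores the real return address.
# All names are illustrative.

instances = []   # stands in for the per-task kretprobe_instance list

def on_entry(fp, ret_addr):
    instances.append((fp, ret_addr))

def trampoline_handler(fp):
    # Pop any stale instances (frames that returned without hitting the
    # trampoline) until the frame-pointer check matches.
    while instances:
        inst_fp, ret = instances.pop()
        if inst_fp == fp:
            return ret            # the correct original return address
    raise RuntimeError("kretprobe instance not found")
```

    Nested calls unwind in LIFO order, which is why the frame-pointer check
    is enough to pair each trampoline hit with the right instance.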

    [ mingo: Minor edits. ]

    Signed-off-by: Masami Hiramatsu
    Signed-off-by: Ingo Molnar
    Link: https://lore.kernel.org/r/159870600138.1229682.3424065380448088833.stgit@devnote2

    Masami Hiramatsu
     

08 Aug, 2020

1 commit

  • Pull tracing updates from Steven Rostedt:

    - The biggest news is that the tracing ring buffer can now time events
    that interrupted other ring buffer events.

    Before this change, if an interrupt came in while recording another
    event, and that interrupt also had an event, those events would all
    have the same time stamp as the event it interrupted.

    Now, with the new design, those events will have a unique time stamp
    and rightfully display the time for those events that were recorded
    while interrupting another event.

    - Bootconfig now has an "override" operator that lets users have a
    default config, but then add options to override the default.

    - A fix was made to properly filter function graph tracing to the
    ftrace PIDs. This came in at the end of the -rc cycle, and needs to
    be backported.

    - Several clean ups, performance updates, and minor fixes as well.

    * tag 'trace-v5.9' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace: (39 commits)
    tracing: Add trace_array_init_printk() to initialize instance trace_printk() buffers
    kprobes: Fix compiler warning for !CONFIG_KPROBES_ON_FTRACE
    tracing: Use trace_sched_process_free() instead of exit() for pid tracing
    bootconfig: Fix to find the initargs correctly
    Documentation: bootconfig: Add bootconfig override operator
    tools/bootconfig: Add testcases for value override operator
    lib/bootconfig: Add override operator support
    kprobes: Remove show_registers() function prototype
    tracing/uprobe: Remove dead code in trace_uprobe_register()
    kprobes: Fix NULL pointer dereference at kprobe_ftrace_handler
    ftrace: Fix ftrace_trace_task return value
    tracepoint: Use __used attribute definitions from compiler_attributes.h
    tracepoint: Mark __tracepoint_string's __used
    trace : Have tracing buffer info use kvzalloc instead of kzalloc
    tracing: Remove outdated comment in stack handling
    ftrace: Do not let direct or IPMODIFY ftrace_ops be added to module and set trampolines
    ftrace: Setup correct FTRACE_FL_REGS flags for module
    tracing/hwlat: Honor the tracing_cpumask
    tracing/hwlat: Drop the duplicate assignment in start_kthread()
    tracing: Save one trace_event->type by using __TRACE_LAST_TYPE
    ...

    Linus Torvalds
     

06 Aug, 2020

1 commit

  • Fix a compiler warning (shown below) for !CONFIG_KPROBES_ON_FTRACE.

    kernel/kprobes.c: In function 'kill_kprobe':
    kernel/kprobes.c:1116:33: warning: statement with no effect
    [-Wunused-value]
    1116 | #define disarm_kprobe_ftrace(p) (-ENODEV)
    | ^
    kernel/kprobes.c:2154:3: note: in expansion of macro
    'disarm_kprobe_ftrace'
    2154 | disarm_kprobe_ftrace(p);

    Link: https://lore.kernel.org/r/20200805142136.0331f7ea@canb.auug.org.au
    Link: https://lkml.kernel.org/r/20200805172046.19066-1-songmuchun@bytedance.com

    Reported-by: Stephen Rothwell
    Fixes: 0cb2f1372baa ("kprobes: Fix NULL pointer dereference at kprobe_ftrace_handler")
    Acked-by: Masami Hiramatsu
    Acked-by: John Fastabend
    Signed-off-by: Muchun Song
    Signed-off-by: Steven Rostedt (VMware)

    Muchun Song
     

04 Aug, 2020

1 commit

  • We found a case of kernel panic on our server. The stack trace is as
    follows (some irrelevant information omitted):

    BUG: kernel NULL pointer dereference, address: 0000000000000080
    RIP: 0010:kprobe_ftrace_handler+0x5e/0xe0
    RSP: 0018:ffffb512c6550998 EFLAGS: 00010282
    RAX: 0000000000000000 RBX: ffff8e9d16eea018 RCX: 0000000000000000
    RDX: ffffffffbe1179c0 RSI: ffffffffc0535564 RDI: ffffffffc0534ec0
    RBP: ffffffffc0534ec1 R08: ffff8e9d1bbb0f00 R09: 0000000000000004
    R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000000
    R13: ffff8e9d1f797060 R14: 000000000000bacc R15: ffff8e9ce13eca00
    CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
    CR2: 0000000000000080 CR3: 00000008453d0005 CR4: 00000000003606e0
    DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
    DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
    Call Trace:

    ftrace_ops_assist_func+0x56/0xe0
    ftrace_call+0x5/0x34
    tcpa_statistic_send+0x5/0x130 [ttcp_engine]

    The tcpa_statistic_send is the function being kprobed. After analysis,
    the root cause is that the fourth parameter regs of kprobe_ftrace_handler
    is NULL. Why is regs NULL? We use the crash tool to analyze the kdump.

    crash> dis tcpa_statistic_send -r
    : callq 0xffffffffbd8018c0

    The tcpa_statistic_send calls ftrace_caller instead of ftrace_regs_caller.
    So it is reasonable that the fourth parameter regs of kprobe_ftrace_handler
    is NULL. In theory, we should call the ftrace_regs_caller instead of the
    ftrace_caller. After in-depth analysis, we found a reproducible path.

    Write a simple kernel module which starts a periodic timer. The
    timer's handler is named 'kprobe_test_timer_handler'. The module
    name is kprobe_test.ko.

    1) insmod kprobe_test.ko
    2) bpftrace -e 'kretprobe:kprobe_test_timer_handler {}'
    3) echo 0 > /proc/sys/kernel/ftrace_enabled
    4) rmmod kprobe_test
    5) stop step 2) kprobe
    6) insmod kprobe_test.ko
    7) bpftrace -e 'kretprobe:kprobe_test_timer_handler {}'

    In step 4) we mark the kprobe as GONE but do not disarm it.
    Step 5) also does not disarm the kprobe when unregistering it. So
    we do not remove the ip from the filter. In this case, when the module
    is loaded again in step 6), we will replace the code with ftrace_caller
    via ftrace_module_enable(). When we register the kprobe again, we will
    not replace ftrace_caller with ftrace_regs_caller because ftrace was
    disabled in step 3). So step 7) will trigger a kernel panic. Fix
    this problem by disarming the kprobe when the module is going away.
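
    The essence of the fix, as a tiny simulation (hypothetical names, not
    the kernel code): module teardown removes the probed ip from the ftrace
    filter, so a reloaded module is re-armed from a clean state.

```python
# Sketch of the fix: when the probed module goes away, disarm the kprobe
# (remove its ip from the ftrace filter) instead of only marking it GONE,
# so a reloaded module gets ftrace_regs_caller again on re-registration.
# Names are illustrative.

ftrace_filter = set()              # ips whose call sites use ftrace_regs_caller

def register_kprobe(ip):
    ftrace_filter.add(ip)          # arm: register the ip with ftrace

def kill_on_module_exit(ip):
    # the fix: disarm on module unload, even for probes being marked GONE
    ftrace_filter.discard(ip)

register_kprobe(0xc0de)
kill_on_module_exit(0xc0de)        # module unloaded: no stale filter entry
register_kprobe(0xc0de)            # module reloaded and re-probed cleanly
```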

    Link: https://lkml.kernel.org/r/20200728064536.24405-1-songmuchun@bytedance.com

    Cc: stable@vger.kernel.org
    Fixes: ae6aa16fdc16 ("kprobes: introduce ftrace based optimization")
    Acked-by: Masami Hiramatsu
    Signed-off-by: Muchun Song
    Co-developed-by: Chengming Zhou
    Signed-off-by: Chengming Zhou
    Signed-off-by: Steven Rostedt (VMware)

    Muchun Song
     

28 Jul, 2020

2 commits

  • Since we already lock both kprobe_mutex and text_mutex in the optimizer,
    text will not be changed and module unloading will be stopped
    inside kprobes_module_callback().

    The mutex_lock() was originally introduced to avoid conflicts with text
    modification, because at that point we didn't hold text_mutex.

    But after:

    f1c6ece23729 ("kprobes: Fix potential deadlock in kprobe_optimizer()")

    we started holding text_mutex, so we don't need the modules mutex anyway.

    So remove the module_mutex locking.

    [ mingo: Amended the changelog. ]

    Suggested-by: Ingo Molnar
    Signed-off-by: Masami Hiramatsu
    Signed-off-by: Ingo Molnar
    Cc: Jarkko Sakkinen
    Link: https://lore.kernel.org/r/20200728163400.e00b09c594763349f99ce6cb@kernel.org

    Masami Hiramatsu
     
  • Signed-off-by: Ingo Molnar

    Ingo Molnar
     

09 Jul, 2020

2 commits

  • The kprobe show() functions were using "current"'s creds instead
    of the file opener's creds for kallsyms visibility. Fix them to use
    seq_file->file->f_cred.

    Cc: Masami Hiramatsu
    Cc: stable@vger.kernel.org
    Fixes: 81365a947de4 ("kprobes: Show address of kprobes if kallsyms does")
    Fixes: ffb9bd68ebdb ("kprobes: Show blacklist addresses as same as kallsyms does")
    Signed-off-by: Kees Cook

    Kees Cook
     
  • In order to perform future tests against the cred saved during open(),
    switch kallsyms_show_value() to operate on a cred, and have all current
    callers pass current_cred(). This makes it very obvious where callers
    are checking the wrong credential in their "read" contexts. These will
    be fixed in the coming patches.

    Additionally switch return value to bool, since it is always used as a
    direct permission check, not a 0-on-success, negative-on-error style
    function return.
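
    The interface change can be sketched like this (illustrative Python, not
    the kernel signatures): the visibility check takes the cred saved at
    open() time and returns a bool, so a later read cannot borrow the
    reader's privileges.

```python
# Sketch of the interface change: the visibility check takes an explicit
# cred (saved at open() time) instead of implicitly using current's creds,
# and returns a bool. The dict-based cred and policy here are invented.

def kallsyms_show_value(cred):
    # illustrative policy: only a privileged opener may see addresses
    return cred.get("capable_syslog", False)

def kprobes_seq_show(addr, file_cred):
    # check the opener's creds (file->f_cred), not the reader's
    if kallsyms_show_value(file_cred):
        return hex(addr)
    return "0x0"                   # censored for unprivileged openers
```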

    Cc: stable@vger.kernel.org
    Signed-off-by: Kees Cook

    Kees Cook
     

17 Jun, 2020

5 commits

  • Ziqian reported lockup when adding retprobe on _raw_spin_lock_irqsave.
    My test was also able to trigger lockdep output:

    ============================================
    WARNING: possible recursive locking detected
    5.6.0-rc6+ #6 Not tainted
    --------------------------------------------
    sched-messaging/2767 is trying to acquire lock:
    ffffffff9a492798 (&(kretprobe_table_locks[i].lock)){-.-.}, at: kretprobe_hash_lock+0x52/0xa0

    but task is already holding lock:
    ffffffff9a491a18 (&(kretprobe_table_locks[i].lock)){-.-.}, at: kretprobe_trampoline+0x0/0x50

    other info that might help us debug this:
    Possible unsafe locking scenario:

    CPU0
    ----
    lock(&(kretprobe_table_locks[i].lock));
    lock(&(kretprobe_table_locks[i].lock));

    *** DEADLOCK ***

    May be due to missing lock nesting notation

    1 lock held by sched-messaging/2767:
    #0: ffffffff9a491a18 (&(kretprobe_table_locks[i].lock)){-.-.}, at: kretprobe_trampoline+0x0/0x50

    stack backtrace:
    CPU: 3 PID: 2767 Comm: sched-messaging Not tainted 5.6.0-rc6+ #6
    Call Trace:
    dump_stack+0x96/0xe0
    __lock_acquire.cold.57+0x173/0x2b7
    ? native_queued_spin_lock_slowpath+0x42b/0x9e0
    ? lockdep_hardirqs_on+0x590/0x590
    ? __lock_acquire+0xf63/0x4030
    lock_acquire+0x15a/0x3d0
    ? kretprobe_hash_lock+0x52/0xa0
    _raw_spin_lock_irqsave+0x36/0x70
    ? kretprobe_hash_lock+0x52/0xa0
    kretprobe_hash_lock+0x52/0xa0
    trampoline_handler+0xf8/0x940
    ? kprobe_fault_handler+0x380/0x380
    ? find_held_lock+0x3a/0x1c0
    kretprobe_trampoline+0x25/0x50
    ? lock_acquired+0x392/0xbc0
    ? _raw_spin_lock_irqsave+0x50/0x70
    ? __get_valid_kprobe+0x1f0/0x1f0
    ? _raw_spin_unlock_irqrestore+0x3b/0x40
    ? finish_task_switch+0x4b9/0x6d0
    ? __switch_to_asm+0x34/0x70
    ? __switch_to_asm+0x40/0x70

    The code within the kretprobe handler checks for probe reentrancy,
    so we won't trigger any _raw_spin_lock_irqsave probe in there.

    The problem is outside of the kprobe handler, in kprobe_flush_task,
    where we call:

    kprobe_flush_task
    kretprobe_table_lock
    raw_spin_lock_irqsave
    _raw_spin_lock_irqsave

    where _raw_spin_lock_irqsave triggers the kretprobe and installs
    kretprobe_trampoline handler on _raw_spin_lock_irqsave return.

    The kretprobe_trampoline handler is then executed with
    kretprobe_table_locks already held, and the first thing it does is
    lock kretprobe_table_locks again ;-) The whole lockup path looks like:

    kprobe_flush_task
    kretprobe_table_lock
    raw_spin_lock_irqsave
    _raw_spin_lock_irqsave ---> probe triggered, kretprobe_trampoline installed

    ---> kretprobe_table_locks locked

    kretprobe_trampoline
    trampoline_handler
    kretprobe_hash_lock(current, &head, &flags);
    Cc: "Gustavo A . R . Silva"
    Cc: Anders Roxell
    Cc: "Naveen N . Rao"
    Cc: Anil S Keshavamurthy
    Cc: David Miller
    Cc: Ingo Molnar
    Cc: Peter Zijlstra
    Cc: stable@vger.kernel.org
    Reported-by: "Ziqian SUN (Zamir)"
    Acked-by: Masami Hiramatsu
    Signed-off-by: Jiri Olsa
    Signed-off-by: Steven Rostedt (VMware)

    Jiri Olsa
     
  • Remove the redundant arch_disarm_kprobe() call in
    force_unoptimize_kprobe(). This arch_disarm_kprobe()
    is invoked if the kprobe is optimized but disabled,
    but that means the kprobe (optprobe) is in an unused (and
    unoptimized) state.

    In that case, unoptimize_kprobe() puts it on freeing_list
    and the kprobe_optimizer (do_unoptimize_kprobes()) automatically
    disarms it. Thus this arch_disarm_kprobe() is redundant.

    Link: http://lkml.kernel.org/r/158927058719.27680.17183632908465341189.stgit@devnote2

    Signed-off-by: Masami Hiramatsu
    Signed-off-by: Steven Rostedt (VMware)

    Masami Hiramatsu
     
  • In kprobe_optimizer(), kick_kprobe_optimizer() is called
    without kprobe_mutex, but this can race with other callers,
    which are protected by kprobe_mutex.

    To fix that, expand the kprobe_mutex protected area to cover the
    kick_kprobe_optimizer() call.

    Link: http://lkml.kernel.org/r/158927057586.27680.5036330063955940456.stgit@devnote2

    Fixes: cd7ebe2298ff ("kprobes: Use text_poke_smp_batch for optimizing")
    Cc: Ingo Molnar
    Cc: "Gustavo A . R . Silva"
    Cc: Anders Roxell
    Cc: "Naveen N . Rao"
    Cc: Anil S Keshavamurthy
    Cc: David Miller
    Cc: Ingo Molnar
    Cc: Peter Zijlstra
    Cc: Ziqian SUN
    Cc: stable@vger.kernel.org
    Signed-off-by: Masami Hiramatsu
    Signed-off-by: Steven Rostedt (VMware)

    Masami Hiramatsu
     
  • The current kprobes code uses RCU traversal APIs on kprobe_tables
    even where RCU is unnecessary because kprobe_mutex is locked.

    Convert those traversals to non-RCU APIs where kprobe_mutex
    is held.

    Link: http://lkml.kernel.org/r/158927056452.27680.9710575332163005121.stgit@devnote2

    Reviewed-by: Joel Fernandes (Google)
    Signed-off-by: Masami Hiramatsu
    Signed-off-by: Steven Rostedt (VMware)

    Masami Hiramatsu
     
  • Anders reported that lockdep warns about suspicious
    RCU list usage in register_kprobe() (detected by
    CONFIG_PROVE_RCU_LIST). This is because get_kprobe()
    accesses kprobe_table[] via hlist_for_each_entry_rcu()
    without rcu_read_lock().

    If we call get_kprobe() from the breakpoint handler context,
    it runs with preemption disabled, so this is not a problem.
    But in other cases, instead of rcu_read_lock(), we lock
    kprobe_mutex so that kprobe_table[] is not updated.
    So the current code is safe, but still not good from the
    RCU point of view.

    Joel suggested that we can silence that warning by passing
    lockdep_is_held() as the last argument of
    hlist_for_each_entry_rcu().

    Add lockdep_is_held(&kprobe_mutex) at the end of the
    hlist_for_each_entry_rcu() calls to suppress the warning.

    Link: http://lkml.kernel.org/r/158927055350.27680.10261450713467997503.stgit@devnote2

    Reported-by: Anders Roxell
    Suggested-by: Joel Fernandes
    Reviewed-by: Joel Fernandes (Google)
    Signed-off-by: Masami Hiramatsu
    Signed-off-by: Steven Rostedt (VMware)

    Masami Hiramatsu
     

15 Jun, 2020

2 commits

  • Symbols are needed for tools to describe instruction addresses. Pages
    allocated for kprobe's purposes need symbols to be created for them.
    Add such symbols to be visible via perf ksymbol events.

    Signed-off-by: Adrian Hunter
    Signed-off-by: Peter Zijlstra (Intel)
    Acked-by: Peter Zijlstra (Intel)
    Acked-by: Masami Hiramatsu
    Link: https://lkml.kernel.org/r/20200512121922.8997-5-adrian.hunter@intel.com

    Adrian Hunter
     
  • Symbols are needed for tools to describe instruction addresses. Pages
    allocated for kprobe's purposes need symbols to be created for them.
    Add such symbols to be visible via /proc/kallsyms.

    Note: kprobe insn pages are not used if ftrace is configured. To see the
    effect of this patch, the kernel must be configured with:

    # CONFIG_FUNCTION_TRACER is not set
    CONFIG_KPROBES=y

    and for optimised kprobes:

    CONFIG_OPTPROBES=y

    Example on x86:

    # perf probe __schedule
    Added new event:
    probe:__schedule (on __schedule)
    # cat /proc/kallsyms | grep '\[__builtin__kprobes\]'
    ffffffffc00d4000 t kprobe_insn_page [__builtin__kprobes]
    ffffffffc00d6000 t kprobe_optinsn_page [__builtin__kprobes]

    Note: This patch adds "__builtin__kprobes" as a module name in
    /proc/kallsyms for symbols for pages allocated for kprobes' purposes, even
    though "__builtin__kprobes" is not a module.

    Signed-off-by: Adrian Hunter
    Signed-off-by: Peter Zijlstra (Intel)
    Acked-by: Masami Hiramatsu
    Link: https://lkml.kernel.org/r/20200528080058.20230-1-adrian.hunter@intel.com

    Adrian Hunter
     

05 Jun, 2020

1 commit

  • Use DEFINE_SEQ_ATTRIBUTE macro to simplify the code.
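    As an illustration of the simplification (kernel-style sketch; the
    kprobes names are just an example), DEFINE_SEQ_ATTRIBUTE() collapses
    the usual seq_file boilerplate around a struct seq_operations:

```c
/* Before: hand-written open function and file_operations. */
static int kprobes_open(struct inode *inode, struct file *filp)
{
	return seq_open(filp, &kprobes_sops);
}

static const struct file_operations kprobes_fops = {
	.open    = kprobes_open,
	.read    = seq_read,
	.llseek  = seq_lseek,
	.release = seq_release,
};

/* After: one line generates kprobes_open() and kprobes_fops
 * from the kprobes_sops seq_operations. */
DEFINE_SEQ_ATTRIBUTE(kprobes);
```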

    Signed-off-by: Kefeng Wang
    Signed-off-by: Andrew Morton
    Cc: Anil S Keshavamurthy
    Cc: "David S. Miller"
    Cc: Masami Hiramatsu
    Cc: Greg KH
    Cc: Ingo Molnar
    Cc: Al Viro
    Link: http://lkml.kernel.org/r/20200509064031.181091-4-wangkefeng.wang@huawei.com
    Signed-off-by: Linus Torvalds

    Kefeng Wang
     

04 Jun, 2020

1 commit

  • Pull networking updates from David Miller:

    1) Allow setting bluetooth L2CAP modes via socket option, from Luiz
    Augusto von Dentz.

    2) Add GSO partial support to igc, from Sasha Neftin.

    3) Several cleanups and improvements to r8169 from Heiner Kallweit.

    4) Add IF_OPER_TESTING link state and use it when ethtool triggers a
    device self-test. From Andrew Lunn.

    5) Start moving away from custom driver versions, use the globally
    defined kernel version instead, from Leon Romanovsky.

    6) Support GRO via gro_cells in DSA layer, from Alexander Lobakin.

    7) Allow hard IRQ deferral during NAPI, from Eric Dumazet.

    8) Add sriov and vf support to hinic, from Luo bin.

    9) Support Media Redundancy Protocol (MRP) in the bridging code, from
    Horatiu Vultur.

    10) Support netmap in the nft_nat code, from Pablo Neira Ayuso.

    11) Allow UDPv6 encapsulation of ESP in the ipsec code, from Sabrina
    Dubroca. Also add ipv6 support for espintcp.

    12) Lots of ReST conversions of the networking documentation, from Mauro
    Carvalho Chehab.

    13) Support configuration of ethtool rxnfc flows in bcmgenet driver,
    from Doug Berger.

    14) Allow to dump cgroup id and filter by it in inet_diag code, from
    Dmitry Yakunin.

    15) Add infrastructure to export netlink attribute policies to
    userspace, from Johannes Berg.

    16) Several optimizations to sch_fq scheduler, from Eric Dumazet.

    17) Fallback to the default qdisc if qdisc init fails because otherwise
    a packet scheduler init failure will make a device inoperative. From
    Jesper Dangaard Brouer.

    18) Several RISCV bpf jit optimizations, from Luke Nelson.

    19) Correct the return type of the ->ndo_start_xmit() method in several
    drivers, it's netdev_tx_t but many drivers were using
    'int'. From Yunjian Wang.

    20) Add an ethtool interface for PHY master/slave config, from Oleksij
    Rempel.

    21) Add BPF iterators, from Yonghong Song.

    22) Add cable test infrastructure, including ethtool interfaces, from
    Andrew Lunn. The Marvell PHY driver is the first to support this
    facility.

    23) Remove zero-length arrays all over, from Gustavo A. R. Silva.

    24) Calculate and maintain an explicit frame size in XDP, from Jesper
    Dangaard Brouer.

    25) Add CAP_BPF, from Alexei Starovoitov.

    26) Support terse dumps in the packet scheduler, from Vlad Buslov.

    27) Support XDP_TX bulking in dpaa2 driver, from Ioana Ciornei.

    28) Add devm_register_netdev(), from Bartosz Golaszewski.

    29) Minimize qdisc resets, from Cong Wang.

    30) Get rid of kernel_getsockopt and kernel_setsockopt in order to
    eliminate set_fs/get_fs calls. From Christoph Hellwig.

    * git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next: (2517 commits)
    selftests: net: ip_defrag: ignore EPERM
    net_failover: fixed rollback in net_failover_open()
    Revert "tipc: Fix potential tipc_aead refcnt leak in tipc_crypto_rcv"
    Revert "tipc: Fix potential tipc_node refcnt leak in tipc_rcv"
    vmxnet3: allow rx flow hash ops only when rss is enabled
    hinic: add set_channels ethtool_ops support
    selftests/bpf: Add a default $(CXX) value
    tools/bpf: Don't use $(COMPILE.c)
    bpf, selftests: Use bpf_probe_read_kernel
    s390/bpf: Use bcr 0,%0 as tail call nop filler
    s390/bpf: Maintain 8-byte stack alignment
    selftests/bpf: Fix verifier test
    selftests/bpf: Fix sample_cnt shared between two threads
    bpf, selftests: Adapt cls_redirect to call csum_level helper
    bpf: Add csum_level helper for fixing up csum levels
    bpf: Fix up bpf_skb_adjust_room helper's skb csum setting
    sfc: add missing annotation for efx_ef10_try_update_nic_stats_vf()
    crypto/chtls: IPv6 support for inline TLS
    Crypto/chcr: Fixes a coccinile check error
    Crypto/chcr: Fixes compilations warnings
    ...

    Linus Torvalds
     

19 May, 2020

1 commit


12 May, 2020

3 commits

  • Support NOKPROBE_SYMBOL() in modules. NOKPROBE_SYMBOL() records only the
    symbol address in the "_kprobe_blacklist" section of the module.
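    A minimal module-side usage sketch (the helper name is hypothetical):

```c
#include <linux/kprobes.h>
#include <linux/module.h>

static int my_nmi_safe_helper(void *data)
{
	/* Code that must never be hit by a kprobe. */
	return 0;
}
/* Records the address in the module's "_kprobe_blacklist" section. */
NOKPROBE_SYMBOL(my_nmi_safe_helper);
```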

    Signed-off-by: Masami Hiramatsu
    Signed-off-by: Thomas Gleixner
    Reviewed-by: Alexandre Chartre
    Acked-by: Peter Zijlstra
    Link: https://lkml.kernel.org/r/20200505134059.771170126@linutronix.de

    Masami Hiramatsu
     
  • Support the __kprobes attribute for blacklisting functions in modules.
    Functions marked with the __kprobes attribute are stored in the module's
    .kprobes.text section.

    Signed-off-by: Masami Hiramatsu
    Signed-off-by: Thomas Gleixner
    Reviewed-by: Alexandre Chartre
    Acked-by: Peter Zijlstra
    Link: https://lkml.kernel.org/r/20200505134059.678201813@linutronix.de

    Masami Hiramatsu
     
  • Lock kprobe_mutex while showing kprobe_blacklist to prevent the
    kprobe_blacklist from being updated while it is being displayed.

    Signed-off-by: Masami Hiramatsu
    Signed-off-by: Thomas Gleixner
    Reviewed-by: Alexandre Chartre
    Acked-by: Peter Zijlstra
    Link: https://lkml.kernel.org/r/20200505134059.571125195@linutronix.de

    Masami Hiramatsu
     

27 Apr, 2020

1 commit

  • Instead of having all the sysctl handlers deal with user pointers, which
    is rather hairy in terms of the BPF interaction, copy the input to and
    from userspace in common code. This also means that the strings are
    always NUL-terminated by the common code, making the API a little bit
    safer.

    As most handlers just pass the data through to one of the common
    handlers, a lot of the changes are mechanical.

    Signed-off-by: Christoph Hellwig
    Acked-by: Andrey Ignatov
    Signed-off-by: Al Viro

    Christoph Hellwig
     

09 Jan, 2020

1 commit

  • optimize_kprobe() and unoptimize_kprobe() bail out if the given kprobe
    is already on the optimizing_list or unoptimizing_list. However, since
    the following commit:

    f66c0447cca1 ("kprobes: Set unoptimized flag after unoptimizing code")

    modified the update timing of KPROBE_FLAG_OPTIMIZED, this check no
    longer works as expected.

    An optimized_kprobe can be in one of the following states:

    - [optimizing]: before inserting the jump instruction;
      op.kp->flags has KPROBE_FLAG_OPTIMIZED set and op->list is not empty.

    - [optimized]: jump inserted;
      op.kp->flags has KPROBE_FLAG_OPTIMIZED set and op->list is empty.

    - [unoptimizing]: before removing the jump instruction (including an
      unused optprobe);
      op.kp->flags has KPROBE_FLAG_OPTIMIZED set and op->list is not empty.

    - [unoptimized]: jump removed;
      op.kp->flags does not have KPROBE_FLAG_OPTIMIZED set and op->list is
      empty.

    The current code wrongly assumes that the [unoptimizing] state does not
    have KPROBE_FLAG_OPTIMIZED set, which can lead to incorrect results.

    To fix this, introduce optprobe_queued_unopt() to distinguish the
    [optimizing] and [unoptimizing] states, and fix the logic in
    optimize_kprobe() and unoptimize_kprobe().

    [ mingo: Cleaned up the changelog and the code a bit. ]

    Signed-off-by: Masami Hiramatsu
    Reviewed-by: Steven Rostedt (VMware)
    Cc: Alexei Starovoitov
    Cc: Peter Zijlstra
    Cc: Thomas Gleixner
    Cc: bristot@redhat.com
    Fixes: f66c0447cca1 ("kprobes: Set unoptimized flag after unoptimizing code")
    Link: https://lkml.kernel.org/r/157840814418.7181.13478003006386303481.stgit@devnote2
    Signed-off-by: Ingo Molnar

    Masami Hiramatsu
     

27 Nov, 2019

1 commit

  • Set the unoptimized flag after confirming the code is completely
    unoptimized. Without this fix, when a kprobe hits the intermediate
    modified instruction (the first byte is replaced by an INT3, but
    later bytes can still be a jump address operand) while unoptimizing,
    it can return to the middle byte of the modified code, which causes
    an invalid instruction exception in the kernel.

    Usually, this is a rare case, but if we put a probe on the function
    call while text patching, it always causes a kernel panic as below:

    # echo p text_poke+5 > kprobe_events
    # echo 1 > events/kprobes/enable
    # echo 0 > events/kprobes/enable

    invalid opcode: 0000 [#1] PREEMPT SMP PTI
    RIP: 0010:text_poke+0x9/0x50
    Call Trace:
    arch_unoptimize_kprobe+0x22/0x28
    arch_unoptimize_kprobes+0x39/0x87
    kprobe_optimizer+0x6e/0x290
    process_one_work+0x2a0/0x610
    worker_thread+0x28/0x3d0
    ? process_one_work+0x610/0x610
    kthread+0x10d/0x130
    ? kthread_park+0x80/0x80
    ret_from_fork+0x3a/0x50

    text_poke() is used for patching the code in optprobes.

    This can happen even if we blacklist text_poke() and other functions,
    because there is a small time window during which we show the intermediate
    code to other CPUs.

    [ mingo: Edited the changelog. ]

    Tested-by: Alexei Starovoitov
    Signed-off-by: Masami Hiramatsu
    Cc: Andy Lutomirski
    Cc: Borislav Petkov
    Cc: Linus Torvalds
    Cc: Peter Zijlstra
    Cc: Steven Rostedt
    Cc: Thomas Gleixner
    Cc: bristot@redhat.com
    Fixes: 6274de4984a6 ("kprobes: Support delayed unoptimizing")
    Link: https://lkml.kernel.org/r/157483422375.25881.13508326028469515760.stgit@devnote2
    Signed-off-by: Ingo Molnar

    Masami Hiramatsu
     

21 Sep, 2019

1 commit

  • Pull tracing updates from Steven Rostedt:

    - Addition of multiprobes to kprobe and uprobe events (allows for more
    than one probe attached to the same location)

    - Addition of adding immediates to probe parameters

    - Clean up of the recordmcount.c code. This brings us closer to merging
    recordmcount into objtool, and reuse code.

    - Other small clean ups

    * tag 'trace-v5.4' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace: (33 commits)
    selftests/ftrace: Update kprobe event error testcase
    tracing/probe: Reject exactly same probe event
    tracing/probe: Fix to allow user to enable events on unloaded modules
    selftests/ftrace: Select an existing function in kprobe_eventname test
    tracing/kprobe: Fix NULL pointer access in trace_probe_unlink()
    tracing: Make sure variable reference alias has correct var_ref_idx
    tracing: Be more clever when dumping hex in __print_hex()
    ftrace: Simplify ftrace hash lookup code in clear_func_from_hash()
    tracing: Add "gfp_t" support in synthetic_events
    tracing: Rename tracing_reset() to tracing_reset_cpu()
    tracing: Document the stack trace algorithm in the comments
    tracing/arm64: Have max stack tracer handle the case of return address after data
    recordmcount: Clarify what cleanup() does
    recordmcount: Remove redundant cleanup() calls
    recordmcount: Kernel style formatting
    recordmcount: Kernel style function signature formatting
    recordmcount: Rewrite error/success handling
    selftests/ftrace: Add syntax error test for multiprobe
    selftests/ftrace: Add syntax error test for immediates
    selftests/ftrace: Add a testcase for kprobe multiprobe event
    ...

    Linus Torvalds