31 Aug, 2020

1 commit

  • In the core runtime, we must minimize any calls to external library
    functions to avoid any kind of recursion. This can happen even if
    instrumentation is disabled for the called functions, as long as
    tracing is enabled.

    Most recently, prandom_u32() added a tracepoint, which can cause
    problems for KCSAN even if the rcuidle variant is used. For example:
    kcsan -> prandom_u32() -> trace_prandom_u32_rcuidle ->
    srcu_read_lock_notrace -> __srcu_read_lock -> kcsan ...

    While we could disable KCSAN in kcsan_setup_watchpoint(), this does not
    solve other unexpected behaviour we may get due to recursing into
    functions that may not be tolerant to such recursion:
    __srcu_read_lock -> kcsan -> ... -> __srcu_read_lock

    Therefore, switch to using prandom_u32_state(), which is uninstrumented,
    and does not have a tracepoint.
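    The essential property is that the generator's state is owned by the
    caller, so drawing a number is a pure computation with no tracepoint or
    other hook through which to recurse. The following is a minimal
    userspace sketch under that assumption. The xorshift32 generator is a
    stand-in chosen for brevity and is not the algorithm behind
    prandom_u32_state(). The names rnd_state, rnd_seed() and
    rnd_u32_state() are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical per-context PRNG state, owned by the caller (e.g. per CPU). */
struct rnd_state {
	uint32_t s;
};

static void rnd_seed(struct rnd_state *st, uint32_t seed)
{
	/* xorshift32 must never hold 0, or it would stay at 0 forever. */
	st->s = seed ? seed : 0x9e3779b9u;
}

/*
 * xorshift32 (Marsaglia), a stand-in for prandom_u32_state(): it only
 * reads and writes the caller's state, so it cannot call back into any
 * tracing or checking code.
 */
static uint32_t rnd_u32_state(struct rnd_state *st)
{
	uint32_t x = st->s;

	assert(x != 0);
	x ^= x << 13;
	x ^= x >> 17;
	x ^= x << 5;
	st->s = x;
	return x;
}
```

    Two states seeded identically yield identical sequences, which also
    makes the delay randomization reproducible in tests.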

    Link: https://lkml.kernel.org/r/20200821063043.1949509-1-elver@google.com
    Link: https://lkml.kernel.org/r/20200820172046.GA177701@elver.google.com
    Signed-off-by: Marco Elver
    Signed-off-by: Paul E. McKenney

    Marco Elver
     

25 Aug, 2020

6 commits

  • Remove kcsan_counter_inc/dec() functions, as they perform no other
    logic, and are no longer needed.

    This avoids several calls in kcsan_setup_watchpoint() and
    kcsan_found_watchpoint(), and lets the compiler warn us about
    potential out-of-bounds accesses, as the array's size is known at all
    usage sites at compile time.
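    A minimal userspace sketch of the pattern, assuming C11 atomics in
    place of the kernel's atomic_long_t: each site indexes a statically
    sized counter array directly instead of going through an inc/dec
    wrapper. The enum names and the *_model() functions are hypothetical,
    loosely modeled on KCSAN's statistics counters.

```c
#include <assert.h>
#include <stdatomic.h>

/* Hypothetical counter IDs. */
enum counter_id {
	COUNTER_USED_WATCHPOINTS,
	COUNTER_SETUP_WATCHPOINTS,
	COUNTER_DATA_RACES,
	COUNTER_COUNT, /* number of counters; must stay last */
};

/*
 * The array bound is visible at every use site, so a constant
 * out-of-range index can be diagnosed at compile time (e.g. by
 * -Warray-bounds), which a counter_inc(id) wrapper would hide.
 */
static atomic_long counters[COUNTER_COUNT];

static void setup_watchpoint_model(void)
{
	/* Direct relaxed increments; no wrapper call. */
	atomic_fetch_add_explicit(&counters[COUNTER_USED_WATCHPOINTS], 1,
				  memory_order_relaxed);
	atomic_fetch_add_explicit(&counters[COUNTER_SETUP_WATCHPOINTS], 1,
				  memory_order_relaxed);
}

static void remove_watchpoint_model(void)
{
	atomic_fetch_sub_explicit(&counters[COUNTER_USED_WATCHPOINTS], 1,
				  memory_order_relaxed);
}
```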

    Signed-off-by: Marco Elver
    Signed-off-by: Paul E. McKenney

    Marco Elver
     
  • Show a message in the kernel log if KCSAN was enabled early.

    Signed-off-by: Marco Elver
    Signed-off-by: Paul E. McKenney

    Marco Elver
     
  • Add missing CONFIG_KCSAN_IGNORE_ATOMICS checks for the builtin atomics
    instrumentation.

    Acked-by: Peter Zijlstra (Intel)
    Signed-off-by: Marco Elver
    Signed-off-by: Paul E. McKenney

    Marco Elver
     
  • For compound instrumentation and assert accesses, skew the watchpoint
    delay to be longer if randomized. This is useful to improve race
    detection for such accesses.

    For compound accesses we should increase the delay as we've aggregated
    both read and write instrumentation. By giving up 1 call into the
    runtime, we're less likely to set up a watchpoint and thus less likely
    to detect a race. We can balance this by increasing the watchpoint
    delay.

    For assert accesses, we know these are of increased interest, and we
    wish to increase our chances of detecting races for such checks.

    Note that kcsan_udelay_{task,interrupt} define the upper bound delays.
    When randomized, delays are uniformly distributed between [0, delay].
    Skewing the delay does not break this promise as long as the defined
    upper bounds are still adhered to. The current skew results in delays
    uniformly distributed between [delay/2, delay].
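    The skew can be sketched by subtracting a random offset from the upper
    bound and halving the offset's range for compound and assert accesses.
    Because the offset is never negative, the upper bound is never
    exceeded. This is a sketch under that assumption. get_delay(), the
    ACCESS_* bits and the caller-supplied rnd value are hypothetical
    stand-ins, not the runtime's actual definitions.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical access-type bits for this sketch. */
#define ACCESS_COMPOUND (1 << 0)
#define ACCESS_ASSERT   (1 << 1)

/*
 * Returns a delay in (0, max_delay] for ordinary accesses, or in
 * (max_delay/2, max_delay] for compound and assert accesses, when
 * randomized. 'rnd' is a random number supplied by the caller.
 */
static unsigned int get_delay(unsigned int max_delay, int type,
			      bool randomize, unsigned int rnd)
{
	/* Halve the random range for compound/assert accesses. */
	unsigned int skew_order =
		(type & (ACCESS_COMPOUND | ACCESS_ASSERT)) ? 1 : 0;
	unsigned int range = max_delay >> skew_order;

	if (!randomize || range == 0)
		return max_delay;
	/* Subtract a value in [0, range): the upper bound still holds. */
	return max_delay - rnd % range;
}
```

    For example, with an upper bound of 80 a skewed delay falls in
    [41, 80], while an ordinary randomized delay falls in [1, 80].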

    Acked-by: Peter Zijlstra (Intel)
    Signed-off-by: Marco Elver
    Signed-off-by: Paul E. McKenney

    Marco Elver
     
  • Add support for compounded read-write instrumentation if supported by
    the compiler. Adds the necessary instrumentation functions, and a new
    type which is used to generate a more descriptive report.

    Furthermore, such compounded memory access instrumentation is excluded
    from the "assume aligned writes up to word size are atomic" rule,
    because we cannot assume that the compiler emits code that is atomic for
    compound ops.
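    The exclusion can be pictured as one extra condition on the "aligned
    writes up to word size are atomic" filter. The following is a hedged
    sketch, assuming simple bit flags for the access type.
    assume_plain_write_atomic() and the ACCESS_* bits are hypothetical
    names.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical access-type bits for this sketch. */
#define ACCESS_WRITE    (1 << 0)
#define ACCESS_COMPOUND (1 << 1)

/*
 * Plain aligned writes up to word size are assumed atomic, but never
 * compound (read-modify-write) accesses: the compiler may split them.
 */
static bool assume_plain_write_atomic(const volatile void *ptr, size_t size,
				      int type)
{
	return (type & ACCESS_WRITE) && !(type & ACCESS_COMPOUND) &&
	       size != 0 && size <= sizeof(long) &&
	       ((uintptr_t)ptr % size) == 0;
}
```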

    LLVM/Clang added support for the feature in:
    https://github.com/llvm/llvm-project/commit/785d41a261d136b64ab6c15c5d35f2adc5ad53e3

    The new instrumentation is emitted for sets of memory accesses in the
    same basic block to the same address with at least one read appearing
    before a write. These typically result from compound operations such as
    ++, --, +=, -=, |=, &=, etc. but also equivalent forms such as "var =
    var + 1". Where the compiler determines that it is equivalent to emit a
    call to a single __tsan_read_write instead of separate __tsan_read and
    __tsan_write, we can then benefit from improved performance and better
    reporting for such access patterns.

    The new reports now show that the ops are both reads and writes, for
    example:

    read-write to 0xffffffff90548a38 of 8 bytes by task 143 on cpu 3:
    test_kernel_rmw_array+0x45/0xa0
    access_thread+0x71/0xb0
    kthread+0x21e/0x240
    ret_from_fork+0x22/0x30

    read-write to 0xffffffff90548a38 of 8 bytes by task 144 on cpu 2:
    test_kernel_rmw_array+0x45/0xa0
    access_thread+0x71/0xb0
    kthread+0x21e/0x240
    ret_from_fork+0x22/0x30

    Acked-by: Peter Zijlstra (Intel)
    Signed-off-by: Marco Elver
    Signed-off-by: Paul E. McKenney

    Marco Elver
     
  • Some architectures (currently, for example, s390 partially) implement
    atomics using the compiler's atomic builtins (__atomic_*, __sync_*). To
    support enabling KCSAN on such architectures in the future, or to
    support experimental use of these builtins, implement support for them.

    We should also avoid breaking KCSAN kernels due to use (accidental or
    otherwise) of atomic builtins in drivers, as has happened in the past:
    https://lkml.kernel.org/r/5231d2c0-41d9-6721-e15f-a7eedf3ce69e@infradead.org

    The instrumentation is subtly different from regular reads/writes: TSAN
    instrumentation replaces the use of atomic builtins with a call into the
    runtime, and the runtime's job is to also execute the desired atomic
    operation. We rely on the __atomic_* compiler builtins, available with
    all KCSAN-supported compilers, to implement each TSAN atomic
    instrumentation function.
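    A minimal userspace sketch of this division of labour: the
    instrumented call first performs the check and then carries out the
    atomic operation itself, using the GCC/Clang __atomic_fetch_add()
    builtin. model_atomic32_fetch_add() and check_access_model() are
    hypothetical names, not the actual TSAN entry points.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for the runtime's race check. */
static unsigned long checks_performed;

static void check_access_model(const volatile void *ptr, unsigned long size)
{
	(void)ptr;
	(void)size;
	checks_performed++;
}

/*
 * The compiler replaced the atomic builtin with a call into the runtime,
 * so the runtime must both check the access and perform the atomic
 * operation it replaced.
 */
static uint32_t model_atomic32_fetch_add(volatile uint32_t *ptr, uint32_t v,
					 int memorder)
{
	check_access_model(ptr, sizeof(*ptr));
	return __atomic_fetch_add(ptr, v, memorder);
}
```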

    Signed-off-by: Marco Elver
    Signed-off-by: Paul E. McKenney

    Marco Elver
     

01 Aug, 2020

1 commit


31 Jul, 2020

1 commit

  • To improve the general usefulness of the IRQ state trace events with
    KCSAN enabled, save and restore the trace information when entering and
    exiting the KCSAN runtime as well as when generating a KCSAN report.

    Without this, reporting the IRQ trace events (whether via a KCSAN report
    or outside of KCSAN via a lockdep report) is rather useless, because the
    events are continuously touched by KCSAN: if KCSAN is enabled, every
    instrumented memory access changes the IRQ trace events (either by
    KCSAN disabling/enabling interrupts or by taking report_lock when
    generating a report).

    Before "lockdep: Prepare for NMI IRQ state tracking", KCSAN avoided
    touching the IRQ trace events via raw_local_irq_save/restore() and
    lockdep_off/on().
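    The save/restore idea can be modeled in userspace as a snapshot taken
    on runtime entry and written back on exit, so IRQ toggles performed by
    the runtime do not leak into the task's trace state. This is a sketch
    only. struct irqtrace, struct task_model and every function below are
    hypothetical and model only a subset of the real trace fields.

```c
#include <assert.h>

/* Hypothetical subset of per-task IRQ trace state. */
struct irqtrace {
	unsigned int irq_events;
	unsigned long hardirq_disable_ip;
	unsigned long hardirq_enable_ip;
};

struct task_model {
	struct irqtrace irqtrace;       /* live state, updated on IRQ toggles */
	struct irqtrace saved_irqtrace; /* snapshot taken on runtime entry */
};

/* Stand-in for interrupts being disabled/enabled and traced. */
static void trace_irq_toggle(struct task_model *t, unsigned long ip)
{
	t->irqtrace.irq_events++;
	t->irqtrace.hardirq_disable_ip = ip;
	t->irqtrace.hardirq_enable_ip = ip;
}

static void runtime_enter(struct task_model *t) { t->saved_irqtrace = t->irqtrace; }
static void runtime_exit(struct task_model *t)  { t->irqtrace = t->saved_irqtrace; }

/* An instrumented access: the runtime toggles IRQs, then restores the trace. */
static void instrumented_access(struct task_model *t)
{
	runtime_enter(t);
	trace_irq_toggle(t, 0xdead);
	runtime_exit(t);
}
```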

    Fixes: 248591f5d257 ("kcsan: Make KCSAN compatible with new IRQ state tracking")
    Signed-off-by: Marco Elver
    Signed-off-by: Ingo Molnar
    Link: https://lore.kernel.org/r/20200729110916.3920464-2-elver@google.com
    Signed-off-by: Ingo Molnar

    Marco Elver
     

10 Jul, 2020

1 commit

  • The new IRQ state tracking code does not honor lockdep_off(), and as
    such we should again permit tracing by using non-raw functions in
    core.c. Update the lockdep_off() comment in report.c to reflect the
    fact that there is still a potential risk of deadlock due to using
    printk() from scheduler code.

    Suggested-by: Peter Zijlstra (Intel)
    Signed-off-by: Marco Elver
    Signed-off-by: Peter Zijlstra (Intel)
    Reviewed-by: Ingo Molnar
    Link: https://lkml.kernel.org/r/20200624113246.GA170324@elver.google.com

    Marco Elver
     

30 Jun, 2020

1 commit

  • The functions here should not be forward declared for explicit use
    elsewhere in the kernel, as they should only be emitted by the compiler
    due to sanitizer instrumentation. Add forward declarations immediately
    above their definitions to silence warnings in W=1 builds.
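    The pattern looks roughly as follows. This assumes the W=1 warning in
    question is GCC's -Wmissing-prototypes, which fires for a global
    function defined without a prior declaration. model_tsan_read4() is a
    hypothetical name used to illustrate the pattern.

```c
#include <assert.h>
#include <stddef.h>

static size_t reads_seen;

/*
 * Only compiler-generated instrumentation is meant to call this hook, so
 * no header declares it. Declaring it right above the definition keeps
 * -Wmissing-prototypes quiet without inviting explicit callers.
 */
void model_tsan_read4(void *ptr);
void model_tsan_read4(void *ptr)
{
	(void)ptr;
	reads_seen++;
}
```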

    Link: https://lkml.kernel.org/r/202006060103.jSCpnV1g%lkp@intel.com
    Reported-by: kernel test robot
    Signed-off-by: Marco Elver
    Signed-off-by: Paul E. McKenney

    Marco Elver
     

12 Jun, 2020

1 commit

  • In the kernel, the "volatile" keyword is used in various concurrent
    contexts, whether in low-level synchronization primitives or for
    legacy reasons. If supported by the compiler, it will be assumed
    that aligned volatile accesses up to sizeof(long long) (matching
    compiletime_assert_rwonce_type()) are atomic.

    Recent versions of Clang [1] (GCC tentative [2]) can instrument
    volatile accesses differently. Add the option (required) to enable the
    instrumentation, and provide the necessary runtime functions. None of
    the updated compilers are widely available yet (Clang 11 will be the
    first release to support the feature).

    [1] https://github.com/llvm/llvm-project/commit/5a2c31116f412c3b6888be361137efd705e05814
    [2] https://gcc.gnu.org/pipermail/gcc-patches/2020-April/544452.html

    This change allows removing any explicit checks in primitives such as
    READ_ONCE() and WRITE_ONCE().
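    The rule from the first paragraph can be expressed as a small
    predicate. This is a hedged sketch: the helper name is hypothetical,
    and alignment is checked against the access size, as for the plain
    write rule.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/*
 * An aligned volatile access of at most sizeof(long long) bytes is
 * assumed atomic, matching the bound used by
 * compiletime_assert_rwonce_type().
 */
static bool volatile_access_assumed_atomic(const volatile void *ptr,
					   size_t size)
{
	if (size == 0 || size > sizeof(long long))
		return false;
	return ((uintptr_t)ptr % size) == 0;
}
```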

    [ bp: Massage commit message a bit. ]

    Signed-off-by: Marco Elver
    Signed-off-by: Borislav Petkov
    Signed-off-by: Thomas Gleixner
    Acked-by: Will Deacon
    Acked-by: Peter Zijlstra (Intel)
    Link: https://lkml.kernel.org/r/20200521142047.169334-4-elver@google.com

    Marco Elver
     

07 May, 2020

1 commit

  • The __kcsan_{enable,disable}_current() variants only call into KCSAN if
    KCSAN is enabled for the current compilation unit. Note: This is
    typically not what we want, as we usually want to ensure that even calls
    into other functions still have KCSAN disabled.

    These variants may safely be used in header files that are shared
    between regular kernel code and code that does not link the KCSAN
    runtime.
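    A userspace sketch of the distinction, assuming a per-task nesting
    counter that suppresses checks while it is non-zero. The
    always-effective variants change the counter unconditionally. The
    conditional variants compile to no-ops unless the unit is
    instrumented. MODEL_UNIT_INSTRUMENTED is a hypothetical stand-in for
    the compiler-provided signal (for example __SANITIZE_THREAD__), and all
    function names are hypothetical.

```c
#include <assert.h>

/* Hypothetical per-task nesting counter: checks are skipped while > 0. */
static int disable_count;

static void kcsan_disable_current_model(void) { disable_count++; }

static void kcsan_enable_current_model(void)
{
	assert(disable_count > 0);
	disable_count--;
}

/* Only reach the runtime when this compilation unit is instrumented. */
#ifdef MODEL_UNIT_INSTRUMENTED
#define disable_current_if_unit_instrumented() kcsan_disable_current_model()
#define enable_current_if_unit_instrumented()  kcsan_enable_current_model()
#else
static inline void disable_current_if_unit_instrumented(void) { }
static inline void enable_current_if_unit_instrumented(void) { }
#endif
```

    Compiled without MODEL_UNIT_INSTRUMENTED, the conditional variants
    leave the counter untouched. A header shared with code that does not
    link the runtime can therefore use them safely.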

    Signed-off-by: Marco Elver
    Signed-off-by: Paul E. McKenney

    Marco Elver
     

14 Apr, 2020

3 commits

  • This adds support for scoped accesses, where the memory range is checked
    for the duration of the scope. The feature is implemented by inserting
    the relevant access information into a list of scoped accesses for
    the current execution context, which are then checked (until removed)
    on every call (through instrumentation) into the KCSAN runtime.

    An alternative, more complex, implementation could set up a watchpoint for
    the scoped access, and keep the watchpoint set up. This, however, would
    require first exposing a handle to the watchpoint, as well as dealing
    with cases such as accesses by the same thread while the watchpoint is
    still set up (and several more cases). It is also doubtful whether this would
    provide any benefit, since the majority of delay where the watchpoint
    is set up is likely due to the injected delays by KCSAN. Therefore,
    the implementation in this patch is simpler and avoids hurting KCSAN's
    main use-case (normal data race detection); it also implicitly increases
    scoped-access race-detection-ability due to increased probability of
    setting up watchpoints by repeatedly calling __kcsan_check_access()
    throughout the scope of the access.
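    The list-based approach can be sketched in userspace as follows.
    begin_scoped() links a record into the current context's list,
    end_scoped() unlinks it, and every instrumented access re-checks all
    records still in scope. This is a single-threaded sketch. The struct
    and function names are hypothetical, and check_range() only counts
    calls where a real runtime might set up a watchpoint. Checking once at
    the start of the scope is a choice made for this sketch.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical scoped-access record, linked into a per-context list. */
struct scoped_access {
	struct scoped_access *next;
	const volatile void *ptr;
	size_t size;
	int type;
};

struct ctx_model {
	struct scoped_access *scoped; /* head of active scoped accesses */
	unsigned long checks;         /* number of range checks performed */
};

static void check_range(struct ctx_model *ctx, const volatile void *ptr,
			size_t size, int type)
{
	(void)ptr; (void)size; (void)type;
	ctx->checks++; /* a real runtime might set up a watchpoint here */
}

static void begin_scoped(struct ctx_model *ctx, struct scoped_access *sa,
			 const volatile void *ptr, size_t size, int type)
{
	sa->ptr = ptr;
	sa->size = size;
	sa->type = type;
	sa->next = ctx->scoped;
	ctx->scoped = sa;
	check_range(ctx, ptr, size, type); /* check once at scope start */
}

static void end_scoped(struct ctx_model *ctx, struct scoped_access *sa)
{
	struct scoped_access **pp = &ctx->scoped;

	while (*pp && *pp != sa)
		pp = &(*pp)->next;
	assert(*pp == sa); /* must have begun in this context */
	*pp = sa->next;
}

/* Every instrumented access also re-checks all scoped accesses. */
static void on_instrumented_access(struct ctx_model *ctx,
				   const volatile void *ptr, size_t size,
				   int type)
{
	for (struct scoped_access *sa = ctx->scoped; sa; sa = sa->next)
		check_range(ctx, sa->ptr, sa->size, sa->type);
	check_range(ctx, ptr, size, type);
}
```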

    The implementation required adding an additional conditional branch to
    the fast-path. However, the microbenchmark showed a *speedup* of ~5%
    on the fast-path. This appears to be due to subtly improved codegen by
    GCC from moving get_ctx() and associated load of preempt_count earlier.

    Suggested-by: Boqun Feng
    Suggested-by: Paul E. McKenney
    Signed-off-by: Marco Elver
    Signed-off-by: Paul E. McKenney

    Marco Elver
     
  • To avoid deadlock in case watchers can be interrupted, we need to ensure
    that producers of the struct other_info can never be blocked by an
    unrelated consumer. (Likely to occur with KCSAN_INTERRUPT_WATCHER.)

    There are several cases that can lead to this scenario, for example:

    1. A watchpoint A was set up by task T1, but interrupted by
       interrupt I1. Some other thread (task or interrupt) finds
       watchpoint A, consumes it, and sets other_info. Then I1 also
       finds some unrelated watchpoint B, consumes it, but is blocked
       because other_info is in use. T1 cannot consume other_info
       because I1 never returns -> deadlock.

    2. A watchpoint A was set up by task T1, but interrupted by
       interrupt I1, which also sets up a watchpoint B. Some other
       thread finds watchpoint A, consumes it, and sets up other_info
       with its information. Similarly, some other thread finds
       watchpoint B and consumes it, but is then blocked because
       other_info is in use. When I1 continues, it sees its watchpoint
       was consumed, and that it must wait for other_info, which
       currently contains information to be consumed by T1. However, T1
       cannot unblock other_info because I1 never returns -> deadlock.

    To avoid this, we need to ensure that producers of struct other_info
    always have a usable other_info entry. This is obviously not the case
    with only a single instance of struct other_info, as concurrent
    producers must wait for the entry to be released by some consumer (which
    may be locked up as illustrated above).

    While it would be nice if producers could simply call kmalloc() and
    append their instance of struct other_info to a list, we are very
    limited in this code path: since KCSAN can instrument the allocators
    themselves, calling kmalloc() could lead to deadlock or corrupted
    allocator state.

    Since producers of the struct other_info will always succeed at
    try_consume_watchpoint(), preceding the call into kcsan_report(), we
    know that the particular watchpoint slot cannot simply be reused or
    consumed by another potential other_info producer. If we move removal of
    a watchpoint after reporting (by the consumer of struct other_info), we
    can see a consumed watchpoint as a held lock on elements of other_info,
    if we create a one-to-one mapping of a watchpoint to an other_info
    element.

    Therefore, the simplest solution is to create an array of struct
    other_info that is as large as the watchpoints array in core.c, and pass
    the watchpoint index to kcsan_report() for producers and consumers, and
    change watchpoints to be removed after reporting is done.
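    The one-to-one mapping can be illustrated with a single-threaded
    sketch: each watchpoint index owns exactly one other_info slot, so a
    producer never waits on an unrelated consumer. Synchronization is
    omitted. NUM_WATCHPOINTS, the struct fields and produce()/consume()
    are hypothetical simplifications, not the actual report.c code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define NUM_WATCHPOINTS 66 /* hypothetical; sized like the watchpoints array */

struct access_info {
	const volatile void *ptr;
	size_t size;
	int access_type;
	int task_pid;
	int cpu_id;
};

struct other_info {
	struct access_info ai;
	bool valid; /* set by the producer, cleared by the consumer */
};

/* One slot per watchpoint: producers never share a slot, so never wait. */
static struct other_info other_infos[NUM_WATCHPOINTS];

/* Producer: consumed watchpoint 'idx' that belongs to another context. */
static void produce(size_t idx, const struct access_info *ai)
{
	assert(idx < NUM_WATCHPOINTS);
	assert(!other_infos[idx].valid); /* exclusively ours while consumed */
	other_infos[idx].ai = *ai;
	other_infos[idx].valid = true;
}

/* Consumer: the owner of watchpoint 'idx' collects the other access. */
static bool consume(size_t idx, struct access_info *out)
{
	assert(idx < NUM_WATCHPOINTS);
	if (!other_infos[idx].valid)
		return false;
	*out = other_infos[idx].ai;
	other_infos[idx].valid = false;
	/* Only now would the watchpoint itself be removed. */
	return true;
}
```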

    With a default config on a 64-bit system, the array other_infos consumes
    ~37KiB. For most systems today this is not a problem. On smaller memory
    constrained systems, the config value CONFIG_KCSAN_NUM_WATCHPOINTS can
    be reduced appropriately.

    Overall, this change is a simplification of the prepare_report() code,
    and makes some of the checks (such as checking if at least one access is
    a write) redundant.

    Tested:
    $ tools/testing/selftests/rcutorture/bin/kvm.sh \
    --cpus 12 --duration 10 --kconfig "CONFIG_DEBUG_INFO=y \
    CONFIG_KCSAN=y CONFIG_KCSAN_ASSUME_PLAIN_WRITES_ATOMIC=n \
    CONFIG_KCSAN_REPORT_VALUE_CHANGE_ONLY=n \
    CONFIG_KCSAN_REPORT_ONCE_IN_MS=100000 CONFIG_KCSAN_VERBOSE=y \
    CONFIG_KCSAN_INTERRUPT_WATCHER=y CONFIG_PROVE_LOCKING=y" \
    --configs TREE03
    => No longer hangs and runs to completion as expected.

    Reported-by: Paul E. McKenney
    Signed-off-by: Marco Elver
    Signed-off-by: Paul E. McKenney

    Marco Elver
     
  • Improve readability by introducing access_info and other_info structs,
    and, in preparation for the following commit in this series, replace
    the single instance of other_info with an array of size 1.

    No functional change intended.

    Signed-off-by: Marco Elver
    Signed-off-by: Paul E. McKenney

    Marco Elver
     

26 Mar, 2020

4 commits

  • s/slots slots/slots/

    Signed-off-by: Qiujun Huang
    Reviewed-by: Nick Desaulniers
    [elver: commit message]
    Signed-off-by: Marco Elver
    Signed-off-by: Paul E. McKenney

    Qiujun Huang
     
  • Add volatile current->state to list of implicitly atomic accesses. This
    is in preparation to eventually enable KCSAN on kernel/sched (which
    currently still has KCSAN_SANITIZE := n).

    Since accesses that match the special check in atomic.h are rare, it
    makes more sense to move this check to the slow-path, avoiding the
    additional compare in the fast-path. With the microbenchmark, a speedup
    of ~6% is measured.

    Signed-off-by: Marco Elver
    Signed-off-by: Paul E. McKenney

    Marco Elver
     
  • Adds CONFIG_KCSAN_VERBOSE to optionally enable more verbose reports.
    Currently information about the reporting task's held locks and IRQ
    trace events are shown, if they are enabled.

    Signed-off-by: Marco Elver
    Suggested-by: Qian Cai
    Signed-off-by: Paul E. McKenney

    Marco Elver
     
  • Add option to allow interrupts while a watchpoint is set up. This can be
    enabled either via CONFIG_KCSAN_INTERRUPT_WATCHER or via the boot
    parameter 'kcsan.interrupt_watcher=1'.

    Note that currently not all safe per-CPU access primitives and patterns
    are accounted for, which could result in false positives. For example,
    asm-generic/percpu.h uses plain operations, which by default are
    instrumented. On interrupts and subsequent accesses to the same
    variable, KCSAN would currently report a data race with this option.

    Therefore, this option should currently remain disabled by default, but
    may be enabled for specific test scenarios.

    To avoid new warnings, change all uses of smp_processor_id() to use the
    raw version (as already done in kcsan_found_watchpoint()). The exact SMP
    processor id is for informational purposes in the report, and
    correctness is not affected.

    Signed-off-by: Marco Elver
    Signed-off-by: Paul E. McKenney

    Marco Elver
     

21 Mar, 2020

9 commits

  • When setting up an access mask with kcsan_set_access_mask(), KCSAN will
    only report races if concurrent changes to bits set in access_mask are
    observed. Conveying access_mask via a separate call avoids introducing
    overhead in the common-case fast-path.
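    How a mask narrows the value-change check can be sketched as follows,
    assuming a mask of 0 means "no mask set", so any changed bit counts.
    value_changed_in_mask() is a hypothetical helper; kcsan_set_access_mask()
    is the API named above, and this sketch does not call it.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Report only if a bit selected by access_mask changed (0 = all bits). */
static bool value_changed_in_mask(uint64_t old_val, uint64_t new_val,
				  uint64_t access_mask)
{
	uint64_t diff = old_val ^ new_val;

	if (access_mask != 0)
		diff &= access_mask;
	return diff != 0;
}
```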

    Acked-by: John Hubbard
    Signed-off-by: Marco Elver
    Signed-off-by: Paul E. McKenney
    Signed-off-by: Ingo Molnar

    Marco Elver
     
  • Introduces kcsan_value_change type, which explicitly indicates whether
    we observed a value change (TRUE), or could not observe one but cannot
    rule out that a value change happened (MAYBE). The MAYBE state can
    either be reported or not, depending on configuration preferences.

    A follow-up patch introduces the FALSE state, which should never be
    reported.

    No functional change intended.

    Acked-by: John Hubbard
    Signed-off-by: Marco Elver
    Signed-off-by: Paul E. McKenney
    Signed-off-by: Ingo Molnar

    Marco Elver
     
  • This adds early_boot, udelay_{task,interrupt}, and skip_watch as module
    params. The latter parameters are useful to modify at runtime to tune
    KCSAN's performance on new systems. This will also permit auto-tuning
    these parameters to maximize overall system performance and KCSAN's race
    detection ability.

    None of the parameters are used in the fast-path and referring to them
    via static variables instead of CONFIG constants will not affect
    performance.

    Signed-off-by: Marco Elver
    Signed-off-by: Paul E. McKenney
    Signed-off-by: Ingo Molnar
    Cc: Qian Cai

    Marco Elver
     
  • The KCSAN_ACCESS_ASSERT access type may be used to introduce dummy reads
    and writes to assert certain properties of concurrent code, where bugs
    could not be detected as normal data races.

    For example, a variable that is only meant to be written by a single
    CPU, but may be read (without locking) by other CPUs must still be
    marked properly to avoid data races. However, concurrent writes,
    regardless of whether WRITE_ONCE() is used, would be a bug. Using
    kcsan_check_access(&x, sizeof(x), KCSAN_ACCESS_ASSERT) would allow
    catching such bugs.

    To support KCSAN_ACCESS_ASSERT the following notable changes were made:

    * If an access is of type KCSAN_ACCESS_ASSERT, disable various filters
      that only apply to data races, so that all races that KCSAN observes
      are reported.
    * Bug reports that involve an ASSERT access type will be reported as
      "KCSAN: assert: race in ..." instead of "data-race"; this will help
      distinguish them more easily.
    * Update a few comments to just mention 'races' where we do not always
      mean pure data races.
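    The first two changes can be sketched as follows: an assert access
    bypasses the data-race-only filter, and any report involving one gets
    the "assert: race" title. The flag values and the names filter_out()
    and report_kind() are hypothetical. The value-change filter shown is
    only one example of a data-race-only filter.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical access-type bits for this sketch. */
#define ACCESS_WRITE  (1 << 0)
#define ACCESS_ASSERT (1 << 1)

/* Returns true if the race should be suppressed. */
static bool filter_out(int type, bool value_changed,
		       bool report_value_change_only)
{
	if (type & ACCESS_ASSERT)
		return false; /* races involving an assert are always reported */
	return report_value_change_only && !value_changed;
}

static const char *report_kind(int type, int other_type)
{
	return ((type | other_type) & ACCESS_ASSERT) ? "assert: race"
						     : "data-race";
}
```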

    Signed-off-by: Marco Elver
    Signed-off-by: Paul E. McKenney
    Signed-off-by: Ingo Molnar

    Marco Elver
     
  • Instrumentation of arbitrary memory-copy functions, such as user-copies,
    may be called with size of 0, which could lead to false positives.

    To avoid this, add a comparison in check_access() for size==0, which
    will be optimized out for constant sized instrumentation
    (__tsan_{read,write}N), and therefore not affect the common-case
    fast-path.
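    A sketch of the early exit, assuming the fixed-size hooks are thin
    inline wrappers that pass a constant size. After inlining, the
    compiler can then fold away the size == 0 test for them. The names
    model_read4(), model_read_range() and checked_accesses are
    hypothetical.

```c
#include <assert.h>
#include <stddef.h>

static unsigned long checked_accesses; /* hypothetical statistics */

static inline void check_access(const volatile void *ptr, size_t size,
				int type)
{
	(void)ptr;
	(void)type;
	if (size == 0)
		return; /* e.g. a zero-length user copy: nothing is accessed */
	checked_accesses++;
}

/* Constant size: the size == 0 test is dead code after inlining. */
static void model_read4(const volatile void *ptr) { check_access(ptr, 4, 0); }

/* Variable size: only these pay for the comparison. */
static void model_read_range(const volatile void *ptr, size_t n)
{
	check_access(ptr, n, 0);
}
```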

    Signed-off-by: Marco Elver
    Signed-off-by: Paul E. McKenney
    Signed-off-by: Ingo Molnar

    Marco Elver
     
  • This adds option KCSAN_ASSUME_PLAIN_WRITES_ATOMIC. If enabled, plain
    aligned writes up to word size are assumed to be atomic, and also not
    subject to other unsafe compiler optimizations resulting in data races.

    This option has been enabled by default to reflect current kernel-wide
    preferences.

    Signed-off-by: Marco Elver
    Signed-off-by: Paul E. McKenney
    Signed-off-by: Ingo Molnar

    Marco Elver
     
  • We must avoid any recursion into lockdep if KCSAN is enabled on utilities
    used by lockdep. One manifestation of this is corruption of lockdep's
    IRQ trace state (if TRACE_IRQFLAGS), resulting in spurious warnings
    (see below). This commit fixes this by:

    1. Using raw_local_irq{save,restore} in kcsan_setup_watchpoint().
    2. Disabling lockdep in kcsan_report().

    Tested with:

    CONFIG_LOCKDEP=y
    CONFIG_DEBUG_LOCKDEP=y
    CONFIG_TRACE_IRQFLAGS=y

    This fix eliminates spurious warnings such as the following one:

    WARNING: CPU: 0 PID: 2 at kernel/locking/lockdep.c:4406 check_flags.part.0+0x101/0x220
    Modules linked in:
    CPU: 0 PID: 2 Comm: kthreadd Not tainted 5.5.0-rc1+ #11
    Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.12.0-1 04/01/2014
    RIP: 0010:check_flags.part.0+0x101/0x220

    Call Trace:
    lock_is_held_type+0x69/0x150
    freezer_fork+0x20b/0x370
    cgroup_post_fork+0x2c9/0x5c0
    copy_process+0x2675/0x3b40
    _do_fork+0xbe/0xa30
    ? _raw_spin_unlock_irqrestore+0x40/0x50
    ? match_held_lock+0x56/0x250
    ? kthread_park+0xf0/0xf0
    kernel_thread+0xa6/0xd0
    ? kthread_park+0xf0/0xf0
    kthreadd+0x321/0x3d0
    ? kthread_create_on_cpu+0x130/0x130
    ret_from_fork+0x3a/0x50
    irq event stamp: 64
    hardirqs last enabled at (63): [] _raw_spin_unlock_irqrestore+0x40/0x50
    hardirqs last disabled at (64): [] kcsan_setup_watchpoint+0x92/0x460
    softirqs last enabled at (32): [] fpu__copy+0xe8/0x470
    softirqs last disabled at (30): [] fpu__copy+0x69/0x470

    Reported-by: Qian Cai
    Signed-off-by: Marco Elver
    Acked-by: Alexander Potapenko
    Tested-by: Qian Cai
    Signed-off-by: Paul E. McKenney
    Signed-off-by: Ingo Molnar

    Marco Elver
     
  • This commit adds access-type information to KCSAN's reports as follows:
    "read", "read (marked)", "write", and "write (marked)".
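    The mapping from access-type flags to these strings can be sketched as
    a switch over the flag combinations. The ACCESS_* bit values and the
    fallback string are assumptions of this sketch; "marked" means an
    access such as READ_ONCE() or WRITE_ONCE().

```c
#include <assert.h>
#include <string.h>

/* Hypothetical access-type bits for this sketch. */
#define ACCESS_WRITE  (1 << 0)
#define ACCESS_ATOMIC (1 << 1) /* "marked" access */

static const char *get_access_type(int type)
{
	switch (type) {
	case 0:
		return "read";
	case ACCESS_ATOMIC:
		return "read (marked)";
	case ACCESS_WRITE:
		return "write";
	case ACCESS_WRITE | ACCESS_ATOMIC:
		return "write (marked)";
	default:
		return "<unknown>";
	}
}
```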

    Suggested-by: Paul E. McKenney
    Signed-off-by: Marco Elver
    Signed-off-by: Paul E. McKenney
    Signed-off-by: Ingo Molnar

    Marco Elver
     
  • Prefer __always_inline for fast-path functions that are called outside
    of user_access_save, to avoid generating UACCESS warnings when
    optimizing for size (CC_OPTIMIZE_FOR_SIZE). It will also avoid future
    surprises with compiler versions that change the inlining heuristic even
    when optimizing for performance.

    Reported-by: Randy Dunlap
    Acked-by: Randy Dunlap # build-tested
    Signed-off-by: Marco Elver
    Signed-off-by: Paul E. McKenney
    Signed-off-by: Ingo Molnar
    Link: http://lkml.kernel.org/r/58708908-84a0-0a81-a836-ad97e33dbb62@infradead.org

    Marco Elver
     

20 Nov, 2019

1 commit

  • Tidy up a few bits:

    - Fix typos and grammar, improve wording.

    - Remove spurious newlines that are col80 warning artifacts where the
    resulting line-break is worse than the disease it's curing.

    - Use core kernel coding style to improve readability and reduce
    spurious code pattern variations.

    - Use better vertical alignment for structure definitions and initialization
    sequences.

    - Misc other small details.

    No change in functionality intended.

    Cc: linux-kernel@vger.kernel.org
    Cc: Marco Elver
    Cc: Paul E. McKenney
    Cc: Peter Zijlstra
    Cc: Linus Torvalds
    Cc: Andrew Morton
    Cc: Thomas Gleixner
    Cc: Paul E. McKenney
    Cc: Will Deacon
    Signed-off-by: Ingo Molnar

    Ingo Molnar
     

16 Nov, 2019

1 commit

  • Kernel Concurrency Sanitizer (KCSAN) is a dynamic data-race detector for
    kernel space. KCSAN is a sampling watchpoint-based data-race detector.
    See the included Documentation/dev-tools/kcsan.rst for more details.

    This patch adds basic infrastructure, but does not yet enable KCSAN for
    any architecture.
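    The "sampling" part can be sketched as a countdown: most accesses only
    decrement a counter, and one in every (skip_watch + 1) accesses is
    selected to set up a watchpoint. The following sketch covers only that
    decision. Setting up and checking the watchpoint is omitted, and the
    real counter would be per CPU and may be randomized. skip_watch
    follows the parameter name used later in this log. should_watch() and
    skip_counter are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

static long skip_watch = 3;   /* accesses skipped between watchpoints */
static long skip_counter = 3; /* countdown; per CPU in a real runtime */

/* Returns true for the accesses that should set up a watchpoint. */
static bool should_watch(void)
{
	if (--skip_counter >= 0)
		return false;      /* common fast path: just count down */
	skip_counter = skip_watch; /* reset, then sample this access */
	return true;
}
```

    With skip_watch set to 3, eight consecutive accesses select exactly two
    for watching (the 4th and the 8th).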

    Signed-off-by: Marco Elver
    Acked-by: Paul E. McKenney
    Signed-off-by: Paul E. McKenney

    Marco Elver