05 Nov, 2020

1 commit

  • When an exception/interrupt hits kernel space and the kernel is not
    currently in the idle task, then RCU must be watching.

    irqentry_enter() validates this via rcu_irq_enter_check_tick(), which in
    turn invokes lockdep when taking a lock. But at that point lockdep does not
    yet know that interrupts have been disabled by the CPU,
    which triggers a lockdep splat complaining about inconsistent state.

    Invoking trace_hardirqs_off() before rcu_irq_enter_check_tick() defeats the
    point of rcu_irq_enter_check_tick() because trace_hardirqs_off() uses RCU.

    So use the same sequence as for the idle case: tell lockdep about the
    irq state change first, invoke the RCU check, and then do the lockdep and
    tracer update.
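
    The resulting order for the non-idle kernel-entry path boils down to the
    following sketch (a simplification of irqentry_enter(); the
    instrumentation_begin()/end() markers and surrounding checks are omitted):

        lockdep_hardirqs_off(CALLER_ADDR0);   /* tell lockdep first, uses no RCU */
        rcu_irq_enter_check_tick();           /* RCU check sees consistent irq state */
        trace_hardirqs_off_finish();          /* tracer update last, may use RCU */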

    Fixes: a5497bab5f72 ("entry: Provide generic interrupt entry/exit code")
    Reported-by: Mark Rutland
    Signed-off-by: Thomas Gleixner
    Tested-by: Mark Rutland
    Cc: stable@vger.kernel.org
    Link: https://lore.kernel.org/r/87y2jhl19s.fsf@nanos.tec.linutronix.de

    Thomas Gleixner
     

24 Oct, 2020

1 commit

  • Pull arch task_work cleanups from Jens Axboe:
    "Two cleanups that don't fit other categories:

    - Finally get the task_work_add() cleanup done properly, so we don't
    have random 0/1/false/true/TWA_SIGNAL confusing use cases. Updates
    all callers, and also fixes up the documentation for
    task_work_add().

    - While working on some TIF related changes for 5.11, this
    TIF_NOTIFY_RESUME cleanup fell out of that. Remove some arch
    duplication for how that is handled"

    * tag 'arch-cleanup-2020-10-22' of git://git.kernel.dk/linux-block:
    task_work: cleanup notification modes
    tracehook: clear TIF_NOTIFY_RESUME in tracehook_notify_resume()
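
    A short sketch of the calling convention after the task_work_add() cleanup
    above; the callback and its names are illustrative, not taken from the
    patches:

        #include <linux/sched.h>
        #include <linux/task_work.h>

        static void example_twork_fn(struct callback_head *cb)
        {
                /* runs in the context of the targeted task */
        }

        static struct callback_head example_twork;

        static int queue_example_work(struct task_struct *task)
        {
                init_task_work(&example_twork, example_twork_fn);
                /* explicit TWA_NONE / TWA_RESUME / TWA_SIGNAL instead of 0/1/true */
                return task_work_add(task, &example_twork, TWA_SIGNAL);
        }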

    Linus Torvalds
     

19 Oct, 2020

1 commit

  • Pull RCU changes from Ingo Molnar:

    - Debugging for smp_call_function()

    - RT raw/non-raw lock ordering fixes

    - Strict grace periods for KASAN

    - New smp_call_function() torture test

    - Torture-test updates

    - Documentation updates

    - Miscellaneous fixes

    [ This doesn't actually pull the tag - I've dropped the last merge from
    the RCU branch due to questions about the series. - Linus ]

    * tag 'core-rcu-2020-10-12' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (77 commits)
    smp: Make symbol 'csd_bug_count' static
    kernel/smp: Provide CSD lock timeout diagnostics
    smp: Add source and destination CPUs to __call_single_data
    rcu: Shrink each possible cpu krcp
    rcu/segcblist: Prevent useless GP start if no CBs to accelerate
    torture: Add gdb support
    rcutorture: Allow pointer leaks to test diagnostic code
    rcutorture: Hoist OOM registry up one level
    refperf: Avoid null pointer dereference when buf fails to allocate
    rcutorture: Properly synchronize with OOM notifier
    rcutorture: Properly set rcu_fwds for OOM handling
    torture: Add kvm.sh --help and update help message
    rcutorture: Add CONFIG_PROVE_RCU_LIST to TREE05
    torture: Update initrd documentation
    rcutorture: Replace HTTP links with HTTPS ones
    locktorture: Make function torture_percpu_rwsem_init() static
    torture: document --allcpus argument added to the kvm.sh script
    rcutorture: Output number of elapsed grace periods
    rcutorture: Remove KCSAN stubs
    rcu: Remove unused "cpu" parameter from rcu_report_qs_rdp()
    ...

    Linus Torvalds
     

09 Oct, 2020

1 commit

  • …k/linux-rcu into core/rcu

    Pull v5.10 RCU changes from Paul E. McKenney:

    - Debugging for smp_call_function().

    - Strict grace periods for KASAN. The point of this series is to find
    RCU-usage bugs, so the corresponding new RCU_STRICT_GRACE_PERIOD
    Kconfig option depends on both DEBUG_KERNEL and RCU_EXPERT, and is
    further disabled by default. Finally, the help text includes
    a goodly list of scary caveats.

    - New smp_call_function() torture test.

    - Torture-test updates.

    - Documentation updates.

    - Miscellaneous fixes.

    Signed-off-by: Ingo Molnar <mingo@kernel.org>

    Ingo Molnar
     

15 Sep, 2020

1 commit

  • On v5.8 when doing seccomp syscall rewrites (e.g. getpid into getppid
    as seen in the seccomp selftests), trace (and audit) correctly see the
    rewritten syscall on entry and exit:

    seccomp_bpf-1307 [000] .... 22974.874393: sys_enter: NR 110 (...
    seccomp_bpf-1307 [000] .N.. 22974.874401: sys_exit: NR 110 = 1304

    With mainline we see a mismatched enter and exit (the original syscall
    is incorrectly visible on entry):

    seccomp_bpf-1030 [000] .... 21.806766: sys_enter: NR 39 (...
    seccomp_bpf-1030 [000] .... 21.806767: sys_exit: NR 110 = 1027

    When ptrace or seccomp change the syscall, this needs to be visible to
    trace and audit at that time as well. Update the syscall earlier so they
    see the correct value.
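
    The corrected ordering in the syscall entry work, as a simplified sketch
    (the hook wrapper name is a placeholder; only syscall_get_nr() and the
    tracepoint are real):

        /* ptrace and seccomp run first and may rewrite the syscall number */
        ret = run_ptrace_and_seccomp_hooks(regs);        /* placeholder for the real hooks */

        /* re-read the possibly rewritten number before reporting it ... */
        syscall = syscall_get_nr(current, regs);

        /* ... so that trace (and audit) see NR 110, not the original NR 39 */
        trace_sys_enter(regs, syscall);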

    Fixes: d88d59b64ca3 ("core/entry: Respect syscall number rewrites")
    Reported-by: Michael Ellerman
    Signed-off-by: Kees Cook
    Signed-off-by: Thomas Gleixner
    Link: https://lore.kernel.org/r/20200912005826.586171-1-keescook@chromium.org

    Kees Cook
     

04 Sep, 2020

1 commit

  • Andy reported that syscall tracing for 32bit fast syscalls fails:

    # ./tools/testing/selftests/x86/ptrace_syscall_32
    ...
    [RUN] SYSEMU
    [FAIL] Initial args are wrong (nr=224, args=10 11 12 13 14 4289172732)
    ...
    [RUN] SYSCALL
    [FAIL] Initial args are wrong (nr=29, args=0 0 0 0 0 4289172732)

    The reason is that the conversion to the generic entry code moved the
    retrieval of the sixth argument (EBP) past the point where the syscall
    entry work (ptrace, seccomp, audit...) runs.

    Unbreak it by providing a split up version of syscall_enter_from_user_mode().

    - syscall_enter_from_user_mode_prepare() establishes state and enables
    interrupts

    - syscall_enter_from_user_mode_work() runs the entry work

    Replace the call to syscall_enter_from_user_mode() in the 32bit fast
    syscall C-entry with the split functions and stick the EBP retrieval
    between them.
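
    The rough shape of the 32bit fast syscall C-entry with the split helpers
    (a sketch; the real EBP fetch uses careful user space access and error
    handling):

        /* establish lockdep/RCU/tracing state and enable interrupts,
         * but do not run any of the entry work yet */
        syscall_enter_from_user_mode_prepare(regs);

        /* fetch the sixth argument (EBP) from the user stack here, with
         * interrupts enabled, so the entry work below sees all six arguments */

        /* now run ptrace, seccomp, audit ... and get the final syscall number */
        nr = syscall_enter_from_user_mode_work(regs, nr);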

    Fixes: 27d6b4d14f5c ("x86/entry: Use generic syscall entry function")
    Reported-by: Andy Lutomirski
    Signed-off-by: Thomas Gleixner
    Link: https://lore.kernel.org/r/87k0xdjbtt.fsf@nanos.tec.linutronix.de

    Thomas Gleixner
     

21 Aug, 2020

1 commit

  • The conversion of the x86 entry code to the generic version failed to
    reload the syscall number from ptregs after ptrace and seccomp have run,
    both of which can modify the syscall number in ptregs. It returns the
    original syscall number instead, which is obviously not the right thing
    to do.

    Reload the syscall number to fix that.
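
    Conceptually the entry work now hands back the number re-read from ptregs
    instead of the value captured before the work ran (a sketch, not the
    literal diff; helper names approximate the generic entry code of the
    time):

        syscall = syscall_trace_enter(regs, syscall, ti_work);
        /* ... which now ends with something along the lines of:
         *         return ret ? : syscall_get_nr(current, regs);
         * so the dispatch below uses the possibly rewritten number */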

    Fixes: 142781e108b1 ("entry: Provide generic syscall entry functionality")
    Reported-by: Kyle Huey
    Signed-off-by: Thomas Gleixner
    Tested-by: Kyle Huey
    Tested-by: Kees Cook
    Acked-by: Kees Cook
    Link: https://lore.kernel.org/r/87blj6ifo8.fsf@nanos.tec.linutronix.de

    Thomas Gleixner
     

26 Jul, 2020

1 commit

  • The noinstr attribute is to be specified before the return type in the
    same way 'inline' is used.

    Similar cases were recently fixed for x86 in commit 7f6fa101dfac ("x86:
    Correct noinstr qualifiers"), but the generic entry code was based on
    the original version and did not carry the fix over.
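
    In other words (an illustrative declaration, not a line from the patch):

        /* wrong: attribute placed after the return type */
        static void noinstr handle_example(struct pt_regs *regs);

        /* right: before the return type, like 'inline' */
        static noinstr void handle_example(struct pt_regs *regs);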

    Fixes: a5497bab5f72 ("entry: Provide generic interrupt entry/exit code")
    Signed-off-by: Ingo Molnar
    Signed-off-by: Thomas Gleixner
    Link: https://lkml.kernel.org/r/20200725091951.744848-3-mingo@kernel.org

    Ingo Molnar
     

24 Jul, 2020

4 commits

  • Entering a guest is similar to exiting to user space. Pending work like
    handling signals, rescheduling, task work etc. needs to be handled before
    that.

    Provide generic infrastructure to avoid duplication of the same handling
    code all over the place.

    The transfer to guest mode handling differs from the exit to user mode
    handling, e.g. with respect to rseq and live patching, so a separate
    function is used.

    The initial list of work items handled is:

    TIF_SIGPENDING, TIF_NEED_RESCHED, TIF_NOTIFY_RESUME

    Architecture specific TIF flags can be added via defines in the
    architecture specific include files.

    The calling convention is also different from the syscall/interrupt entry
    functions as KVM invokes this from the outer vcpu_run() loop with
    interrupts and preemption enabled. To prevent missing a pending work item
    it invokes a check for pending TIF work from interrupt disabled code right
    before transitioning to guest mode. The lockdep, RCU and tracing state
    handling is also done directly around the switch to and from guest mode.
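
    A rough sketch of how a vcpu_run() style loop consumes this infrastructure
    (the helper names follow what later settled in mainline as the entry-kvm
    API; treat the exact names and signatures as assumptions):

        int r;

        for (;;) {
                /* interrupts and preemption are enabled here */
                if (xfer_to_guest_mode_work_pending()) {
                        /* handles TIF_SIGPENDING, TIF_NEED_RESCHED, TIF_NOTIFY_RESUME */
                        r = xfer_to_guest_mode_handle_work(vcpu);
                        if (r)
                                break;           /* fall back to the exit-to-user path */
                }

                /* the arch side disables interrupts, re-checks pending TIF work
                 * and handles lockdep/RCU/tracing directly around the actual
                 * switch to and from guest mode */
                run_guest_once(vcpu);            /* placeholder for the arch specific part */
        }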

    Signed-off-by: Thomas Gleixner
    Link: https://lkml.kernel.org/r/20200722220519.833296398@linutronix.de

    Thomas Gleixner
     
  • Like the syscall entry/exit code, interrupt/exception entry after the real
    low level ASM bits should not be different across architectures.

    Provide a generic version based on the x86 code.

    irqentry_enter() is called after the low level entry code and
    irqentry_exit() must be invoked right before returning to the low level
    code which just contains the actual return logic. The code before
    irqentry_enter() and irqentry_exit() must not be instrumented. Code after
    irqentry_enter() and before irqentry_exit() can be instrumented.

    irqentry_enter() invokes irqentry_enter_from_user_mode() if the
    interrupt/exception came from user mode. If it entered from kernel mode, it
    handles the kernel mode variant of establishing state for lockdep, RCU and
    tracing depending on the kernel context it interrupted (idle, non-idle).
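
    The intended usage pattern from the architecture side, roughly (the
    handler name is illustrative):

        irqentry_state_t state = irqentry_enter(regs);  /* nothing instrumentable before this */

        instrumentation_begin();
        handle_the_exception(regs);                     /* instrumentable region */
        instrumentation_end();

        irqentry_exit(regs, state);                     /* nothing instrumentable after this */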

    Signed-off-by: Thomas Gleixner
    Link: https://lkml.kernel.org/r/20200722220519.723703209@linutronix.de

    Thomas Gleixner
     
  • Like syscall entry, all architectures have similar and pointlessly different
    code to handle pending work before returning from a syscall to user space.

    1) One-time syscall exit work:
    - rseq syscall exit
    - audit
    - syscall tracing
    - tracehook (single stepping)

    2) Preparatory work
    - Exit to user mode loop (common TIF handling).
    - Architecture specific one time work arch_exit_to_user_mode_prepare()
    - Address limit and lockdep checks

    3) Final transition (lockdep, tracing, context tracking, RCU). Invokes
    arch_exit_to_user_mode() to handle e.g. speculation mitigations

    Provide a generic version based on the x86 code which has all the RCU and
    instrumentation protections right.

    Provide a variant for interrupt return to user mode as well which shares
    the above #2 and #3 work items.

    After syscall_exit_to_user_mode() and irqentry_exit_to_user_mode() the
    architecture code just has to return to user space. The code after
    returning from these functions must not be instrumented.
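
    Seen from the architecture side, the tails of the two paths then boil down
    to (sketch, arch specifics omitted):

        /* end of the arch's syscall C-entry: */
        syscall_exit_to_user_mode(regs);
        /* -> one-time exit work, TIF loop, final lockdep/tracing/RCU transition */

        /* interrupt/exception returning to user space: */
        irqentry_exit_to_user_mode(regs);
        /* -> shares the TIF loop and the final transition with the syscall path */

        /* after either call, only uninstrumented return-to-user code may run */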

    Signed-off-by: Thomas Gleixner
    Reviewed-by: Kees Cook
    Link: https://lkml.kernel.org/r/20200722220519.613977173@linutronix.de

    Thomas Gleixner
     
  • On syscall entry certain work needs to be done:

    - Establish state (lockdep, context tracking, tracing)
    - Conditional work (ptrace, seccomp, audit...)

    This code is needlessly duplicated and different in all
    architectures.

    Provide a generic version based on the x86 implementation which has all the
    RCU and instrumentation bits right.

    As interrupt/exception entry from user space needs parts of the same
    functionality, provide a function for this as well.

    syscall_enter_from_user_mode() and irqentry_enter_from_user_mode() must be
    called right after the low level ASM entry. The calling code must be
    non-instrumentable. After the functions return, state is correct and the
    subsequent functions can be instrumented.
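
    A rough sketch of an architecture's syscall C-entry built on top of this
    (function name, argument order and the dispatch details are illustrative,
    loosely x86-like):

        __visible noinstr void do_syscall_example(struct pt_regs *regs, long nr)
        {
                nr = syscall_enter_from_user_mode(regs, nr);  /* right after the ASM entry */

                instrumentation_begin();
                if (nr >= 0 && nr < NR_syscalls)
                        regs->ax = sys_call_table[nr](regs);  /* dispatch, fully traceable */
                instrumentation_end();

                syscall_exit_to_user_mode(regs);              /* see the exit patch above */
        }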

    Signed-off-by: Thomas Gleixner
    Acked-by: Kees Cook
    Link: https://lkml.kernel.org/r/20200722220519.513463269@linutronix.de

    Thomas Gleixner