10 Sep, 2018

1 commit

  • commit d1c392c9e2a301f38998a353f467f76414e38725 upstream.

    I hit the following splat in my tests:

    ------------[ cut here ]------------
    IRQs not enabled as expected
    WARNING: CPU: 3 PID: 0 at kernel/time/tick-sched.c:982 tick_nohz_idle_enter+0x44/0x8c
    Modules linked in: ip6t_REJECT nf_reject_ipv6 ip6table_filter ip6_tables ipv6
    CPU: 3 PID: 0 Comm: swapper/3 Not tainted 4.19.0-rc2-test+ #2
    Hardware name: MSI MS-7823/CSM-H87M-G43 (MS-7823), BIOS V1.6 02/22/2014
    EIP: tick_nohz_idle_enter+0x44/0x8c
    Code: ec 05 00 00 00 75 26 83 b8 c0 05 00 00 00 75 1d 80 3d d0 36 3e c1 00
    75 14 68 94 63 12 c1 c6 05 d0 36 3e c1 01 e8 04 ee f8 ff 0b 58 fa bb a0
    e5 66 c1 e8 25 0f 04 00 64 03 1d 28 31 52 c1 8b
    EAX: 0000001c EBX: f26e7f8c ECX: 00000006 EDX: 00000007
    ESI: f26dd1c0 EDI: 00000000 EBP: f26e7f40 ESP: f26e7f38
    DS: 007b ES: 007b FS: 00d8 GS: 00e0 SS: 0068 EFLAGS: 00010296
    CR0: 80050033 CR2: 0813c6b0 CR3: 2f342000 CR4: 001406f0
    Call Trace:
    do_idle+0x33/0x202
    cpu_startup_entry+0x61/0x63
    start_secondary+0x18e/0x1ed
    startup_32_smp+0x164/0x168
    irq event stamp: 18773830
    hardirqs last enabled at (18773829): [] trace_hardirqs_on_thunk+0xc/0x10
    hardirqs last disabled at (18773830): [] trace_hardirqs_off_thunk+0xc/0x10
    softirqs last enabled at (18773824): [] __do_softirq+0x25f/0x2bf
    softirqs last disabled at (18773767): [] call_on_stack+0x45/0x4b
    ---[ end trace b7c64aa79e17954a ]---

    After a bit of debugging, I found what was happening. This would trigger
    when performing "perf" with a high NMI interrupt rate, while enabling and
    disabling function tracer. Ftrace uses breakpoints to convert the nops at
    the start of functions to calls to the function trampolines. The breakpoint
    traps disable interrupts and this makes calls into lockdep via the
    trace_hardirqs_off_thunk in the entry.S code. What happens is the following:

    do_idle {

    [interrupts enabled]

    [interrupts disabled]
    TRACE_IRQS_OFF [lockdep says irqs off]
    [...]
    TRACE_IRQS_IRET
    test if pt_regs say return to interrupts enabled [yes]
    TRACE_IRQS_ON [lockdep says irqs are on]


    nmi_enter() {
    printk_nmi_enter() [traced by ftrace]
    [ hit ftrace breakpoint ]

    TRACE_IRQS_OFF [lockdep says irqs off]
    [...]
    TRACE_IRQS_IRET [return from breakpoint]
    test if pt_regs say interrupts enabled [no]
    [iret back to interrupt]
    [iret back to code]

    tick_nohz_idle_enter() {

    lockdep_assert_irqs_enabled() [lockdep say no!]

    Although interrupts are indeed enabled, lockdep thinks it is not, and since
    we now do asserts via lockdep, it gives a false warning. The issue here is
    that printk_nmi_enter() is called before lockdep_off(), which disables
    lockdep (for this reason) in NMIs. By simply not allowing ftrace to see
    printk_nmi_enter() (via notrace annotation) we keep lockdep from getting
    confused.

    Cc: stable@vger.kernel.org
    Fixes: 42a0bb3f71383 ("printk/nmi: generic solution for safe printk in NMI")
    Acked-by: Sergey Senozhatsky
    Acked-by: Petr Mladek
    Signed-off-by: Steven Rostedt (VMware)
    Signed-off-by: Greg Kroah-Hartman

    Steven Rostedt (VMware)
     

05 Sep, 2018

1 commit

  • commit 03fc7f9c99c1e7ae2925d459e8487f1a6f199f79 upstream.

    The commit 719f6a7040f1bdaf96 ("printk: Use the main logbuf in NMI
    when logbuf_lock is available") brought back the possible deadlocks
    in printk() and NMI.

    The check of logbuf_lock is done only in printk_nmi_enter() to prevent
    mixed output. But another CPU might take the lock later, enter NMI, and:

    + Both NMIs might be serialized by yet another lock, for example,
    the one in nmi_cpu_backtrace().

    + The other CPU might get stopped in NMI, see smp_send_stop()
    in panic().

    The only safe solution is to use trylock when storing the message
    into the main log-buffer. It might cause reordering when some lines
    go to the main lock buffer directly and others are delayed via
    the per-CPU buffer. It means that it is not useful in general.

    This patch replaces the problematic NMI deferred context with NMI
    direct context. It can be used to mark a code that might produce
    many messages in NMI and the risk of losing them is more critical
    than problems with eventual reordering.

    The context is then used when dumping trace buffers on oops. It was
    the primary motivation for the original fix. Also the reordering is
    even smaller issue there because some traces have their own time stamps.

    Finally, nmi_cpu_backtrace() need not longer be serialized because
    it will always us the per-CPU buffers again.

    Fixes: 719f6a7040f1bdaf96 ("printk: Use the main logbuf in NMI when logbuf_lock is available")
    Cc: stable@vger.kernel.org
    Link: http://lkml.kernel.org/r/20180627142028.11259-1-pmladek@suse.com
    To: Steven Rostedt
    Cc: Peter Zijlstra
    Cc: Tetsuo Handa
    Cc: Sergey Senozhatsky
    Cc: linux-kernel@vger.kernel.org
    Cc: stable@vger.kernel.org
    Acked-by: Sergey Senozhatsky
    Signed-off-by: Petr Mladek
    Signed-off-by: Greg Kroah-Hartman

    Petr Mladek
     

03 Aug, 2018

1 commit

  • [ Upstream commit 554755be08fba31c74f66b82a485e5513205af84 ]

    Drop the in_nmi() check from printk_safe_flush_on_panic()
    and attempt to re-init (IOW unlock) locked logbuf spinlock
    from panic CPU regardless of its context.

    Otherwise, theoretically, we can deadlock on logbuf trying to flush
    per-CPU buffers:

    a) Panic CPU is running in non-NMI context
    b) Panic CPU sends out shutdown IPI via reboot vector
    c) Panic CPU fails to stop all remote CPUs
    d) Panic CPU sends out shutdown IPI via NMI vector
    One of the CPUs that we bring down via NMI vector can hold
    logbuf spin lock (theoretically).

    Link: http://lkml.kernel.org/r/20180530070350.10131-1-sergey.senozhatsky@gmail.com
    To: Steven Rostedt
    Cc: Peter Zijlstra
    Cc: linux-kernel@vger.kernel.org
    Signed-off-by: Sergey Senozhatsky
    Signed-off-by: Petr Mladek
    Signed-off-by: Sasha Levin
    Signed-off-by: Greg Kroah-Hartman

    Sergey Senozhatsky
     

03 Jul, 2018

1 commit

  • commit 988a35f8da1dec5a8cd2788054d1e717be61bf25 upstream.

    I noticed that there is a possibility that printk_safe_log_store() causes
    kernel oops because "args" parameter is passed to vsnprintf() again when
    atomic_cmpxchg() detected that we raced. Fix this by using va_copy().

    Link: http://lkml.kernel.org/r/201805112002.GIF21216.OFVHFOMLJtQFSO@I-love.SAKURA.ne.jp
    Cc: Peter Zijlstra
    Cc: Steven Rostedt
    Cc: dvyukov@google.com
    Cc: syzkaller@googlegroups.com
    Cc: fengguang.wu@intel.com
    Cc: linux-kernel@vger.kernel.org
    Signed-off-by: Tetsuo Handa
    Fixes: 42a0bb3f71383b45 ("printk/nmi: generic solution for safe printk in NMI")
    Cc: 4.7+ # v4.7+
    Reviewed-by: Sergey Senozhatsky
    Signed-off-by: Petr Mladek
    Signed-off-by: Greg Kroah-Hartman

    Tetsuo Handa
     

13 Jun, 2017

1 commit

  • When compiling with -Wsuggest-attribute=format, gcc complains that some
    functions in kernel/printk/printk_safe.c transmit their argument to
    printf-like functions without having a printf attribute. Silence these
    warnings by adding relevant __printf attributes.

    Link: http://lkml.kernel.org/r/20170524054950.6722-1-nicolas.iooss_linux@m4x.org
    Cc: Steven Rostedt
    Cc: linux-kernel@vger.kernel.org
    Signed-off-by: Nicolas Iooss
    Reviewed-by: Sergey Senozhatsky
    Signed-off-by: Petr Mladek

    Nicolas Iooss
     

19 May, 2017

1 commit

  • The commit 42a0bb3f71383b457a7d ("printk/nmi: generic solution for safe
    printk in NMI") caused that printk stores messages into a temporary
    buffer in NMI context.

    The buffer is per-CPU and therefore the size is rather limited.
    It works quite well for NMI backtraces. But there are longer logs
    that might get printed in NMI context, for example, lockdep
    warnings, ftrace_dump_on_oops.

    The temporary buffer is used to avoid deadlocks caused by
    logbuf_lock. Also it is needed to avoid races with the other
    temporary buffer that is used when PRINTK_SAFE_CONTEXT is entered.
    But the main buffer can be used in NMI if the lock is available
    and we did not interrupt PRINTK_SAFE_CONTEXT.

    The lock is checked using raw_spin_is_locked(). It might cause
    false negatives when the lock is taken on another CPU and
    this CPU is in the safe context from other reasons. Note that
    the safe context is used also to get console semaphore or when
    calling console drivers. For this reason, we do the check in
    printk_nmi_enter(). It makes the handling consistent for
    the entire NMI handler and avoids reshuffling of the messages.

    The patch also defines special printk context that allows
    to use printk_deferred() in NMI. Note that we could not flush
    the messages to the consoles because console drivers might use
    many other internal locks.

    The newly created vprintk_deferred() disables the preemption
    only around the irq work handling. It is needed there to keep
    the consistency between the two per-CPU variables. But there
    is no reason to disable preemption around vprintk_emit().

    Finally, the patch puts back explicit serialization of the NMI
    backtraces from different CPUs. It was removed by the
    commit a9edc88093287183ac934b ("x86/nmi: Perform a safe
    NMI stack trace on all CPUs"). It was not needed because
    the flushing of the temporary per-CPU buffers was serialized.

    Link: http://lkml.kernel.org/r/1493912763-24873-1-git-send-email-pmladek@suse.com
    Cc: Steven Rostedt
    Cc: Andrew Morton
    Cc: Peter Zijlstra
    Cc: Russell King
    Cc: Daniel Thompson
    Cc: Ingo Molnar
    Cc: Thomas Gleixner
    Cc: Chris Metcalf
    Cc: x86@kernel.org
    Cc: linux-arm-kernel@lists.infradead.org
    Cc: linux-kernel@vger.kernel.org
    Suggested-by: Sergey Senozhatsky
    Acked-by: Sergey Senozhatsky
    Signed-off-by: Petr Mladek

    Petr Mladek
     

08 Feb, 2017

4 commits

  • Account lost messages in pritk-safe and printk-safe-nmi
    contexts and report those numbers during printk_safe_flush().

    The patch also moves lost message counter to struct
    `printk_safe_seq_buf' instead of having dedicated static
    counters - this simplifies the code.

    Link: http://lkml.kernel.org/r/20161227141611.940-6-sergey.senozhatsky@gmail.com
    Cc: Andrew Morton
    Cc: Linus Torvalds
    Cc: Steven Rostedt
    Cc: Jan Kara
    Cc: Tejun Heo
    Cc: Calvin Owens
    Cc: Ingo Molnar
    Cc: Peter Zijlstra
    Cc: Andy Lutomirski
    Cc: Peter Hurley
    Cc: linux-kernel@vger.kernel.org
    Signed-off-by: Sergey Senozhatsky
    Signed-off-by: Petr Mladek

    Sergey Senozhatsky
     
  • Always use printk_deferred() in printk_safe_flush_line().
    Flushing can be done from NMI or printk_safe contexts (when
    we are in panic), so we can't call console drivers, yet still
    want to store the messages in the logbuf buffer. Therefore we
    use a deferred printk version.

    Link: http://lkml.kernel.org/r/20170206164253.GA463@tigerII.localdomain
    Cc: Andrew Morton
    Cc: Linus Torvalds
    Cc: Jan Kara
    Cc: Tejun Heo
    Cc: Calvin Owens
    Cc: Ingo Molnar
    Cc: Peter Zijlstra
    Cc: Andy Lutomirski
    Cc: Peter Hurley
    Cc: linux-kernel@vger.kernel.org
    Signed-off-by: Sergey Senozhatsky
    Suggested-by: Petr Mladek
    Signed-off-by: Petr Mladek
    Reviewed-by: Steven Rostedt (VMware)

    Sergey Senozhatsky
     
  • This patch extends the idea of NMI per-cpu buffers to regions
    that may cause recursive printk() calls and possible deadlocks.
    Namely, printk() can't handle printk calls from schedule code
    or printk() calls from lock debugging code (spin_dump() for instance);
    because those may be called with `sem->lock' already taken or any
    other `critical' locks (p->pi_lock, etc.). An example of deadlock
    can be

    vprintk_emit()
    console_unlock()
    up() << raw_spin_lock_irqsave(&sem->lock, flags);
    wake_up_process()
    try_to_wake_up()
    ttwu_queue()
    ttwu_activate()
    activate_task()
    enqueue_task()
    enqueue_task_fair()
    cfs_rq_of()
    task_of()
    WARN_ON_ONCE(!entity_is_task(se))
    vprintk_emit()
    console_trylock()
    down_trylock()
    raw_spin_lock_irqsave(&sem->lock, flags)
    ^^^^ deadlock

    and some other cases.

    Just like in NMI implementation, the solution uses a per-cpu
    `printk_func' pointer to 'redirect' printk() calls to a 'safe'
    callback, that store messages in a per-cpu buffer and flushes
    them back to logbuf buffer later.

    Usage example:

    printk()
    printk_safe_enter_irqsave(flags)
    //
    // any printk() call from here will endup in vprintk_safe(),
    // that stores messages in a special per-CPU buffer.
    //
    printk_safe_exit_irqrestore(flags)

    The 'redirection' mechanism, though, has been reworked, as suggested
    by Petr Mladek. Instead of using a per-cpu @print_func callback we now
    keep a per-cpu printk-context variable and call either default or nmi
    vprintk function depending on its value. printk_nmi_entrer/exit and
    printk_safe_enter/exit, thus, just set/celar corresponding bits in
    printk-context functions.

    The patch only adds printk_safe support, we don't use it yet.

    Link: http://lkml.kernel.org/r/20161227141611.940-4-sergey.senozhatsky@gmail.com
    Cc: Andrew Morton
    Cc: Linus Torvalds
    Cc: Jan Kara
    Cc: Tejun Heo
    Cc: Calvin Owens
    Cc: Ingo Molnar
    Cc: Peter Zijlstra
    Cc: Andy Lutomirski
    Cc: Peter Hurley
    Cc: linux-kernel@vger.kernel.org
    Signed-off-by: Sergey Senozhatsky
    Signed-off-by: Petr Mladek
    Reviewed-by: Steven Rostedt (VMware)

    Sergey Senozhatsky
     
  • A preparation patch for printk_safe work. No functional change.
    - rename nmi.c to print_safe.c
    - add `printk_safe' prefix to some (which used both by printk-safe
    and printk-nmi) of the exported functions.

    Link: http://lkml.kernel.org/r/20161227141611.940-3-sergey.senozhatsky@gmail.com
    Cc: Andrew Morton
    Cc: Linus Torvalds
    Cc: Jan Kara
    Cc: Tejun Heo
    Cc: Calvin Owens
    Cc: Steven Rostedt
    Cc: Ingo Molnar
    Cc: Peter Zijlstra
    Cc: Andy Lutomirski
    Cc: Peter Hurley
    Cc: linux-kernel@vger.kernel.org
    Signed-off-by: Sergey Senozhatsky
    Signed-off-by: Petr Mladek

    Sergey Senozhatsky