29 Dec, 2018
1 commit
-
commit c7c3f05e341a9a2bd1a92993d4f996cfd6e7348e upstream.
From the printk()/serial console point of view, panic() is special because
it may force a CPU to re-enter printk() and/or the serial console driver.
Therefore, some serial console drivers are re-entrant. E.g. 8250:

serial8250_console_write()
{
        if (port->sysrq)
                locked = 0;
        else if (oops_in_progress)
                locked = spin_trylock_irqsave(&port->lock, flags);
        else
                spin_lock_irqsave(&port->lock, flags);
        ...
}

panic() does set oops_in_progress via bust_spinlocks(1), so in theory
we should be able to re-enter the serial console driver from panic():

  CPU0
  uart_console_write()
  serial8250_console_write()            // if (oops_in_progress)
                                        //     spin_trylock_irqsave()
  call_console_drivers()
  console_unlock()
  console_flush_on_panic()
  bust_spinlocks(1)                     // oops_in_progress++
  panic()
  spin_lock_irqsave(&port->lock, flags) // spin_lock_irqsave()
  serial8250_console_write()
  call_console_drivers()
  console_unlock()
  printk()
  ...

However, this does not happen and we deadlock in the serial console on
the port->lock spinlock. The problem is that console_flush_on_panic() is
called after bust_spinlocks(0):

void panic(const char *fmt, ...)
{
        bust_spinlocks(1);
        ...
        bust_spinlocks(0);
        console_flush_on_panic();
        ...
}

bust_spinlocks(0) decrements oops_in_progress, so oops_in_progress
can go back to zero. Thus even re-entrant console drivers will simply
spin on port->lock spinlock. Given that port->lock may already be
locked either by a stopped CPU, or by the very same CPU we execute
panic() on (for instance, an NMI panic() on the printing CPU), the system
deadlocks and does not reboot.

Fix this by removing bust_spinlocks(0), so oops_in_progress is always
set in panic() now and, thus, re-entrant console drivers will trylock
the port->lock instead of spinning on it forever, when we call them
from console_flush_on_panic().

Link: http://lkml.kernel.org/r/20181025101036.6823-1-sergey.senozhatsky@gmail.com
Cc: Steven Rostedt
Cc: Daniel Wang
Cc: Peter Zijlstra
Cc: Andrew Morton
Cc: Linus Torvalds
Cc: Greg Kroah-Hartman
Cc: Alan Cox
Cc: Jiri Slaby
Cc: Peter Feiner
Cc: linux-serial@vger.kernel.org
Cc: Sergey Senozhatsky
Cc: stable@vger.kernel.org
Signed-off-by: Sergey Senozhatsky
Signed-off-by: Petr Mladek
Signed-off-by: Greg Kroah-Hartman
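An editor's sketch: the trylock-on-oops pattern this fix relies on can be modeled in plain single-threaded C. Everything below is a user-space stand-in for the kernel primitives, not kernel code.

```c
#include <assert.h>
#include <stdbool.h>

/* Model of the serial8250_console_write() locking pattern: port_lock
 * stands in for the port->lock spinlock (0 = free, 1 = held). */
static int port_lock;
static int oops_in_progress;

static bool spin_trylock(int *lock)
{
        if (*lock)
                return false;
        *lock = 1;
        return true;
}

/* Returns false when the call would have spun forever on an
 * already-held port->lock -- the deadlock the patch removes. */
static bool console_write(void)
{
        bool locked;

        if (oops_in_progress) {
                /* re-entrant path: never blocks, prints even unlocked */
                locked = spin_trylock(&port_lock);
        } else {
                if (port_lock)
                        return false;   /* spin_lock_irqsave() would spin forever */
                port_lock = 1;
                locked = true;
        }

        /* ... emit characters to the UART ... */

        if (locked)
                port_lock = 0;
        return true;
}
```

With oops_in_progress kept set throughout panic(), a re-entrant call degrades to an unlocked write instead of spinning on a lock its own CPU already holds.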
17 Aug, 2017
1 commit
-
This implements refcount_t overflow protection on x86 without a noticeable
performance impact, though without the fuller checking of REFCOUNT_FULL.

This is done by duplicating the existing atomic_t refcount implementation
but with normally a single instruction added to detect if the refcount
has gone negative (e.g. wrapped past INT_MAX or below zero). When detected,
the handler saturates the refcount_t to INT_MIN / 2. With this overflow
protection, the erroneous reference release that would follow a wrap back
to zero is blocked from happening, avoiding the class of refcount-overflow
use-after-free vulnerabilities entirely.

Only the overflow case of refcounting can be perfectly protected, since
it can be detected and stopped before the reference is freed and left to
be abused by an attacker. There isn't a way to block early decrements,
and while REFCOUNT_FULL stops increment-from-zero cases (which would
be the state _after_ an early decrement and stops potential double-free
conditions), this fast implementation does not, since it would require
the more expensive cmpxchg loops. Since the overflow case is much more
common (e.g. missing a "put" during an error path), this protection
provides real-world protection. For example, the two public refcount
overflow use-after-free exploits published in 2016 would have been
rendered unexploitable:

http://perception-point.io/2016/01/14/analysis-and-exploitation-of-a-linux-kernel-vulnerability-cve-2016-0728/
http://cyseclabs.com/page?n=02012016
This implementation does, however, notice an unchecked decrement to zero
(i.e. caller used refcount_dec() instead of refcount_dec_and_test() and it
resulted in a zero). Decrements under zero are noticed (since they will
have resulted in a negative value), though this only indicates that a
use-after-free may have already happened. Such notifications are likely
avoidable by an attacker that has already exploited a use-after-free
vulnerability, but it's better to have them reported than allow such
conditions to remain universally silent.

On first overflow detection, the refcount value is reset to INT_MIN / 2
(which serves as a saturation value) and a report and stack trace are
produced. When operations detect only negative value results (such as
changing an already saturated value), saturation still happens but no
notification is performed (since the value was already saturated).

On the matter of races, since the entire range beyond INT_MAX but before
0 is negative, every operation at INT_MIN / 2 will trap, leaving no
overflow-only race condition.

As for performance, this implementation adds a single "js" instruction
to the regular execution flow of a copy of the standard atomic_t refcount
operations. (The non-"and_test" refcount_dec() function, which is uncommon
in regular refcount design patterns, has an additional "jz" instruction
to detect reaching exactly zero.) Since this is a forward jump, it is by
default the non-predicted path, which will be reinforced by dynamic branch
prediction. The result is this protection having virtually no measurable
change in performance over standard atomic_t operations. The error path,
located in .text.unlikely, saves the refcount location and then uses UD0
to fire a refcount exception handler, which resets the refcount, handles
reporting, and returns to regular execution. This keeps the changes to
.text size minimal, avoiding return jumps and open-coded calls to the
error reporting routine.

Example assembly comparison:

refcount_inc() before:
.text:
ffffffff81546149: f0 ff 45 f4 lock incl -0xc(%rbp)

refcount_inc() after:
.text:
ffffffff81546149: f0 ff 45 f4 lock incl -0xc(%rbp)
ffffffff8154614d: 0f 88 80 d5 17 00 js ffffffff816c36d3
...
.text.unlikely:
ffffffff816c36d3: 48 8d 4d f4 lea -0xc(%rbp),%rcx
ffffffff816c36d7: 0f ff (bad)

These are the cycle counts comparing a loop of refcount_inc() from 1
to INT_MAX and back down to 0 (via refcount_dec_and_test()), between
unprotected refcount_t (atomic_t), fully protected REFCOUNT_FULL
(refcount_t-full), and this overflow-protected refcount (refcount_t-fast):

2147483646 refcount_inc()s and 2147483647 refcount_dec_and_test()s:
                 cycles         protections
atomic_t         82249267387    none
refcount_t-fast  82211446892    overflow, untested dec-to-zero
refcount_t-full  144814735193   overflow, untested dec-to-zero, inc-from-zero

This code is a modified version of the x86 PAX_REFCOUNT atomic_t
overflow defense from the last public patch of PaX/grsecurity, based
on my understanding of the code. Changes or omissions from the original
code are mine and don't reflect the original grsecurity/PaX code. Thanks
to PaX Team for various suggestions for improvement for repurposing this
code to be a refcount-only protection.

Signed-off-by: Kees Cook
Reviewed-by: Josh Poimboeuf
Cc: Alexey Dobriyan
Cc: Andrew Morton
Cc: Arnd Bergmann
Cc: Christoph Hellwig
Cc: David S. Miller
Cc: Davidlohr Bueso
Cc: Elena Reshetova
Cc: Eric Biggers
Cc: Eric W. Biederman
Cc: Greg KH
Cc: Hans Liljestrand
Cc: James Bottomley
Cc: Jann Horn
Cc: Linus Torvalds
Cc: Manfred Spraul
Cc: Peter Zijlstra
Cc: Rik van Riel
Cc: Serge E. Hallyn
Cc: Thomas Gleixner
Cc: arozansk@redhat.com
Cc: axboe@kernel.dk
Cc: kernel-hardening@lists.openwall.com
Cc: linux-arch
Link: http://lkml.kernel.org/r/20170815161924.GA133115@beast
Signed-off-by: Ingo Molnar
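The saturation behaviour described above can be sketched in plain C. This is an illustrative user-space model, not the kernel implementation: the kernel does the sign check with a single "js" instruction on the flags set by the atomic operation, while here it is an explicit comparison.

```c
#include <assert.h>
#include <limits.h>

/* Sketch of the fast refcount protection: increment, then check the
 * sign.  A negative result means we wrapped past INT_MAX (or dropped
 * below zero); the counter is saturated at INT_MIN / 2 so it can never
 * wrap back to zero and enable a use-after-free. */
#define REFCOUNT_SATURATED (INT_MIN / 2)

static int saturation_reports;  /* the kernel would WARN with a stack trace */

static void refcount_inc(int *r)
{
        /* unsigned arithmetic so the wrap itself is well defined in C */
        *r = (int)((unsigned int)*r + 1u);      /* the "lock incl" */
        if (*r < 0) {                           /* the "js" exception path */
                *r = REFCOUNT_SATURATED;
                saturation_reports++;
        }
}

static int refcount_dec_and_test(int *r)
{
        *r = (int)((unsigned int)*r - 1u);
        if (*r < 0) {           /* underflow, or an already saturated value */
                *r = REFCOUNT_SATURATED;
                saturation_reports++;
                return 0;
        }
        return *r == 0;
}
```

Note that, as the changelog says, operations on an already saturated value keep it in the negative range, so there is no overflow-only race window.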
02 Mar, 2017
1 commit
-
We are going to split out of , which
will have to be picked up from other headers and a couple of .c files.

Create a trivial placeholder file that just
maps to to make this patch obviously correct and
bisectable.

Include the new header in the files that are going to need it.
Acked-by: Linus Torvalds
Cc: Mike Galbraith
Cc: Peter Zijlstra
Cc: Thomas Gleixner
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Ingo Molnar
24 Feb, 2017
1 commit
-
Now we can also jump to the boot prom from the sunhv console by sending
break twice on the console, in both the running and panicked kernel
cases.

Signed-off-by: Vijay Kumar
Signed-off-by: David S. Miller
23 Feb, 2017
1 commit
-
Pull printk updates from Petr Mladek:
- Add Petr Mladek, Sergey Senozhatsky as printk maintainers, and Steven
Rostedt as the printk reviewer. This idea came up after the
discussion about printk issues at Kernel Summit. It was formulated
and discussed at lkml[1].

- Extend a lock-less NMI per-cpu buffers idea to handle recursive
printk() calls by Sergey Senozhatsky[2]. It is the first step in
sanitizing printk as discussed at Kernel Summit.

The change allows us to see messages that would normally get ignored or
would cause a deadlock.

It also allows lockdep to be enabled in printk(). This has already paid off.
The testing in linux-next helped to discover two old problems that
were hidden before[3][4].

- Remove an unused parameter, by Sergey Senozhatsky. Clean up after a past
change.

[1] http://lkml.kernel.org/r/1481798878-31898-1-git-send-email-pmladek@suse.com
[2] http://lkml.kernel.org/r/20161227141611.940-1-sergey.senozhatsky@gmail.com
[3] http://lkml.kernel.org/r/20170215044332.30449-1-sergey.senozhatsky@gmail.com
[4] http://lkml.kernel.org/r/20170217015932.11898-1-sergey.senozhatsky@gmail.com

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/pmladek/printk:
printk: drop call_console_drivers() unused param
printk: convert the rest to printk-safe
printk: remove zap_locks() function
printk: use printk_safe buffers in printk
printk: report lost messages in printk safe/nmi contexts
printk: always use deferred printk when flush printk_safe lines
printk: introduce per-cpu safe_print seq buffer
printk: rename nmi.c and exported api
printk: use vprintk_func in vprintk()
MAINTAINERS: Add printk maintainers
08 Feb, 2017
1 commit
-
A preparation patch for printk_safe work. No functional change.
- rename nmi.c to printk_safe.c
- add a `printk_safe' prefix to some of the exported functions (those
  used by both printk-safe and printk-nmi)

Link: http://lkml.kernel.org/r/20161227141611.940-3-sergey.senozhatsky@gmail.com
Cc: Andrew Morton
Cc: Linus Torvalds
Cc: Jan Kara
Cc: Tejun Heo
Cc: Calvin Owens
Cc: Steven Rostedt
Cc: Ingo Molnar
Cc: Peter Zijlstra
Cc: Andy Lutomirski
Cc: Peter Hurley
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Sergey Senozhatsky
Signed-off-by: Petr Mladek
25 Jan, 2017
1 commit
-
When a system panics, the "Rebooting in X seconds.." message is never
printed because it lacks a newline. Fix it.

Link: http://lkml.kernel.org/r/20170119114751.2724-1-jslaby@suse.cz
Signed-off-by: Jiri Slaby
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
18 Jan, 2017
1 commit
-
Commit 7fd8329ba502 ("taint/module: Clean up global and module taint
flags handling") used the key words true and false as character members
of a new struct. These names cause problems when out-of-kernel modules
such as VirtualBox include their own definitions of true and false.

Fixes: 7fd8329ba502 ("taint/module: Clean up global and module taint flags handling")
Signed-off-by: Larry Finger
Cc: Petr Mladek
Cc: Jessica Yu
Cc: Rusty Russell
Reported-by: Valdis Kletnieks
Reviewed-by: Petr Mladek
Acked-by: Rusty Russell
Signed-off-by: Jessica Yu
27 Nov, 2016
1 commit
-
The commit 66cc69e34e86a231 ("Fix: module signature vs tracepoints:
add new TAINT_UNSIGNED_MODULE") updated module_taint_flags() to
potentially print one more character. But it did not increase the
size of the corresponding buffers in m_show() and print_modules().

We recently made the same mistake when adding a taint flag
for livepatching, see
https://lkml.kernel.org/r/cfba2c823bb984690b73572aaae1db596b54a082.1472137475.git.jpoimboe@redhat.com

Also, struct module uses an incompatible type for the mod->taints flags.
It survived from the commit 2bc2d61a9638dab670d ("[PATCH] list module
taint flags in Oops/panic"). "int" was used for the global taint
flags at that time, but only the global taint flags were later changed
to "unsigned long" by the commit 25ddbb18aae33ad2 ("Make the taint
flags reliable").

This patch defines TAINT_FLAGS_COUNT, which can be used to create
arrays and buffers of the right size. Note that we could not use
enum because the taint flag indexes are also used in assembly code.

Then it reworks the table that describes the taint flags. The TAINT_*
numbers can be used as the index. Instead, we add information
if the taint flag is also shown per-module.

Finally, it uses "unsigned long", bit operations, and the updated
taint_flags table also for mod->taints.

It is not optimal because only a few taint flags can be printed by
module_taint_flags(). But it is better to be on the safe side. IMHO, it is
not worth the optimization and this is a good compromise.

Signed-off-by: Petr Mladek
Link: http://lkml.kernel.org/r/1474458442-21581-1-git-send-email-pmladek@suse.com
[jeyu@redhat.com: fix broken lkml link in changelog]
Signed-off-by: Jessica Yu
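The reworked table can be sketched like this. The flag subset and characters below are a trimmed, illustrative selection, not the kernel's full table; the point is that buffers sized from TAINT_FLAGS_COUNT can no longer go stale when a new flag is added.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* One entry per TAINT_* index: the character printed when the bit is
 * set or clear, plus whether the flag is also shown per-module. */
struct taint_flag {
        char c_true;    /* shown when the taint bit is set */
        char c_false;   /* shown when it is clear */
        bool module;    /* also reported in per-module taint output */
};

static const struct taint_flag taint_flags[] = {
        { 'P', 'G', true  },    /* TAINT_PROPRIETARY_MODULE */
        { 'F', ' ', true  },    /* TAINT_FORCED_MODULE */
        { 'W', ' ', false },    /* TAINT_WARN (global only) */
        { 'K', ' ', true  },    /* TAINT_LIVEPATCH */
};

#define TAINT_FLAGS_COUNT (sizeof(taint_flags) / sizeof(taint_flags[0]))

static const char *format_taints(unsigned long taints, bool per_module)
{
        static char buf[TAINT_FLAGS_COUNT + 1]; /* sized from the table */
        size_t i, n = 0;

        for (i = 0; i < TAINT_FLAGS_COUNT; i++) {
                if (per_module && !taint_flags[i].module)
                        continue;       /* e.g. TAINT_WARN is global only */
                if (taints & (1UL << i))
                        buf[n++] = taint_flags[i].c_true;
                else if (!per_module)
                        buf[n++] = taint_flags[i].c_false;
        }
        buf[n] = '\0';
        return buf;
}
```

The taints argument is an unsigned long manipulated with bit operations, matching what the patch switches mod->taints to.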
12 Oct, 2016
1 commit
-
Daniel Walker reported problems which happen when the
crash_kexec_post_notifiers kernel option is enabled
(https://lkml.org/lkml/2015/6/24/44).

In that case, smp_send_stop() is called before entering kdump routines
which assume other CPUs are still online. As a result, on x86, kdump
routines fail to save other CPUs' registers and disable virtualization
extensions.

To fix this problem, call a new kdump-friendly function,
crash_smp_send_stop(), instead of the smp_send_stop() when
crash_kexec_post_notifiers is enabled. crash_smp_send_stop() is a weak
function, and it just calls smp_send_stop(). Architecture code should
override it so that kdump can work appropriately. This patch only
provides the x86-specific version.

For Xen's PV kernel, just keep the current behavior.
NOTES:
- The right solution would be to place crash_smp_send_stop() before the
  __crash_kexec() invocation in all cases and remove smp_send_stop(), but
  we can't do that until all architectures implement their own
  crash_smp_send_stop().
- crash_smp_send_stop()-like work is still needed by
  machine_crash_shutdown() because crash_kexec() can be called without
  entering panic().

Fixes: f06e5153f4ae (kernel/panic.c: add "crash_kexec_post_notifiers" option)
Link: http://lkml.kernel.org/r/20160810080948.11028.15344.stgit@sysi4-13.yrl.intra.hitachi.co.jp
Signed-off-by: Hidehiro Kawai
Reported-by: Daniel Walker
Cc: Dave Young
Cc: Baoquan He
Cc: Vivek Goyal
Cc: Eric Biederman
Cc: Masami Hiramatsu
Cc: Daniel Walker
Cc: Xunlei Pang
Cc: Thomas Gleixner
Cc: Ingo Molnar
Cc: "H. Peter Anvin"
Cc: Borislav Petkov
Cc: David Vrabel
Cc: Toshi Kani
Cc: Ralf Baechle
Cc: David Daney
Cc: Aaro Koskinen
Cc: "Steven J. Hill"
Cc: Corey Minyard
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
03 Aug, 2016
1 commit
-
crash_kexec_post_notifiers is a boot option which controls whether the
1st kernel calls panic notifiers or not before booting the 2nd kernel.
However, there is no need to limit it to being modifiable only at boot
time. So, use core_param instead of early_param.

Link: http://lkml.kernel.org/r/20160705113327.5864.43139.stgit@softrs
Signed-off-by: Hidehiro Kawai
Cc: Dave Young
Cc: Baoquan He
Cc: Vivek Goyal
Cc: Eric Biederman
Cc: Masami Hiramatsu
Cc: Borislav Petkov
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
21 May, 2016
1 commit
-
In NMI context, printk() messages are stored into per-CPU buffers to
avoid a possible deadlock. They are normally flushed to the main ring
buffer via an IRQ work. But the work is never called when the system
calls panic() in the very same NMI handler.

This patch tries to flush NMI buffers before the crash dump is
generated. In this case it does not risk a double release and bails out
when the logbuf_lock is already taken. The aim is to get the messages
into the main ring buffer when possible. This makes them more easily
accessible in the vmcore.

Then the patch tries to flush the buffers a second time when other CPUs
are down. It might be more aggressive and reset logbuf_lock. The aim
is to get the messages available for the subsequent kmsg_dump() and
console_flush_on_panic() calls.

The patch causes vprintk_emit() to be called even in NMI context again.
But it is done via printk_deferred() so that the console handling is
skipped. Consoles use internal locks and we could not prevent a
deadlock easily. They are explicitly called later when the crash dump
is not generated, see console_flush_on_panic().

Signed-off-by: Petr Mladek
Cc: Benjamin Herrenschmidt
Cc: Daniel Thompson
Cc: David Miller
Cc: Ingo Molnar
Cc: Jan Kara
Cc: Jiri Kosina
Cc: Martin Schwidefsky
Cc: Peter Zijlstra
Cc: Ralf Baechle
Cc: Russell King
Cc: Steven Rostedt
Cc: Thomas Gleixner
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
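The flow described above can be modeled with a toy user-space sketch: a message produced "in NMI" lands in a per-CPU buffer, and a later flush moves it to the main log, bailing out rather than deadlocking when the lock is taken. Names, sizes, and the single-CPU simplification are all illustrative assumptions.

```c
#include <assert.h>
#include <string.h>

enum { NMI_BUF_SZ = 128, LOG_BUF_SZ = 512 };

static char nmi_buf[NMI_BUF_SZ];        /* per-CPU NMI buffer (one CPU here) */
static size_t nmi_len;
static char main_log[LOG_BUF_SZ];       /* the main ring buffer */
static size_t main_len;
static int logbuf_locked;               /* models logbuf_lock */

static void printk_nmi(const char *msg)
{
        size_t n = strlen(msg);

        if (nmi_len + n <= NMI_BUF_SZ) {        /* else: lost, reported later */
                memcpy(nmi_buf + nmi_len, msg, n);
                nmi_len += n;
        }
}

/* Returns 1 when the messages reached the main buffer, 0 when the
 * flush bailed out because the lock was already taken. */
static int printk_nmi_flush(void)
{
        if (logbuf_locked)
                return 0;               /* no double release, no deadlock */
        memcpy(main_log + main_len, nmi_buf, nmi_len);
        main_len += nmi_len;
        main_log[main_len] = '\0';
        nmi_len = 0;
        return 1;
}
```

The second, more aggressive pass the changelog mentions corresponds to resetting the lock once other CPUs are down and flushing again.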
23 Mar, 2016
1 commit
-
Commit 1717f2096b54 ("panic, x86: Fix re-entrance problem due to panic
on NMI") and commit 58c5661f2144 ("panic, x86: Allow CPUs to save
registers even if looping in NMI context") introduced nmi_panic() which
prevents concurrent/recursive execution of panic(). It also saves
registers for the crash dump on x86.

However, there are some cases where NMI handlers still use panic().
This patch set partially replaces them with nmi_panic() in those cases.

Even after this patchset is applied, some NMI or similar handlers (e.g. the
MCE handler) continue to use panic(). This is because I can't test them
well and actual problems won't happen. For example, the possibility
that a normal panic and a panic on MCE happen simultaneously is very low.

This patch (of 3):
Convert nmi_panic() to a proper function and export it instead of
exporting internal implementation details to modules, for obvious
reasons.

Signed-off-by: Hidehiro Kawai
Acked-by: Borislav Petkov
Acked-by: Michal Nazarewicz
Cc: Michal Hocko
Cc: Rasmus Villemoes
Cc: Nicolas Iooss
Cc: Javi Merino
Cc: Gobinda Charan Maji
Cc: "Steven Rostedt (Red Hat)"
Cc: Thomas Gleixner
Cc: Vitaly Kuznetsov
Cc: HATAYAMA Daisuke
Cc: Tejun Heo
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
18 Mar, 2016
1 commit
-
The traceoff_on_warning option doesn't have any effect on s390, powerpc,
arm64, parisc, and sh because there are two different types of WARN
implementations:

1) The above mentioned architectures treat WARN() as a special case of a
   BUG() exception. They handle warnings in report_bug() in lib/bug.c.

2) All other architectures just call warn_slowpath_*() directly. Their
   warnings are handled in warn_slowpath_common() in kernel/panic.c.

Support traceoff_on_warning on all architectures and prevent any future
divergence by using a single common function to emit the warning.

Also remove the '()' from '%pS()', because the parentheses look funky:
[ 45.607629] WARNING: at /root/warn_mod/warn_mod.c:17 .init_dummy+0x20/0x40 [warn_mod]()
Reported-by: Chunyu Hu
Signed-off-by: Josh Poimboeuf
Acked-by: Heiko Carstens
Tested-by: Prarit Bhargava
Acked-by: Prarit Bhargava
Acked-by: Steven Rostedt
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
17 Jan, 2016
1 commit
-
@console_may_schedule tracks whether console_sem was acquired through
lock or trylock. If the former, we're inside a sleepable context and
console_conditional_schedule() performs cond_resched(). This allows
console drivers which use console_lock for synchronization to yield
while performing time-consuming operations such as scrolling.

However, the actual console outputting is performed while holding the
irq-safe logbuf_lock, so console_unlock() clears @console_may_schedule
before starting outputting lines. Also, only a few drivers call
console_conditional_schedule() to begin with. This means that when a
lot of lines need to be output by console_unlock(), for example on a
console registration, the task doing console_unlock() may not yield for
a long time on a non-preemptible kernel.

If this happens with a slow console device, for example a serial
console, the outputting task may occupy the cpu for a very long time.
Long enough to trigger softlockup and/or RCU stall warnings, which in
turn pile more messages, sometimes enough to trigger the next cycle of
warnings, incapacitating the system.

Fix it by making console_unlock() insert cond_resched() between lines if
@console_may_schedule.

Signed-off-by: Tejun Heo
Reported-by: Calvin Owens
Acked-by: Jan Kara
Cc: Dave Jones
Cc: Kyle McMartin
Cc:
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
19 Dec, 2015
3 commits
-
Currently, panic() and crash_kexec() can be called at the same time.
For example (x86 case):

CPU 0:
  oops_end()
    crash_kexec()
      mutex_trylock() // acquired
      nmi_shootdown_cpus() // stop other CPUs

CPU 1:
  panic()
    crash_kexec()
      mutex_trylock() // failed to acquire
    smp_send_stop() // stop other CPUs
    infinite loop

If CPU 1 calls smp_send_stop() before nmi_shootdown_cpus(), kdump
fails.

In another case:
CPU 0:
  oops_end()
    crash_kexec()
      mutex_trylock() // acquired
  io_check_error()
    panic()
      crash_kexec()
        mutex_trylock() // failed to acquire
        infinite loop

Clearly, this is an undesirable result.
To fix this problem, this patch changes crash_kexec() to exclude others
by using the panic_cpu atomic.

Signed-off-by: Hidehiro Kawai
Acked-by: Michal Hocko
Cc: Andrew Morton
Cc: Baoquan He
Cc: Dave Young
Cc: "Eric W. Biederman"
Cc: HATAYAMA Daisuke
Cc: "H. Peter Anvin"
Cc: Ingo Molnar
Cc: Jonathan Corbet
Cc: kexec@lists.infradead.org
Cc: linux-doc@vger.kernel.org
Cc: Martin Schwidefsky
Cc: Masami Hiramatsu
Cc: Minfei Huang
Cc: Peter Zijlstra
Cc: Seth Jennings
Cc: Steven Rostedt
Cc: Thomas Gleixner
Cc: Vitaly Kuznetsov
Cc: Vivek Goyal
Cc: x86-ml
Link: http://lkml.kernel.org/r/20151210014630.25437.94161.stgit@softrs
Signed-off-by: Borislav Petkov
Signed-off-by: Thomas Gleixner
-
Currently, kdump_nmi_shootdown_cpus(), a subroutine of crash_kexec(),
sends an NMI IPI to CPUs which haven't called panic() to stop them,
save their register information and do some cleanups for crash dumping.
However, if such a CPU is infinitely looping in NMI context, we fail to
save its register information into the crash dump.

For example, this can happen when unknown NMIs are broadcast to all
CPUs as follows:

  CPU 0                              CPU 1
  ===========================        ==========================
  receive an unknown NMI
  unknown_nmi_error()
    panic()                          receive an unknown NMI
      spin_trylock(&panic_lock)      unknown_nmi_error()
      crash_kexec()                    panic()
                                         spin_trylock(&panic_lock)
                                         panic_smp_self_stop()
                                           infinite loop
        kdump_nmi_shootdown_cpus()
          issue NMI IPI -----------> blocked until IRET
                                           infinite loop...

Here, since CPU 1 is in NMI context, the second NMI from CPU 0 is
blocked until CPU 1 executes IRET. However, CPU 1 never executes IRET,
so the NMI is not handled and the callback function to save registers is
never called.

In practice, this can happen on some servers which broadcast NMIs to all
CPUs when the NMI button is pushed.

To save registers in this case, we need to:

a) Return from the NMI handler instead of looping infinitely
or
b) Call the callback function directly from the infinite loop

Inherently, a) is risky because NMI is also used to prevent corrupted
data from being propagated to devices. So, we chose b).

This patch does the following:

1. Move the infinite looping of CPUs which haven't called panic() in NMI
   context (actually done by panic_smp_self_stop()) outside of panic() to
   enable us to refer to pt_regs. Please note that panic_smp_self_stop() is
   still used for normal context.

2. Call a callback of kdump_nmi_shootdown_cpus() directly to save
registers and do some cleanups after setting waiting_for_crash_ipi which
is used for counting down the number of CPUs which handled the callback.

Signed-off-by: Hidehiro Kawai
Acked-by: Michal Hocko
Cc: Aaron Tomlin
Cc: Andrew Morton
Cc: Andy Lutomirski
Cc: Baoquan He
Cc: Chris Metcalf
Cc: Dave Young
Cc: David Hildenbrand
Cc: Don Zickus
Cc: Eric Biederman
Cc: Frederic Weisbecker
Cc: Gobinda Charan Maji
Cc: HATAYAMA Daisuke
Cc: Hidehiro Kawai
Cc: "H. Peter Anvin"
Cc: Ingo Molnar
Cc: Javi Merino
Cc: Jiang Liu
Cc: Jonathan Corbet
Cc: kexec@lists.infradead.org
Cc: linux-doc@vger.kernel.org
Cc: lkml
Cc: Masami Hiramatsu
Cc: Michal Nazarewicz
Cc: Nicolas Iooss
Cc: Oleg Nesterov
Cc: Peter Zijlstra
Cc: Prarit Bhargava
Cc: Rasmus Villemoes
Cc: Seth Jennings
Cc: Stefan Lippers-Hollmann
Cc: Steven Rostedt
Cc: Thomas Gleixner
Cc: Ulrich Obergfell
Cc: Vitaly Kuznetsov
Cc: Vivek Goyal
Cc: Yasuaki Ishimatsu
Link: http://lkml.kernel.org/r/20151210014628.25437.75256.stgit@softrs
[ Cleanup comments, fixup formatting. ]
Signed-off-by: Borislav Petkov
Signed-off-by: Thomas Gleixner
-
If panic on NMI happens just after panic() on the same CPU, panic() is
recursively called. The kernel stalls, as a result, after failing to
acquire panic_lock.

To avoid this problem, don't call panic() in NMI context if we've
already entered panic().

For that, introduce the nmi_panic() macro to reduce code duplication. In
the case of panic on NMI, don't return from NMI handlers if another CPU
already panicked.

Signed-off-by: Hidehiro Kawai
Acked-by: Michal Hocko
Cc: Aaron Tomlin
Cc: Andrew Morton
Cc: Andy Lutomirski
Cc: Baoquan He
Cc: Chris Metcalf
Cc: David Hildenbrand
Cc: Don Zickus
Cc: "Eric W. Biederman"
Cc: Frederic Weisbecker
Cc: Gobinda Charan Maji
Cc: HATAYAMA Daisuke
Cc: "H. Peter Anvin"
Cc: Ingo Molnar
Cc: Javi Merino
Cc: Jonathan Corbet
Cc: kexec@lists.infradead.org
Cc: linux-doc@vger.kernel.org
Cc: lkml
Cc: Masami Hiramatsu
Cc: Michal Nazarewicz
Cc: Nicolas Iooss
Cc: Peter Zijlstra
Cc: Prarit Bhargava
Cc: Rasmus Villemoes
Cc: Rusty Russell
Cc: Seth Jennings
Cc: Steven Rostedt
Cc: Thomas Gleixner
Cc: Ulrich Obergfell
Cc: Vitaly Kuznetsov
Cc: Vivek Goyal
Link: http://lkml.kernel.org/r/20151210014626.25437.13302.stgit@softrs
[ Cleanup comments, fixup formatting. ]
Signed-off-by: Borislav Petkov
Signed-off-by: Thomas Gleixner
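The panic_cpu exclusion underlying nmi_panic() can be sketched with C11 atomics standing in for the kernel's atomic ops (an editor's model, with user-space stand-ins throughout; the real code spins losing CPUs in panic_smp_self_stop() rather than returning).

```c
#include <assert.h>
#include <stdatomic.h>

#define PANIC_CPU_INVALID (-1)

static atomic_int panic_cpu = PANIC_CPU_INVALID;
static int panics_run;  /* how many times "panic()" actually ran */

/* The first CPU to swap its id into panic_cpu runs panic(); any other
 * caller (including the same CPU re-entering from NMI) just returns
 * instead of recursing.  Returns 1 if this cpu won the race. */
static int nmi_panic(int this_cpu)
{
        int old = PANIC_CPU_INVALID;

        if (atomic_compare_exchange_strong(&panic_cpu, &old, this_cpu)) {
                panics_run++;   /* the real panic() does not return */
                return 1;
        }
        /* old now holds the id of the CPU that is already panicking */
        return 0;
}
```

The same atomic is what the sibling patch uses to keep crash_kexec() and panic() from running concurrently.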
21 Nov, 2015
1 commit
-
Commit 08d78658f393 ("panic: release stale console lock to always get the
logbuf printed out") introduced an unwanted bad unlock balance report when
panic() is called directly and not from OOPS (e.g. from out_of_memory()).
The difference is that in case of OOPS we disable locks debug in
oops_enter(), and on a direct panic call nobody does that.

Fixes: 08d78658f393 ("panic: release stale console lock to always get the logbuf printed out")
Reported-by: kernel test robot
Signed-off-by: Vitaly Kuznetsov
Cc: HATAYAMA Daisuke
Cc: Masami Hiramatsu
Cc: Jiri Kosina
Cc: Baoquan He
Cc: Prarit Bhargava
Cc: Xie XiuQi
Cc: Seth Jennings
Cc: "K. Y. Srinivasan"
Cc: Jan Kara
Cc: Petr Mladek
Cc: Yasuaki Ishimatsu
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
07 Nov, 2015
1 commit
-
In some cases we may end up killing the CPU holding the console lock
while still having valuable data in logbuf. E.g. I'm observing the
following:

- A crash is happening on one CPU and console_unlock() is being called on
  some other.

- console_unlock() tries to print out the buffer before releasing the lock,
  and on a slow console it takes time.

- In the meanwhile, the crashing CPU does lots of printk()-s with valuable
  data (which go to the logbuf) and sends IPIs to all other CPUs.

- console_unlock() finishes printing the previous chunk and enables
  interrupts before trying to print out the rest; the CPU catches the IPI
  and never releases the console lock.

This is not the only possible case: in the VT/fb subsystems we have many other
console_lock()/console_unlock() users. Non-masked interrupts (or
receiving NMI in case of extreme slowness) will have the same result.
Getting the whole console buffer printed out on crash should be top
priority.

[akpm@linux-foundation.org: tweak comment text]
Signed-off-by: Vitaly Kuznetsov
Cc: HATAYAMA Daisuke
Cc: Masami Hiramatsu
Cc: Jiri Kosina
Cc: Baoquan He
Cc: Prarit Bhargava
Cc: Xie XiuQi
Cc: Seth Jennings
Cc: "K. Y. Srinivasan"
Cc: Jan Kara
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
01 Jul, 2015
2 commits
-
Commit f06e5153f4ae2e ("kernel/panic.c: add "crash_kexec_post_notifiers"
option for kdump after panic_notifers") introduced
"crash_kexec_post_notifiers" kernel boot option, which toggles whether
panic() calls crash_kexec() before or after the panic_notifiers and kmsg dump.

The problem is that the commit overlooks the panic_on_oops kernel boot option.
If it is enabled, crash_kexec() is called directly without going through
panic() in the oops path.

To fix this issue, this patch adds a check for "crash_kexec_post_notifiers"
in the condition of kexec_should_crash().

Also, put a comment in kexec_should_crash() to explain the non-obvious
aspects of this patch.

Signed-off-by: HATAYAMA Daisuke
Acked-by: Baoquan He
Tested-by: Hidehiro Kawai
Reviewed-by: Masami Hiramatsu
Cc: Vivek Goyal
Cc: Ingo Molnar
Cc: Hidehiro Kawai
Cc: Baoquan He
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
-
For compatibility with the behaviour before the commit f06e5153f4ae2e
("kernel/panic.c: add "crash_kexec_post_notifiers" option for kdump after
panic_notifers"), the 2nd crash_kexec() should be called only if
crash_kexec_post_notifiers is enabled.

Note that crash_kexec() returns immediately if the kdump crash kernel is not
loaded, so in this case, this patch makes no functionality change, but the
point is to make it explicit, from the caller panic() side, that the 2nd
crash_kexec() does nothing.

Signed-off-by: HATAYAMA Daisuke
Suggested-by: Ingo Molnar
Cc: "Eric W. Biederman"
Cc: Vivek Goyal
Cc: Masami Hiramatsu
Cc: Hidehiro Kawai
Cc: Baoquan He
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
22 Dec, 2014
1 commit
-
This adds a new taint flag to indicate when the kernel or a kernel
module has been live patched. This will provide a clean indication in
bug reports that live patching was used.

Additionally, if the crash occurs in a live patched function, the live
patch module will appear beside the patched function in the backtrace.

Signed-off-by: Seth Jennings
Acked-by: Josh Poimboeuf
Reviewed-by: Miroslav Benes
Reviewed-by: Petr Mladek
Reviewed-by: Masami Hiramatsu
Signed-off-by: Jiri Kosina
11 Dec, 2014
1 commit
-
There have been several times where I have had to rebuild a kernel to
cause a panic when hitting a WARN() in the code in order to get a crash
dump from a system. Sometimes this is easy to do, other times (such as
in the case of a remote admin) it is not trivial to send new images to
the user.

A much easier method would be a switch to change the WARN() over to a
panic. This makes debugging easier in that I can now test the actual
image the WARN() was seen on and I do not have to engage in remote
debugging.

This patch adds a panic_on_warn kernel parameter and
/proc/sys/kernel/panic_on_warn calls panic() in the
warn_slowpath_common() path. The function will still print out the
location of the warning.

An example of the panic_on_warn output:
The first line below is from the WARN_ON() to output the WARN_ON()'s
location. After that, the panic() output is displayed.

WARNING: CPU: 30 PID: 11698 at /home/prarit/dummy_module/dummy-module.c:25 init_dummy+0x1f/0x30 [dummy_module]()
Kernel panic - not syncing: panic_on_warn set ...
CPU: 30 PID: 11698 Comm: insmod Tainted: G W OE 3.17.0+ #57
Hardware name: Intel Corporation S2600CP/S2600CP, BIOS RMLSDP.86I.00.29.D696.1311111329 11/11/2013
0000000000000000 000000008e3f87df ffff88080f093c38 ffffffff81665190
0000000000000000 ffffffff818aea3d ffff88080f093cb8 ffffffff8165e2ec
ffffffff00000008 ffff88080f093cc8 ffff88080f093c68 000000008e3f87df
Call Trace:
[] dump_stack+0x46/0x58
[] panic+0xd0/0x204
[] ? init_dummy+0x1f/0x30 [dummy_module]
[] warn_slowpath_common+0xd0/0xd0
[] ? dummy_greetings+0x40/0x40 [dummy_module]
[] warn_slowpath_null+0x1a/0x20
[] init_dummy+0x1f/0x30 [dummy_module]
[] do_one_initcall+0xd4/0x210
[] ? __vunmap+0xc2/0x110
[] load_module+0x16a9/0x1b30
[] ? store_uevent+0x70/0x70
[] ? copy_module_from_fd.isra.44+0x129/0x180
[] SyS_finit_module+0xa6/0xd0
[] system_call_fastpath+0x12/0x17

Successfully tested by me.
hpa said: There is another very valid use for this: many operators would
rather a machine shut down than be potentially compromised either
functionally or security-wise.

Signed-off-by: Prarit Bhargava
Cc: Jonathan Corbet
Cc: Rusty Russell
Cc: "H. Peter Anvin"
Cc: Andi Kleen
Cc: Masami Hiramatsu
Acked-by: Yasuaki Ishimatsu
Cc: Fabian Frederick
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
14 Nov, 2014
1 commit
-
Commit 69361eef9056 ("panic: add TAINT_SOFTLOCKUP") added the 'L' flag,
but failed to update the comments for print_tainted(). So, update the
comments.

Signed-off-by: Xie XiuQi
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
09 Aug, 2014
1 commit
-
This taint flag will be set if the system has ever entered a softlockup
state. Similar to TAINT_WARN it is useful to know whether or not the
system has been in a softlockup state when debugging.

[akpm@linux-foundation.org: apply the taint before calling panic()]
Signed-off-by: Josh Hunt
Cc: Jason Baron
Cc: Randy Dunlap
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
07 Jun, 2014
1 commit
-
Add a "crash_kexec_post_notifiers" boot option to run kdump after
running panic_notifiers and dump kmsg. This can help rare situations
where kdump fails because of unstable crashed kernel or hardware failure
(memory corruption on critical data/code), or the 2nd kernel is already
broken by the 1st kernel (it's a broken behavior, but who can guarantee
that the "crashed" kernel works correctly?).

Usage: add "crash_kexec_post_notifiers" to kernel boot option.
Note that this actually increases risks of the failure of kdump. This
option should be set only if you worry about the rare case of kdump
failure rather than increasing the chance of success.

Signed-off-by: Masami Hiramatsu
Acked-by: Motohiro Kosaki
Acked-by: Vivek Goyal
Cc: Eric Biederman
Cc: Yoshihiro YUNOMAE
Cc: Satoru MORIYA
Cc: Tomoki Sekiyama
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
08 Apr, 2014
1 commit
-
Currently, booting without initrd specified on 80x25 screen gives a call
trace followed by atkbd : Spurious ACK. Original message ("VFS: Unable
to mount root fs") is not available. Of course this could happen in
other situations...

This patch displays the panic reason after the call trace, which could help a lot
of people even if it's not the very last line on screen.

Also, convert all panic.c printk(KERN_EMERG to pr_emerg(
[akpm@linux-foundation.org: missed a couple of pr_ conversions]
Signed-off-by: Fabian Frederick
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
07 Apr, 2014
1 commit
-
Pull module updates from Rusty Russell:
"Nothing major: the stricter permissions checking for sysfs broke a
staging driver; fix included. Greg KH said he'd take the patch but
hadn't as the merge window opened, so it's included here to avoid
breaking build"

* tag 'modules-next-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rusty/linux:
staging: fix up speakup kobject mode
Use 'E' instead of 'X' for unsigned module taint flag.
VERIFY_OCTAL_PERMISSIONS: stricter checking for sysfs perms.
kallsyms: fix percpu vars on x86-64 with relocation.
kallsyms: generalize address range checking
module: LLVMLinux: Remove unused function warning from __param_check macro
Fix: module signature vs tracepoints: add new TAINT_UNSIGNED_MODULE
module: remove MODULE_GENERIC_TABLE
module: allow multiple calls to MODULE_DEVICE_TABLE() per module
module: use pr_cont
01 Apr, 2014
1 commit
-
Pull x86 LTO changes from Peter Anvin:
"More infrastructure work in preparation for link-time optimization
(LTO). Most of these changes are to make sure symbols accessed from
assembly code are properly marked as visible so the linker doesn't
remove them.

My understanding is that the changes to support LTO are still not
upstream in binutils, but are on the way there. This patchset should
conclude the x86-specific changes, and remaining patches to actually
enable LTO will be fed through the Kbuild tree (other than keeping up
with changes to the x86 code base, of course), although not
necessarily in this merge window"

* 'x86-asmlinkage-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (25 commits)
Kbuild, lto: Handle basic LTO in modpost
Kbuild, lto: Disable LTO for asm-offsets.c
Kbuild, lto: Add a gcc-ld script to let run gcc as ld
Kbuild, lto: add ld-version and ld-ifversion macros
Kbuild, lto: Drop .number postfixes in modpost
Kbuild, lto, workaround: Don't warn for initcall_reference in modpost
lto: Disable LTO for sys_ni
lto: Handle LTO common symbols in module loader
lto, workaround: Add workaround for initcall reordering
lto: Make asmlinkage __visible
x86, lto: Disable LTO for the x86 VDSO
initconst, x86: Fix initconst mistake in ts5500 code
initconst: Fix initconst mistake in dcdbas
asmlinkage: Make trace_hardirqs_on/off_caller visible
asmlinkage, x86: Fix 32bit memcpy for LTO
asmlinkage Make __stack_chk_failed and memcmp visible
asmlinkage: Mark rwsem functions that can be called from assembler asmlinkage
asmlinkage: Make main_extable_sort_needed visible
asmlinkage, mutex: Mark __visible
asmlinkage: Make trace_hardirq visible
...
31 Mar, 2014
1 commit
-
Takashi Iwai says:
> The letter 'X' has been already used for SUSE kernels for very long
> time, to indicate the external supported modules. Can the new flag be
> changed to another letter for avoiding conflict...?
> (BTW, we also use 'N' for "no support", too.)

Note: this code should be cleaned up, so we don't have such maps in
three places!

Signed-off-by: Rusty Russell
21 Mar, 2014
1 commit
-
Rename TAINT_UNSAFE_SMP to TAINT_CPU_OUT_OF_SPEC, so we can repurpose
the flag to encompass a wider range of pushing the CPU beyond its
warranty.

Signed-off-by: Dave Jones
Link: http://lkml.kernel.org/r/20140226154949.GA770@redhat.com
Signed-off-by: H. Peter Anvin
13 Mar, 2014
1 commit
-
Users have reported being unable to trace non-signed modules loaded
within a kernel supporting module signature.

This is caused by tracepoint.c:tracepoint_module_coming() refusing to
take into account tracepoints sitting within force-loaded modules
(TAINT_FORCED_MODULE). The reason for this check, in the first place, is
that a force-loaded module may have a struct module incompatible with
the layout expected by the kernel, and can thus cause a kernel crash
upon forced load of that module on a kernel with CONFIG_TRACEPOINTS=y.

Tracepoints, however, specifically accept TAINT_OOT_MODULE and
TAINT_CRAP, since those modules do not lead to the "very likely system
crash" issue cited above for force-loaded modules.

With kernels having CONFIG_MODULE_SIG=y (signed modules), a non-signed
module is tainted re-using the TAINT_FORCED_MODULE taint flag.
Unfortunately, this means that Tracepoints treat that module as a
force-loaded module, and thus silently refuse to consider any tracepoint
within this module.

Since an unsigned module does not fit within the "very likely system
crash" category of tainting, add a new TAINT_UNSIGNED_MODULE taint flag
to specifically address this taint behavior, and accept those modules
within Tracepoints. We use the letter 'X' as a taint flag character for
a module being loaded that doesn't know how to sign its name (proposed
by Steven Rostedt).

Also add the missing 'O' entry to trace event show_module_flags() list
for the sake of completeness.

Signed-off-by: Mathieu Desnoyers
Acked-by: Steven Rostedt
NAKed-by: Ingo Molnar
CC: Thomas Gleixner
CC: David Howells
CC: Greg Kroah-Hartman
Signed-off-by: Rusty Russell
14 Feb, 2014
1 commit
-
In LTO, symbols implicitly referenced by the compiler need
to be visible. Earlier these symbols were visible implicitly
from being exported, but we disabled implicit visibility for
EXPORTs when modules are disabled to improve code size. So
now these symbols have to be marked visible explicitly.

Do this for __stack_chk_fail (with stack protector)
and memcmp.

Signed-off-by: Andi Kleen
Link: http://lkml.kernel.org/r/1391845930-28580-10-git-send-email-ak@linux.intel.com
Signed-off-by: H. Peter Anvin
26 Nov, 2013
1 commit
-
The panic_timeout value can be set via the command line option
'panic=x', or via /proc/sys/kernel/panic, however that is not
sufficient when the panic occurs before we are able to set up
these values. Thus, add a CONFIG_PANIC_TIMEOUT so that we can
set the desired value from the .config.

The default panic_timeout value continues to be 0 - wait
forever. Also adds set_arch_panic_timeout(new_timeout,
arch_default_timeout), which is intended to be used by arches in
arch_setup(). The idea is that the new_timeout is only set if
the user hasn't changed from the arch_default_timeout.

Signed-off-by: Jason Baron
Cc: benh@kernel.crashing.org
Cc: paulus@samba.org
Cc: ralf@linux-mips.org
Cc: mpe@ellerman.id.au
Cc: felipe.contreras@gmail.com
Cc: Linus Torvalds
Cc: Andrew Morton
Cc: Peter Zijlstra
Cc: Thomas Gleixner
Link: http://lkml.kernel.org/r/1a1674daec27c534df409697025ac568ebcee91e.1385418410.git.jbaron@akamai.com
Signed-off-by: Ingo Molnar
13 Nov, 2013
1 commit
-
sizeof("Tainted: ") already counts '\0', and after the first sprintf(), 's'
will start from the current string end (its value is '\0').

So there is no need to add an additional 1 byte to maximize usage of 'buf' in
print_tainted().

Signed-off-by: Chen Gang
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
12 Sep, 2013
1 commit
-
Since the panic handlers may produce additional information (via printk)
for the kernel log, it should be reported as part of the panic output
saved by kmsg_dump(). Without this re-ordering, nothing that adds
information to a panic will show up in pstore's view when kmsg_dump runs,
and is therefore not visible to crash reporting tools that examine pstore
output.

Signed-off-by: Kees Cook
Cc: Anton Vorontsov
Cc: Colin Cross
Acked-by: Tony Luck
Cc: Stephen Boyd
Cc: Vikram Mulukutla
Cc: Peter Zijlstra
Cc: Rusty Russell
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
12 Jul, 2013
1 commit
-
Pull tracing changes from Steven Rostedt:
"The majority of the changes here are cleanups for the large changes
that were added to 3.10, which includes several bug fixes that have
been marked for stable.

As for new features, there were a few, but nothing to write to LWN
about. These include:

New function triggers called "dump" and "cpudump" that will cause
ftrace to dump its buffer to the console when the function is called.
The difference between "dump" and "cpudump" is that "dump" will dump
the entire contents of the ftrace buffer, whereas "cpudump" will only
dump the contents of the ftrace buffer for the CPU that called the
function.

Another small enhancement is a new sysctl switch called
"traceoff_on_warning" which, when enabled, will disable tracing if any
WARN_ON() is triggered. This is useful if you want to debug what
caused a warning and do not want to risk losing your trace data by the
ring buffer overwriting the data before you can disable it. There's
also a kernel command line option that will make this enabled at boot
up called the same thing"

* tag 'trace-3.11' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace: (34 commits)
tracing: Make tracing_open_generic_{tr,tc}() static
tracing: Remove ftrace() function
tracing: Remove TRACE_EVENT_TYPE enum definition
tracing: Make tracer_tracing_{off,on,is_on}() static
tracing: Fix irqs-off tag display in syscall tracing
uprobes: Fix return value in error handling path
tracing: Fix race between deleting buffer and setting events
tracing: Add trace_array_get/put() to event handling
tracing: Get trace_array ref counts when accessing trace files
tracing: Add trace_array_get/put() to handle instance refs better
tracing: Protect ftrace_trace_arrays list in trace_events.c
tracing: Make trace_marker use the correct per-instance buffer
ftrace: Do not run selftest if command line parameter is set
tracing/kprobes: Don't pass addr=ip to perf_trace_buf_submit()
tracing: Use flag buffer_disabled for irqsoff tracer
tracing/kprobes: Turn trace_probe->files into list_head
tracing: Fix disabling of soft disable
tracing: Add missing syscall_metadata comment
tracing: Simplify code for showing of soft disabled flag
tracing/kprobes: Kill probe_enable_lock
...
10 Jul, 2013
1 commit
-
Add the cpu/pid that called WARN() so that the stack traces can be
matched up with the WARNING messages.

[akpm@linux-foundation.org: remove stray quote]
Signed-off-by: Alex Thorlton
Reviewed-by: Robin Holt
Cc: Stephen Boyd
Cc: Vikram Mulukutla
Cc: Rusty Russell
Cc: Tejun Heo
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
20 Jun, 2013
1 commit
-
Add a traceoff_on_warning option in both the kernel command line as well
as a sysctl option. When set, any WARN*() function that is hit will cause
the tracing_on variable to be cleared, which disables writing to the
ring buffer.

This is useful especially when tracing a bug with function tracing. When
a warning is hit, the print caused by the warning can flood the trace with
the functions producing the output for the warning. This can make the
resulting trace useless by either hiding where the bug happened, or worse,
by overflowing the buffer and losing the trace of the bug totally.

Acked-by: Peter Zijlstra
Signed-off-by: Steven Rostedt