02 Mar, 2017
1 commit
-
We are going to split out of , which
will have to be picked up from other headers and a couple of .c files.Create a trivial placeholder file that just
maps to to make this patch obviously correct and
bisectable.Include the new header in the files that are going to need it.
Acked-by: Linus Torvalds
Cc: Mike Galbraith
Cc: Peter Zijlstra
Cc: Thomas Gleixner
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Ingo Molnar
24 Feb, 2017
1 commit
-
Now we can also jump to boot prom from sunhv console by sending
break twice on console for both running and panicked kernel
cases.Signed-off-by: Vijay Kumar
Signed-off-by: David S. Miller
23 Feb, 2017
1 commit
-
Pull printk updates from Petr Mladek:
- Add Petr Mladek, Sergey Senozhatsky as printk maintainers, and Steven
Rostedt as the printk reviewer. This idea came up after the
discussion about printk issues at Kernel Summit. It was formulated
and discussed at lkml[1].- Extend a lock-less NMI per-cpu buffers idea to handle recursive
printk() calls by Sergey Senozhatsky[2]. It is the first step in
sanitizing printk as discussed at Kernel Summit.The change allows to see messages that would normally get ignored or
would cause a deadlock.Also it allows to enable lockdep in printk(). This already paid off.
The testing in linux-next helped to discover two old problems that
were hidden before[3][4].- Remove unused parameter by Sergey Senozhatsky. Clean up after a past
change.[1] http://lkml.kernel.org/r/1481798878-31898-1-git-send-email-pmladek@suse.com
[2] http://lkml.kernel.org/r/20161227141611.940-1-sergey.senozhatsky@gmail.com
[3] http://lkml.kernel.org/r/20170215044332.30449-1-sergey.senozhatsky@gmail.com
[4] http://lkml.kernel.org/r/20170217015932.11898-1-sergey.senozhatsky@gmail.com* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/pmladek/printk:
printk: drop call_console_drivers() unused param
printk: convert the rest to printk-safe
printk: remove zap_locks() function
printk: use printk_safe buffers in printk
printk: report lost messages in printk safe/nmi contexts
printk: always use deferred printk when flush printk_safe lines
printk: introduce per-cpu safe_print seq buffer
printk: rename nmi.c and exported api
printk: use vprintk_func in vprintk()
MAINTAINERS: Add printk maintainers
08 Feb, 2017
1 commit
-
A preparation patch for printk_safe work. No functional change.
- rename nmi.c to print_safe.c
- add `printk_safe' prefix to some (which used both by printk-safe
and printk-nmi) of the exported functions.Link: http://lkml.kernel.org/r/20161227141611.940-3-sergey.senozhatsky@gmail.com
Cc: Andrew Morton
Cc: Linus Torvalds
Cc: Jan Kara
Cc: Tejun Heo
Cc: Calvin Owens
Cc: Steven Rostedt
Cc: Ingo Molnar
Cc: Peter Zijlstra
Cc: Andy Lutomirski
Cc: Peter Hurley
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Sergey Senozhatsky
Signed-off-by: Petr Mladek
25 Jan, 2017
1 commit
-
When a system panics, the "Rebooting in X seconds.." message is never
printed because it lacks a new line. Fix it.Link: http://lkml.kernel.org/r/20170119114751.2724-1-jslaby@suse.cz
Signed-off-by: Jiri Slaby
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
18 Jan, 2017
1 commit
-
Commit 7fd8329ba502 ("taint/module: Clean up global and module taint
flags handling") used the key words true and false as character members
of a new struct. These names cause problems when out-of-kernel modules
such as VirtualBox include their own definitions of true and false.Fixes: 7fd8329ba502 ("taint/module: Clean up global and module taint flags handling")
Signed-off-by: Larry Finger
Cc: Petr Mladek
Cc: Jessica Yu
Cc: Rusty Russell
Reported-by: Valdis Kletnieks
Reviewed-by: Petr Mladek
Acked-by: Rusty Russell
Signed-off-by: Jessica Yu
27 Nov, 2016
1 commit
-
The commit 66cc69e34e86a231 ("Fix: module signature vs tracepoints:
add new TAINT_UNSIGNED_MODULE") updated module_taint_flags() to
potentially print one more character. But it did not increase the
size of the corresponding buffers in m_show() and print_modules().We have recently done the same mistake when adding a taint flag
for livepatching, see
https://lkml.kernel.org/r/cfba2c823bb984690b73572aaae1db596b54a082.1472137475.git.jpoimboe@redhat.comAlso struct module uses an incompatible type for mod-taints flags.
It survived from the commit 2bc2d61a9638dab670d ("[PATCH] list module
taint flags in Oops/panic"). There was used "int" for the global taint
flags at these times. But only the global tain flags was later changed
to "unsigned long" by the commit 25ddbb18aae33ad2 ("Make the taint
flags reliable").This patch defines TAINT_FLAGS_COUNT that can be used to create
arrays and buffers of the right size. Note that we could not use
enum because the taint flag indexes are used also in assembly code.Then it reworks the table that describes the taint flags. The TAINT_*
numbers can be used as the index. Instead, we add information
if the taint flag is also shown per-module.Finally, it uses "unsigned long", bit operations, and the updated
taint_flags table also for mod->taints.It is not optimal because only few taint flags can be printed by
module_taint_flags(). But better be on the safe side. IMHO, it is
not worth the optimization and this is a good compromise.Signed-off-by: Petr Mladek
Link: http://lkml.kernel.org/r/1474458442-21581-1-git-send-email-pmladek@suse.com
[jeyu@redhat.com: fix broken lkml link in changelog]
Signed-off-by: Jessica Yu
12 Oct, 2016
1 commit
-
Daniel Walker reported problems which happens when
crash_kexec_post_notifiers kernel option is enabled
(https://lkml.org/lkml/2015/6/24/44).In that case, smp_send_stop() is called before entering kdump routines
which assume other CPUs are still online. As the result, for x86, kdump
routines fail to save other CPUs' registers and disable virtualization
extensions.To fix this problem, call a new kdump friendly function,
crash_smp_send_stop(), instead of the smp_send_stop() when
crash_kexec_post_notifiers is enabled. crash_smp_send_stop() is a weak
function, and it just call smp_send_stop(). Architecture codes should
override it so that kdump can work appropriately. This patch only
provides x86-specific version.For Xen's PV kernel, just keep the current behavior.
NOTES:
- Right solution would be to place crash_smp_send_stop() before
__crash_kexec() invocation in all cases and remove smp_send_stop(), but
we can't do that until all architectures implement own
crash_smp_send_stop()- crash_smp_send_stop()-like work is still needed by
machine_crash_shutdown() because crash_kexec() can be called without
entering panic()Fixes: f06e5153f4ae (kernel/panic.c: add "crash_kexec_post_notifiers" option)
Link: http://lkml.kernel.org/r/20160810080948.11028.15344.stgit@sysi4-13.yrl.intra.hitachi.co.jp
Signed-off-by: Hidehiro Kawai
Reported-by: Daniel Walker
Cc: Dave Young
Cc: Baoquan He
Cc: Vivek Goyal
Cc: Eric Biederman
Cc: Masami Hiramatsu
Cc: Daniel Walker
Cc: Xunlei Pang
Cc: Thomas Gleixner
Cc: Ingo Molnar
Cc: "H. Peter Anvin"
Cc: Borislav Petkov
Cc: David Vrabel
Cc: Toshi Kani
Cc: Ralf Baechle
Cc: David Daney
Cc: Aaro Koskinen
Cc: "Steven J. Hill"
Cc: Corey Minyard
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
03 Aug, 2016
1 commit
-
crash_kexec_post_notifiers ia a boot option which controls whether the
1st kernel calls panic notifiers or not before booting the 2nd kernel.
However, there is no need to limit it to being modifiable only at boot
time. So, use core_param instead of early_param.Link: http://lkml.kernel.org/r/20160705113327.5864.43139.stgit@softrs
Signed-off-by: Hidehiro Kawai
Cc: Dave Young
Cc: Baoquan He
Cc: Vivek Goyal
Cc: Eric Biederman
Cc: Masami Hiramatsu
Cc: Borislav Petkov
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
21 May, 2016
1 commit
-
In NMI context, printk() messages are stored into per-CPU buffers to
avoid a possible deadlock. They are normally flushed to the main ring
buffer via an IRQ work. But the work is never called when the system
calls panic() in the very same NMI handler.This patch tries to flush NMI buffers before the crash dump is
generated. In this case it does not risk a double release and bails out
when the logbuf_lock is already taken. The aim is to get the messages
into the main ring buffer when possible. It makes them better
accessible in the vmcore.Then the patch tries to flush the buffers second time when other CPUs
are down. It might be more aggressive and reset logbuf_lock. The aim
is to get the messages available for the consequent kmsg_dump() and
console_flush_on_panic() calls.The patch causes vprintk_emit() to be called even in NMI context again.
But it is done via printk_deferred() so that the console handling is
skipped. Consoles use internal locks and we could not prevent a
deadlock easily. They are explicitly called later when the crash dump
is not generated, see console_flush_on_panic().Signed-off-by: Petr Mladek
Cc: Benjamin Herrenschmidt
Cc: Daniel Thompson
Cc: David Miller
Cc: Ingo Molnar
Cc: Jan Kara
Cc: Jiri Kosina
Cc: Martin Schwidefsky
Cc: Peter Zijlstra
Cc: Ralf Baechle
Cc: Russell King
Cc: Steven Rostedt
Cc: Thomas Gleixner
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
23 Mar, 2016
1 commit
-
Commit 1717f2096b54 ("panic, x86: Fix re-entrance problem due to panic
on NMI") and commit 58c5661f2144 ("panic, x86: Allow CPUs to save
registers even if looping in NMI context") introduced nmi_panic() which
prevents concurrent/recursive execution of panic(). It also saves
registers for the crash dump on x86.However, there are some cases where NMI handlers still use panic().
This patch set partially replaces them with nmi_panic() in those cases.Even this patchset is applied, some NMI or similar handlers (e.g. MCE
handler) continue to use panic(). This is because I can't test them
well and actual problems won't happen. For example, the possibility
that normal panic and panic on MCE happen simultaneously is very low.This patch (of 3):
Convert nmi_panic() to a proper function and export it instead of
exporting internal implementation details to modules, for obvious
reasons.Signed-off-by: Hidehiro Kawai
Acked-by: Borislav Petkov
Acked-by: Michal Nazarewicz
Cc: Michal Hocko
Cc: Rasmus Villemoes
Cc: Nicolas Iooss
Cc: Javi Merino
Cc: Gobinda Charan Maji
Cc: "Steven Rostedt (Red Hat)"
Cc: Thomas Gleixner
Cc: Vitaly Kuznetsov
Cc: HATAYAMA Daisuke
Cc: Tejun Heo
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
18 Mar, 2016
1 commit
-
The traceoff_on_warning option doesn't have any effect on s390, powerpc,
arm64, parisc, and sh because there are two different types of WARN
implementations:1) The above mentioned architectures treat WARN() as a special case of a
BUG() exception. They handle warnings in report_bug() in lib/bug.c.2) All other architectures just call warn_slowpath_*() directly. Their
warnings are handled in warn_slowpath_common() in kernel/panic.c.Support traceoff_on_warning on all architectures and prevent any future
divergence by using a single common function to emit the warning.Also remove the '()' from '%pS()', because the parentheses look funky:
[ 45.607629] WARNING: at /root/warn_mod/warn_mod.c:17 .init_dummy+0x20/0x40 [warn_mod]()
Reported-by: Chunyu Hu
Signed-off-by: Josh Poimboeuf
Acked-by: Heiko Carstens
Tested-by: Prarit Bhargava
Acked-by: Prarit Bhargava
Acked-by: Steven Rostedt
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
17 Jan, 2016
1 commit
-
@console_may_schedule tracks whether console_sem was acquired through
lock or trylock. If the former, we're inside a sleepable context and
console_conditional_schedule() performs cond_resched(). This allows
console drivers which use console_lock for synchronization to yield
while performing time-consuming operations such as scrolling.However, the actual console outputting is performed while holding
irq-safe logbuf_lock, so console_unlock() clears @console_may_schedule
before starting outputting lines. Also, only a few drivers call
console_conditional_schedule() to begin with. This means that when a
lot of lines need to be output by console_unlock(), for example on a
console registration, the task doing console_unlock() may not yield for
a long time on a non-preemptible kernel.If this happens with a slow console devices, for example a serial
console, the outputting task may occupy the cpu for a very long time.
Long enough to trigger softlockup and/or RCU stall warnings, which in
turn pile more messages, sometimes enough to trigger the next cycle of
warnings incapacitating the system.Fix it by making console_unlock() insert cond_resched() between lines if
@console_may_schedule.Signed-off-by: Tejun Heo
Reported-by: Calvin Owens
Acked-by: Jan Kara
Cc: Dave Jones
Cc: Kyle McMartin
Cc:
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
19 Dec, 2015
3 commits
-
Currently, panic() and crash_kexec() can be called at the same time.
For example (x86 case):CPU 0:
oops_end()
crash_kexec()
mutex_trylock() // acquired
nmi_shootdown_cpus() // stop other CPUsCPU 1:
panic()
crash_kexec()
mutex_trylock() // failed to acquire
smp_send_stop() // stop other CPUs
infinite loopIf CPU 1 calls smp_send_stop() before nmi_shootdown_cpus(), kdump
fails.In another case:
CPU 0:
oops_end()
crash_kexec()
mutex_trylock() // acquired
io_check_error()
panic()
crash_kexec()
mutex_trylock() // failed to acquire
infinite loopClearly, this is an undesirable result.
To fix this problem, this patch changes crash_kexec() to exclude others
by using the panic_cpu atomic.Signed-off-by: Hidehiro Kawai
Acked-by: Michal Hocko
Cc: Andrew Morton
Cc: Baoquan He
Cc: Dave Young
Cc: "Eric W. Biederman"
Cc: HATAYAMA Daisuke
Cc: "H. Peter Anvin"
Cc: Ingo Molnar
Cc: Jonathan Corbet
Cc: kexec@lists.infradead.org
Cc: linux-doc@vger.kernel.org
Cc: Martin Schwidefsky
Cc: Masami Hiramatsu
Cc: Minfei Huang
Cc: Peter Zijlstra
Cc: Seth Jennings
Cc: Steven Rostedt
Cc: Thomas Gleixner
Cc: Vitaly Kuznetsov
Cc: Vivek Goyal
Cc: x86-ml
Link: http://lkml.kernel.org/r/20151210014630.25437.94161.stgit@softrs
Signed-off-by: Borislav Petkov
Signed-off-by: Thomas Gleixner -
Currently, kdump_nmi_shootdown_cpus(), a subroutine of crash_kexec(),
sends an NMI IPI to CPUs which haven't called panic() to stop them,
save their register information and do some cleanups for crash dumping.
However, if such a CPU is infinitely looping in NMI context, we fail to
save its register information into the crash dump.For example, this can happen when unknown NMIs are broadcast to all
CPUs as follows:CPU 0 CPU 1
=========================== ==========================
receive an unknown NMI
unknown_nmi_error()
panic() receive an unknown NMI
spin_trylock(&panic_lock) unknown_nmi_error()
crash_kexec() panic()
spin_trylock(&panic_lock)
panic_smp_self_stop()
infinite loop
kdump_nmi_shootdown_cpus()
issue NMI IPI -----------> blocked until IRET
infinite loop...Here, since CPU 1 is in NMI context, the second NMI from CPU 0 is
blocked until CPU 1 executes IRET. However, CPU 1 never executes IRET,
so the NMI is not handled and the callback function to save registers is
never called.In practice, this can happen on some servers which broadcast NMIs to all
CPUs when the NMI button is pushed.To save registers in this case, we need to:
a) Return from NMI handler instead of looping infinitely
or
b) Call the callback function directly from the infinite loopInherently, a) is risky because NMI is also used to prevent corrupted
data from being propagated to devices. So, we chose b).This patch does the following:
1. Move the infinite looping of CPUs which haven't called panic() in NMI
context (actually done by panic_smp_self_stop()) outside of panic() to
enable us to refer pt_regs. Please note that panic_smp_self_stop() is
still used for normal context.2. Call a callback of kdump_nmi_shootdown_cpus() directly to save
registers and do some cleanups after setting waiting_for_crash_ipi which
is used for counting down the number of CPUs which handled the callbackSigned-off-by: Hidehiro Kawai
Acked-by: Michal Hocko
Cc: Aaron Tomlin
Cc: Andrew Morton
Cc: Andy Lutomirski
Cc: Baoquan He
Cc: Chris Metcalf
Cc: Dave Young
Cc: David Hildenbrand
Cc: Don Zickus
Cc: Eric Biederman
Cc: Frederic Weisbecker
Cc: Gobinda Charan Maji
Cc: HATAYAMA Daisuke
Cc: Hidehiro Kawai
Cc: "H. Peter Anvin"
Cc: Ingo Molnar
Cc: Javi Merino
Cc: Jiang Liu
Cc: Jonathan Corbet
Cc: kexec@lists.infradead.org
Cc: linux-doc@vger.kernel.org
Cc: lkml
Cc: Masami Hiramatsu
Cc: Michal Nazarewicz
Cc: Nicolas Iooss
Cc: Oleg Nesterov
Cc: Peter Zijlstra
Cc: Prarit Bhargava
Cc: Rasmus Villemoes
Cc: Seth Jennings
Cc: Stefan Lippers-Hollmann
Cc: Steven Rostedt
Cc: Thomas Gleixner
Cc: Ulrich Obergfell
Cc: Vitaly Kuznetsov
Cc: Vivek Goyal
Cc: Yasuaki Ishimatsu
Link: http://lkml.kernel.org/r/20151210014628.25437.75256.stgit@softrs
[ Cleanup comments, fixup formatting. ]
Signed-off-by: Borislav Petkov
Signed-off-by: Thomas Gleixner -
If panic on NMI happens just after panic() on the same CPU, panic() is
recursively called. Kernel stalls, as a result, after failing to acquire
panic_lock.To avoid this problem, don't call panic() in NMI context if we've
already entered panic().For that, introduce nmi_panic() macro to reduce code duplication. In
the case of panic on NMI, don't return from NMI handlers if another CPU
already panicked.Signed-off-by: Hidehiro Kawai
Acked-by: Michal Hocko
Cc: Aaron Tomlin
Cc: Andrew Morton
Cc: Andy Lutomirski
Cc: Baoquan He
Cc: Chris Metcalf
Cc: David Hildenbrand
Cc: Don Zickus
Cc: "Eric W. Biederman"
Cc: Frederic Weisbecker
Cc: Gobinda Charan Maji
Cc: HATAYAMA Daisuke
Cc: "H. Peter Anvin"
Cc: Ingo Molnar
Cc: Javi Merino
Cc: Jonathan Corbet
Cc: kexec@lists.infradead.org
Cc: linux-doc@vger.kernel.org
Cc: lkml
Cc: Masami Hiramatsu
Cc: Michal Nazarewicz
Cc: Nicolas Iooss
Cc: Peter Zijlstra
Cc: Prarit Bhargava
Cc: Rasmus Villemoes
Cc: Rusty Russell
Cc: Seth Jennings
Cc: Steven Rostedt
Cc: Thomas Gleixner
Cc: Ulrich Obergfell
Cc: Vitaly Kuznetsov
Cc: Vivek Goyal
Link: http://lkml.kernel.org/r/20151210014626.25437.13302.stgit@softrs
[ Cleanup comments, fixup formatting. ]
Signed-off-by: Borislav Petkov
Signed-off-by: Thomas Gleixner
21 Nov, 2015
1 commit
-
Commit 08d78658f393 ("panic: release stale console lock to always get the
logbuf printed out") introduced an unwanted bad unlock balance report when
panic() is called directly and not from OOPS (e.g. from out_of_memory()).
The difference is that in case of OOPS we disable locks debug in
oops_enter() and on direct panic call nobody does that.Fixes: 08d78658f393 ("panic: release stale console lock to always get the logbuf printed out")
Reported-by: kernel test robot
Signed-off-by: Vitaly Kuznetsov
Cc: HATAYAMA Daisuke
Cc: Masami Hiramatsu
Cc: Jiri Kosina
Cc: Baoquan He
Cc: Prarit Bhargava
Cc: Xie XiuQi
Cc: Seth Jennings
Cc: "K. Y. Srinivasan"
Cc: Jan Kara
Cc: Petr Mladek
Cc: Yasuaki Ishimatsu
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
07 Nov, 2015
1 commit
-
In some cases we may end up killing the CPU holding the console lock
while still having valuable data in logbuf. E.g. I'm observing the
following:- A crash is happening on one CPU and console_unlock() is being called on
some other.- console_unlock() tries to print out the buffer before releasing the lock
and on slow console it takes time.- in the meanwhile crashing CPU does lots of printk()-s with valuable data
(which go to the logbuf) and sends IPIs to all other CPUs.- console_unlock() finishes printing previous chunk and enables interrupts
before trying to print out the rest, the CPU catches the IPI and never
releases console lock.This is not the only possible case: in VT/fb subsystems we have many other
console_lock()/console_unlock() users. Non-masked interrupts (or
receiving NMI in case of extreme slowness) will have the same result.
Getting the whole console buffer printed out on crash should be top
priority.[akpm@linux-foundation.org: tweak comment text]
Signed-off-by: Vitaly Kuznetsov
Cc: HATAYAMA Daisuke
Cc: Masami Hiramatsu
Cc: Jiri Kosina
Cc: Baoquan He
Cc: Prarit Bhargava
Cc: Xie XiuQi
Cc: Seth Jennings
Cc: "K. Y. Srinivasan"
Cc: Jan Kara
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
01 Jul, 2015
2 commits
-
Commit f06e5153f4ae2e ("kernel/panic.c: add "crash_kexec_post_notifiers"
option for kdump after panic_notifers") introduced
"crash_kexec_post_notifiers" kernel boot option, which toggles wheather
panic() calls crash_kexec() before panic_notifiers and dump kmsg or after.The problem is that the commit overlooks panic_on_oops kernel boot option.
If it is enabled, crash_kexec() is called directly without going through
panic() in oops path.To fix this issue, this patch adds a check to "crash_kexec_post_notifiers"
in the condition of kexec_should_crash().Also, put a comment in kexec_should_crash() to explain not obvious things
on this patch.Signed-off-by: HATAYAMA Daisuke
Acked-by: Baoquan He
Tested-by: Hidehiro Kawai
Reviewed-by: Masami Hiramatsu
Cc: Vivek Goyal
Cc: Ingo Molnar
Cc: Hidehiro Kawai
Cc: Baoquan He
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
For compatibility with the behaviour before the commit f06e5153f4ae2e
("kernel/panic.c: add "crash_kexec_post_notifiers" option for kdump after
panic_notifers"), the 2nd crash_kexec() should be called only if
crash_kexec_post_notifiers is enabled.Note that crash_kexec() returns immediately if kdump crash kernel is not
loaded, so in this case, this patch makes no functionality change, but the
point is to make it explicit, from the caller panic() side, that the 2nd
crash_kexec() does nothing.Signed-off-by: HATAYAMA Daisuke
Suggested-by: Ingo Molnar
Cc: "Eric W. Biederman"
Cc: Vivek Goyal
Cc: Masami Hiramatsu
Cc: Hidehiro Kawai
Cc: Baoquan He
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
22 Dec, 2014
1 commit
-
This adds a new taint flag to indicate when the kernel or a kernel
module has been live patched. This will provide a clean indication in
bug reports that live patching was used.Additionally, if the crash occurs in a live patched function, the live
patch module will appear beside the patched function in the backtrace.Signed-off-by: Seth Jennings
Acked-by: Josh Poimboeuf
Reviewed-by: Miroslav Benes
Reviewed-by: Petr Mladek
Reviewed-by: Masami Hiramatsu
Signed-off-by: Jiri Kosina
11 Dec, 2014
1 commit
-
There have been several times where I have had to rebuild a kernel to
cause a panic when hitting a WARN() in the code in order to get a crash
dump from a system. Sometimes this is easy to do, other times (such as
in the case of a remote admin) it is not trivial to send new images to
the user.A much easier method would be a switch to change the WARN() over to a
panic. This makes debugging easier in that I can now test the actual
image the WARN() was seen on and I do not have to engage in remote
debugging.This patch adds a panic_on_warn kernel parameter and
/proc/sys/kernel/panic_on_warn calls panic() in the
warn_slowpath_common() path. The function will still print out the
location of the warning.An example of the panic_on_warn output:
The first line below is from the WARN_ON() to output the WARN_ON()'s
location. After that the panic() output is displayed.WARNING: CPU: 30 PID: 11698 at /home/prarit/dummy_module/dummy-module.c:25 init_dummy+0x1f/0x30 [dummy_module]()
Kernel panic - not syncing: panic_on_warn set ...CPU: 30 PID: 11698 Comm: insmod Tainted: G W OE 3.17.0+ #57
Hardware name: Intel Corporation S2600CP/S2600CP, BIOS RMLSDP.86I.00.29.D696.1311111329 11/11/2013
0000000000000000 000000008e3f87df ffff88080f093c38 ffffffff81665190
0000000000000000 ffffffff818aea3d ffff88080f093cb8 ffffffff8165e2ec
ffffffff00000008 ffff88080f093cc8 ffff88080f093c68 000000008e3f87df
Call Trace:
[] dump_stack+0x46/0x58
[] panic+0xd0/0x204
[] ? init_dummy+0x1f/0x30 [dummy_module]
[] warn_slowpath_common+0xd0/0xd0
[] ? dummy_greetings+0x40/0x40 [dummy_module]
[] warn_slowpath_null+0x1a/0x20
[] init_dummy+0x1f/0x30 [dummy_module]
[] do_one_initcall+0xd4/0x210
[] ? __vunmap+0xc2/0x110
[] load_module+0x16a9/0x1b30
[] ? store_uevent+0x70/0x70
[] ? copy_module_from_fd.isra.44+0x129/0x180
[] SyS_finit_module+0xa6/0xd0
[] system_call_fastpath+0x12/0x17Successfully tested by me.
hpa said: There is another very valid use for this: many operators would
rather a machine shuts down than being potentially compromised either
functionally or security-wise.Signed-off-by: Prarit Bhargava
Cc: Jonathan Corbet
Cc: Rusty Russell
Cc: "H. Peter Anvin"
Cc: Andi Kleen
Cc: Masami Hiramatsu
Acked-by: Yasuaki Ishimatsu
Cc: Fabian Frederick
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
14 Nov, 2014
1 commit
-
Commit 69361eef9056 ("panic: add TAINT_SOFTLOCKUP") added the 'L' flag,
but failed to update the comments for print_tainted(). So, update the
comments.Signed-off-by: Xie XiuQi
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
09 Aug, 2014
1 commit
-
This taint flag will be set if the system has ever entered a softlockup
state. Similar to TAINT_WARN it is useful to know whether or not the
system has been in a softlockup state when debugging.[akpm@linux-foundation.org: apply the taint before calling panic()]
Signed-off-by: Josh Hunt
Cc: Jason Baron
Cc: Randy Dunlap
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
07 Jun, 2014
1 commit
-
Add a "crash_kexec_post_notifiers" boot option to run kdump after
running panic_notifiers and dump kmsg. This can help rare situations
where kdump fails because of unstable crashed kernel or hardware failure
(memory corruption on critical data/code), or the 2nd kernel is already
broken by the 1st kernel (it's a broken behavior, but who can guarantee
that the "crashed" kernel works correctly?).Usage: add "crash_kexec_post_notifiers" to kernel boot option.
Note that this actually increases risks of the failure of kdump. This
option should be set only if you worry about the rare case of kdump
failure rather than increasing the chance of success.Signed-off-by: Masami Hiramatsu
Acked-by: Motohiro Kosaki
Acked-by: Vivek Goyal
Cc: Eric Biederman
Cc: Yoshihiro YUNOMAE
Cc: Satoru MORIYA
Cc: Tomoki Sekiyama
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
08 Apr, 2014
1 commit
-
Currently, booting without initrd specified on 80x25 screen gives a call
trace followed by atkbd : Spurious ACK. Original message ("VFS: Unable
to mount root fs") is not available. Of course this could happen in
other situations...This patch displays panic reason after call trace which could help lot
of people even if it's not the very last line on screen.Also, convert all panic.c printk(KERN_EMERG to pr_emerg(
[akpm@linux-foundation.org: missed a couple of pr_ conversions]
Signed-off-by: Fabian Frederick
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
07 Apr, 2014
1 commit
-
Pull module updates from Rusty Russell:
"Nothing major: the stricter permissions checking for sysfs broke a
staging driver; fix included. Greg KH said he'd take the patch but
hadn't as the merge window opened, so it's included here to avoid
breaking build"* tag 'modules-next-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rusty/linux:
staging: fix up speakup kobject mode
Use 'E' instead of 'X' for unsigned module taint flag.
VERIFY_OCTAL_PERMISSIONS: stricter checking for sysfs perms.
kallsyms: fix percpu vars on x86-64 with relocation.
kallsyms: generalize address range checking
module: LLVMLinux: Remove unused function warning from __param_check macro
Fix: module signature vs tracepoints: add new TAINT_UNSIGNED_MODULE
module: remove MODULE_GENERIC_TABLE
module: allow multiple calls to MODULE_DEVICE_TABLE() per module
module: use pr_cont
01 Apr, 2014
1 commit
-
Pull x86 LTO changes from Peter Anvin:
"More infrastructure work in preparation for link-time optimization
(LTO). Most of these changes is to make sure symbols accessed from
assembly code are properly marked as visible so the linker doesn't
remove them.My understanding is that the changes to support LTO are still not
upstream in binutils, but are on the way there. This patchset should
conclude the x86-specific changes, and remaining patches to actually
enable LTO will be fed through the Kbuild tree (other than keeping up
with changes to the x86 code base, of course), although not
necessarily in this merge window"* 'x86-asmlinkage-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (25 commits)
Kbuild, lto: Handle basic LTO in modpost
Kbuild, lto: Disable LTO for asm-offsets.c
Kbuild, lto: Add a gcc-ld script to let run gcc as ld
Kbuild, lto: add ld-version and ld-ifversion macros
Kbuild, lto: Drop .number postfixes in modpost
Kbuild, lto, workaround: Don't warn for initcall_reference in modpost
lto: Disable LTO for sys_ni
lto: Handle LTO common symbols in module loader
lto, workaround: Add workaround for initcall reordering
lto: Make asmlinkage __visible
x86, lto: Disable LTO for the x86 VDSO
initconst, x86: Fix initconst mistake in ts5500 code
initconst: Fix initconst mistake in dcdbas
asmlinkage: Make trace_hardirqs_on/off_caller visible
asmlinkage, x86: Fix 32bit memcpy for LTO
asmlinkage Make __stack_chk_failed and memcmp visible
asmlinkage: Mark rwsem functions that can be called from assembler asmlinkage
asmlinkage: Make main_extable_sort_needed visible
asmlinkage, mutex: Mark __visible
asmlinkage: Make trace_hardirq visible
...
31 Mar, 2014
1 commit
-
Takashi Iwai says:
> The letter 'X' has been already used for SUSE kernels for very long
> time, to indicate the external supported modules. Can the new flag be
> changed to another letter for avoiding conflict...?
> (BTW, we also use 'N' for "no support", too.)Note: this code should be cleaned up, so we don't have such maps in
three places!Signed-off-by: Rusty Russell
21 Mar, 2014
1 commit
-
Rename TAINT_UNSAFE_SMP to TAINT_CPU_OUT_OF_SPEC, so we can repurpose
the flag to encompass a wider range of pushing the CPU beyond its
warrany.Signed-off-by: Dave Jones
Link: http://lkml.kernel.org/r/20140226154949.GA770@redhat.com
Signed-off-by: H. Peter Anvin
13 Mar, 2014
1 commit
-
Users have reported being unable to trace non-signed modules loaded
within a kernel supporting module signature.This is caused by tracepoint.c:tracepoint_module_coming() refusing to
take into account tracepoints sitting within force-loaded modules
(TAINT_FORCED_MODULE). The reason for this check, in the first place, is
that a force-loaded module may have a struct module incompatible with
the layout expected by the kernel, and can thus cause a kernel crash
upon forced load of that module on a kernel with CONFIG_TRACEPOINTS=y.Tracepoints, however, specifically accept TAINT_OOT_MODULE and
TAINT_CRAP, since those modules do not lead to the "very likely system
crash" issue cited above for force-loaded modules.With kernels having CONFIG_MODULE_SIG=y (signed modules), a non-signed
module is tainted re-using the TAINT_FORCED_MODULE taint flag.
Unfortunately, this means that Tracepoints treat that module as a
force-loaded module, and thus silently refuse to consider any tracepoint
within this module.Since an unsigned module does not fit within the "very likely system
crash" category of tainting, add a new TAINT_UNSIGNED_MODULE taint flag
to specifically address this taint behavior, and accept those modules
within Tracepoints. We use the letter 'X' as a taint flag character for
a module being loaded that doesn't know how to sign its name (proposed
by Steven Rostedt).Also add the missing 'O' entry to trace event show_module_flags() list
for the sake of completeness.Signed-off-by: Mathieu Desnoyers
Acked-by: Steven Rostedt
NAKed-by: Ingo Molnar
CC: Thomas Gleixner
CC: David Howells
CC: Greg Kroah-Hartman
Signed-off-by: Rusty Russell
14 Feb, 2014
1 commit
-
In LTO symbols implicitely referenced by the compiler need
to be visible. Earlier these symbols were visible implicitely
from being exported, but we disabled implicit visibility fo
EXPORTs when modules are disabled to improve code size. So
now these symbols have to be marked visible explicitely.Do this for __stack_chk_fail (with stack protector)
and memcmp.Signed-off-by: Andi Kleen
Link: http://lkml.kernel.org/r/1391845930-28580-10-git-send-email-ak@linux.intel.com
Signed-off-by: H. Peter Anvin
26 Nov, 2013
1 commit
-
The panic_timeout value can be set via the command line option
'panic=x', or via /proc/sys/kernel/panic, however that is not
sufficient when the panic occurs before we are able to set up
these values. Thus, add a CONFIG_PANIC_TIMEOUT so that we can
set the desired value from the .config.The default panic_timeout value continues to be 0 - wait
forever. Also adds set_arch_panic_timeout(new_timeout,
arch_default_timeout), which is intended to be used by arches in
arch_setup(). The idea being that the new_timeout is only set if
the user hasn't changed from the arch_default_timeout.Signed-off-by: Jason Baron
Cc: benh@kernel.crashing.org
Cc: paulus@samba.org
Cc: ralf@linux-mips.org
Cc: mpe@ellerman.id.au
Cc: felipe.contreras@gmail.com
Cc: Linus Torvalds
Cc: Andrew Morton
Cc: Peter Zijlstra
Cc: Thomas Gleixner
Link: http://lkml.kernel.org/r/1a1674daec27c534df409697025ac568ebcee91e.1385418410.git.jbaron@akamai.com
Signed-off-by: Ingo Molnar
13 Nov, 2013
1 commit
-
sizeof("Tainted: ") already counts '\0', and after first sprintf(), 's'
will start from the current string end (its' value is '\0').So need not add additional 1 byte for maximized usage of 'buf' in
print_tainted().Signed-off-by: Chen Gang
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
12 Sep, 2013
1 commit
-
Since the panic handlers may produce additional information (via printk)
for the kernel log, it should be reported as part of the panic output
saved by kmsg_dump(). Without this re-ordering, nothing that adds
information to a panic will show up in pstore's view when kmsg_dump runs,
and is therefore not visible to crash reporting tools that examine pstore
output.Signed-off-by: Kees Cook
Cc: Anton Vorontsov
Cc: Colin Cross
Acked-by: Tony Luck
Cc: Stephen Boyd
Cc: Vikram Mulukutla
Cc: Peter Zijlstra
Cc: Rusty Russell
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
12 Jul, 2013
1 commit
-
Pull tracing changes from Steven Rostedt:
"The majority of the changes here are cleanups for the large changes
that were added to 3.10, which includes several bug fixes that have
been marked for stable.As for new features, there were a few, but nothing to write to LWN
about. These include:New function trigger called "dump" and "cpudump" that will cause
ftrace to dump its buffer to the console when the function is called.
The difference between "dump" and "cpudump" is that "dump" will dump
the entire contents of the ftrace buffer, where as "cpudump" will only
dump the contents of the ftrace buffer for the CPU that called the
function.Another small enhancement is a new sysctl switch called
"traceoff_on_warning" which, when enabled, will disable tracing if any
WARN_ON() is triggered. This is useful if you want to debug what
caused a warning and do not want to risk losing your trace data by the
ring buffer overwriting the data before you can disable it. There's
also a kernel command line option that will make this enabled at boot
up called the same thing"* tag 'trace-3.11' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace: (34 commits)
tracing: Make tracing_open_generic_{tr,tc}() static
tracing: Remove ftrace() function
tracing: Remove TRACE_EVENT_TYPE enum definition
tracing: Make tracer_tracing_{off,on,is_on}() static
tracing: Fix irqs-off tag display in syscall tracing
uprobes: Fix return value in error handling path
tracing: Fix race between deleting buffer and setting events
tracing: Add trace_array_get/put() to event handling
tracing: Get trace_array ref counts when accessing trace files
tracing: Add trace_array_get/put() to handle instance refs better
tracing: Protect ftrace_trace_arrays list in trace_events.c
tracing: Make trace_marker use the correct per-instance buffer
ftrace: Do not run selftest if command line parameter is set
tracing/kprobes: Don't pass addr=ip to perf_trace_buf_submit()
tracing: Use flag buffer_disabled for irqsoff tracer
tracing/kprobes: Turn trace_probe->files into list_head
tracing: Fix disabling of soft disable
tracing: Add missing syscall_metadata comment
tracing: Simplify code for showing of soft disabled flag
tracing/kprobes: Kill probe_enable_lock
...
10 Jul, 2013
1 commit
-
Add the cpu/pid that called WARN() so that the stack traces can be
matched up with the WARNING messages.[akpm@linux-foundation.org: remove stray quote]
Signed-off-by: Alex Thorlton
Reviewed-by: Robin Holt
Cc: Stephen Boyd
Cc: Vikram Mulukutla
Cc: Rusty Russell
Cc: Tejun Heo
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
20 Jun, 2013
1 commit
-
Add a traceoff_on_warning option in both the kernel command line as well
as a sysctl option. When set, any WARN*() function that is hit will cause
the tracing_on variable to be cleared, which disables writing to the
ring buffer.This is useful especially when tracing a bug with function tracing. When
a warning is hit, the print caused by the warning can flood the trace with
the functions that producing the output for the warning. This can make the
resulting trace useless by either hiding where the bug happened, or worse,
by overflowing the buffer and losing the trace of the bug totally.Acked-by: Peter Zijlstra
Signed-off-by: Steven Rostedt
01 May, 2013
1 commit
-
x86 and ia64 can acquire extra hardware identification information
from DMI and print it along with task dumps; however, the usage isn't
consistent.* x86 show_regs() collects vendor, product and board strings and print
them out with PID, comm and utsname. Some of the information is
printed again later in the same dump.* warn_slowpath_common() explicitly accesses the DMI board and prints
it out with "Hardware name:" label. This applies to both x86 and
ia64 but is irrelevant on all other archs.* ia64 doesn't show DMI information on other non-WARN dumps.
This patch introduces arch-specific hardware description used by
dump_stack(). It can be set by calling dump_stack_set_arch_desc()
during boot and, if exists, printed out in a separate line with
"Hardware name:" label.dmi_set_dump_stack_arch_desc() is added which sets arch-specific
description from DMI data. It uses dmi_ids_string[] which is set from
dmi_present() used for DMI debug message. It is superset of the
information x86 show_regs() is using. The function is called from x86
and ia64 boot code right after dmi_scan_machine().This makes the explicit DMI handling in warn_slowpath_common()
unnecessary. Removed.show_regs() isn't yet converted to use generic debug information
printing and this patch doesn't remove the duplicate DMI handling in
x86 show_regs(). The next patch will unify show_regs() handling and
remove the duplication.An example WARN dump follows.
WARNING: at kernel/workqueue.c:4841 init_workqueues+0x35/0x505()
Modules linked in:
CPU: 0 PID: 1 Comm: swapper/0 Not tainted 3.9.0-rc1-work+ #3
Hardware name: empty empty/S3992, BIOS 080011 10/26/2007
0000000000000009 ffff88007c861e08 ffffffff81c614dc ffff88007c861e48
ffffffff8108f500 ffffffff82228240 0000000000000040 ffffffff8234a08e
0000000000000000 0000000000000000 0000000000000000 ffff88007c861e58
Call Trace:
[] dump_stack+0x19/0x1b
[] warn_slowpath_common+0x70/0xa0
[] warn_slowpath_null+0x1a/0x20
[] init_workqueues+0x35/0x505
...v2: Use the same string as the debug message from dmi_present() which
also contains BIOS information. Move hardware name into its own
line as warn_slowpath_common() did. This change was suggested by
Bjorn Helgaas.Signed-off-by: Tejun Heo
Cc: Bjorn Helgaas
Cc: David S. Miller
Cc: Fengguang Wu
Cc: Heiko Carstens
Cc: Jesper Nilsson
Cc: Martin Schwidefsky
Cc: Mike Frysinger
Cc: Vineet Gupta
Cc: Sam Ravnborg
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
21 Jan, 2013
1 commit
-
Fix up all callers as they were before, with make one change: an
unsigned module taints the kernel, but doesn't turn off lockdep.Signed-off-by: Rusty Russell