13 Jul, 2017
1 commit
-
As Eric said,
"what we need to do is move the variable vmcoreinfo_note out of the
kernel's .bss section. And modify the code to regenerate and keep this
information in something like the control page.

Definitely something like this needs a page all to itself, and ideally
far away from any other kernel data structures. I clearly was not
watching closely the data someone decided to keep this silly thing in
the kernel's .bss section."

This patch allocates extra pages for these vmcoreinfo_XXX variables. One
advantage is that it improves the safety of vmcoreinfo, because
vmcoreinfo is now kept far away from other kernel data structures.

Link: http://lkml.kernel.org/r/1493281021-20737-1-git-send-email-xlpang@redhat.com
Signed-off-by: Xunlei Pang
Tested-by: Michael Holzheu
Reviewed-by: Juergen Gross
Suggested-by: Eric Biederman
Cc: Benjamin Herrenschmidt
Cc: Dave Young
Cc: Hari Bathini
Cc: Mahesh Salgaonkar
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
29 Jan, 2017
2 commits
-
We've got a number of defines related to the E820 table and its size:

  E820MAP
  E820NR
  E820_X_MAX
  E820MAX

The first two denote byte offsets into the zeropage (struct boot_params),
are not used in the kernel and can be removed.

The E820_*_MAX values have an inconsistent structure and it's unclear in any
case what they mean. 'X' presumably stands for extended - but it's not very
expressive altogether.

Change these over to:

  E820_MAX_ENTRIES_ZEROPAGE
  E820_MAX_ENTRIES

... which are self-explanatory names.

No change in functionality.

Cc: Alex Thorlton
Cc: Andy Lutomirski
Cc: Borislav Petkov
Cc: Brian Gerst
Cc: Dan Williams
Cc: Denys Vlasenko
Cc: H. Peter Anvin
Cc: Huang, Ying
Cc: Josh Poimboeuf
Cc: Juergen Gross
Cc: Linus Torvalds
Cc: Paul Jackson
Cc: Peter Zijlstra
Cc: Rafael J. Wysocki
Cc: Tejun Heo
Cc: Thomas Gleixner
Cc: Wei Yang
Cc: Yinghai Lu
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Ingo Molnar
-
So there's a number of constants that start with "E820" but which
are not types - these create a confusing mixture when seen together
with 'enum e820_type' values:

  E820MAP
  E820NR
  E820_X_MAX
  E820MAX

To better differentiate the 'enum e820_type' values prefix them
with E820_TYPE_.

No change in functionality.

Cc: Alex Thorlton
Cc: Andy Lutomirski
Cc: Borislav Petkov
Cc: Brian Gerst
Cc: Dan Williams
Cc: Denys Vlasenko
Cc: H. Peter Anvin
Cc: Huang, Ying
Cc: Josh Poimboeuf
Cc: Juergen Gross
Cc: Linus Torvalds
Cc: Paul Jackson
Cc: Peter Zijlstra
Cc: Rafael J. Wysocki
Cc: Tejun Heo
Cc: Thomas Gleixner
Cc: Wei Yang
Cc: Yinghai Lu
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Ingo Molnar
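
As a rough illustration of the E820_TYPE_ prefix described above, the sketch below shows how prefixed enum values read next to non-type constants. It assumes the conventional E820 type numbering for the first five entries; the helper e820_type_name() is hypothetical and not taken from the patch.

```c
/* Sketch only: a subset of memory types, using the conventional E820 numbering. */
enum e820_type {
    E820_TYPE_RAM      = 1,
    E820_TYPE_RESERVED = 2,
    E820_TYPE_ACPI     = 3,
    E820_TYPE_NVS      = 4,
    E820_TYPE_UNUSABLE = 5,
};

/* Hypothetical helper: map a type to a printable name. */
static const char *e820_type_name(enum e820_type type)
{
    switch (type) {
    case E820_TYPE_RAM:      return "usable";
    case E820_TYPE_RESERVED: return "reserved";
    case E820_TYPE_ACPI:     return "ACPI data";
    case E820_TYPE_NVS:      return "ACPI NVS";
    case E820_TYPE_UNUSABLE: return "unusable";
    }
    return "unknown";
}
```

With the prefix in place, a grep for E820_TYPE_ finds only type values, while size limits such as E820_MAX_ENTRIES stay visibly separate.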
28 Jan, 2017
4 commits
-
No change in functionality.
Cc: Alex Thorlton
Cc: Andy Lutomirski
Cc: Borislav Petkov
Cc: Brian Gerst
Cc: Dan Williams
Cc: Denys Vlasenko
Cc: H. Peter Anvin
Cc: Huang, Ying
Cc: Josh Poimboeuf
Cc: Juergen Gross
Cc: Linus Torvalds
Cc: Paul Jackson
Cc: Peter Zijlstra
Cc: Rafael J. Wysocki
Cc: Tejun Heo
Cc: Thomas Gleixner
Cc: Wei Yang
Cc: Yinghai Lu
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Ingo Molnar
-
In line with the rename to 'struct e820_array', harmonize the naming of common e820
table variable names as well:

  e820          => e820_array
  e820_saved    => e820_array_saved
  e820_map      => e820_array
  initial_e820  => e820_array_init

This makes the variable names more consistent and easier to grep for.

No change in functionality.

Cc: Alex Thorlton
Cc: Andy Lutomirski
Cc: Borislav Petkov
Cc: Brian Gerst
Cc: Dan Williams
Cc: Denys Vlasenko
Cc: H. Peter Anvin
Cc: Huang, Ying
Cc: Josh Poimboeuf
Cc: Juergen Gross
Cc: Linus Torvalds
Cc: Paul Jackson
Cc: Peter Zijlstra
Cc: Rafael J. Wysocki
Cc: Tejun Heo
Cc: Thomas Gleixner
Cc: Wei Yang
Cc: Yinghai Lu
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Ingo Molnar
-
The 'e820entry' and 'e820map' names have various annoyances:

- the missing underscore departs from the usual kernel style
  and makes the code look weird,

- in the past I kept confusing the 'map' with the 'entry', because
  a 'map' is ambiguous in that regard,

- it's not really clear from the 'e820map' that this is a regular
  C array.

Rename them to 'struct e820_entry' and 'struct e820_array' accordingly.

( Leave the legacy UAPI header alone but do the rename in the bootparam.h
  and e820/types.h file - outside tools relying on these defines should
  either adjust their code, or should use the legacy header, or should
  create their private copies for the definitions. )

No change in functionality.

Cc: Alex Thorlton
Cc: Andy Lutomirski
Cc: Borislav Petkov
Cc: Brian Gerst
Cc: Dan Williams
Cc: Denys Vlasenko
Cc: H. Peter Anvin
Cc: Huang, Ying
Cc: Josh Poimboeuf
Cc: Juergen Gross
Cc: Linus Torvalds
Cc: Paul Jackson
Cc: Peter Zijlstra
Cc: Rafael J. Wysocki
Cc: Tejun Heo
Cc: Thomas Gleixner
Cc: Wei Yang
Cc: Yinghai Lu
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Ingo Molnar
-
A commonly used lowlevel x86 header, asm/pgtable.h, includes asm/e820/api.h
spuriously, without making direct use of it.

Removing it is not simple: over the years various .c code learned to rely
on this indirect inclusion.

Remove the unnecessary include - this should speed up the kernel build a bit,
as a large header is not included anymore in totally unrelated code.

Cc: Alex Thorlton
Cc: Andy Lutomirski
Cc: Borislav Petkov
Cc: Brian Gerst
Cc: Dan Williams
Cc: Denys Vlasenko
Cc: H. Peter Anvin
Cc: Huang, Ying
Cc: Josh Poimboeuf
Cc: Juergen Gross
Cc: Linus Torvalds
Cc: Paul Jackson
Cc: Peter Zijlstra
Cc: Rafael J. Wysocki
Cc: Tejun Heo
Cc: Thomas Gleixner
Cc: Wei Yang
Cc: Yinghai Lu
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Ingo Molnar
30 Nov, 2016
1 commit
-
This is done to simplify the kexec_add_buffer argument list.
Adapt all callers to set up a kexec_buf to pass to kexec_add_buffer.

In addition, change the type of kexec_buf.buffer from char * to void *.
There is no particular reason for it to be a char *, and the change
allows us to get rid of 3 existing casts to char * in the code.

Signed-off-by: Thiago Jung Bauermann
Acked-by: Dave Young
Acked-by: Balbir Singh
Signed-off-by: Michael Ellerman
12 Oct, 2016
1 commit
-
Daniel Walker reported problems which happen when the
crash_kexec_post_notifiers kernel option is enabled
(https://lkml.org/lkml/2015/6/24/44).

In that case, smp_send_stop() is called before entering kdump routines
which assume other CPUs are still online. As a result, for x86, kdump
routines fail to save other CPUs' registers and disable virtualization
extensions.

To fix this problem, call a new kdump friendly function,
crash_smp_send_stop(), instead of smp_send_stop() when
crash_kexec_post_notifiers is enabled. crash_smp_send_stop() is a weak
function, and it just calls smp_send_stop(). Architecture code should
override it so that kdump can work appropriately. This patch only
provides an x86-specific version.

For Xen's PV kernel, just keep the current behavior.

NOTES:

- The right solution would be to place crash_smp_send_stop() before the
  __crash_kexec() invocation in all cases and remove smp_send_stop(), but
  we can't do that until all architectures implement their own
  crash_smp_send_stop()

- crash_smp_send_stop()-like work is still needed by
  machine_crash_shutdown() because crash_kexec() can be called without
  entering panic()

Fixes: f06e5153f4ae (kernel/panic.c: add "crash_kexec_post_notifiers" option)
Link: http://lkml.kernel.org/r/20160810080948.11028.15344.stgit@sysi4-13.yrl.intra.hitachi.co.jp
Signed-off-by: Hidehiro Kawai
Reported-by: Daniel Walker
Cc: Dave Young
Cc: Baoquan He
Cc: Vivek Goyal
Cc: Eric Biederman
Cc: Masami Hiramatsu
Cc: Daniel Walker
Cc: Xunlei Pang
Cc: Thomas Gleixner
Cc: Ingo Molnar
Cc: "H. Peter Anvin"
Cc: Borislav Petkov
Cc: David Vrabel
Cc: Toshi Kani
Cc: Ralf Baechle
Cc: David Daney
Cc: Aaro Koskinen
Cc: "Steven J. Hill"
Cc: Corey Minyard
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
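
The weak-function arrangement described above can be sketched in user space roughly as follows. This is a minimal illustration, not the kernel code: every demo_* name and the counter are hypothetical, the panic-time decision is simplified, and the weak attribute assumes GCC or Clang.

```c
#include <stdbool.h>

/* Hypothetical counter standing in for the real side effect. */
static int plain_stops;   /* times the generic stop path ran */

/* Stand-in for smp_send_stop(): stops other CPUs without saving state. */
static void demo_smp_send_stop(void)
{
    plain_stops++;
}

/*
 * Weak default: it only forwards to the generic stop. An architecture
 * would provide a strong definition with the same name in another file,
 * and the linker would then pick that one instead of this fallback.
 */
__attribute__((weak)) void demo_crash_smp_send_stop(void)
{
    demo_smp_send_stop();
}

/* Simplified panic-time decision; 'post_notifiers' models the boot option. */
static void demo_panic_stop_cpus(bool post_notifiers)
{
    if (post_notifiers)
        demo_crash_smp_send_stop();   /* kdump-friendly hook */
    else
        demo_smp_send_stop();         /* previous behaviour */
}
```

Without an architecture override both branches end up in the generic stop, which matches the note that architectures lacking their own version keep the old behaviour.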
14 Jul, 2016
1 commit
-
Historically a lot of these existed because we did not have
a distinction between what was modular code and what was providing
support to modules via EXPORT_SYMBOL and friends. That changed
when we forked out support for the latter into the export.h file.

This means we should be able to reduce the usage of module.h
in code that is obj-y Makefile or bool Kconfig. The advantage
in doing so is that module.h itself sources about 15 other headers;
adding significantly to what we feed cpp, and it can obscure what
headers we are effectively using.

Since module.h was the source for init.h (for __init) and for
export.h (for EXPORT_SYMBOL) we consider each obj-y/bool instance
for the presence of either and replace as needed. Build testing
revealed some implicit header usage that was fixed up accordingly.

Note that some bool/obj-y instances remain since module.h is
the header for some exception table entry stuff, and for things
like __init_or_module (code that is tossed when MODULES=n).

Signed-off-by: Paul Gortmaker
Cc: Linus Torvalds
Cc: Peter Zijlstra
Cc: Thomas Gleixner
Link: http://lkml.kernel.org/r/20160714001901.31603-4-paul.gortmaker@windriver.com
Signed-off-by: Ingo Molnar
30 Jan, 2016
2 commits
-
There is no longer any driver inserting a "GART" region in the
kernel since

  707d4eefbdb3 ("Revert "[PATCH] Insert GART region into resource map"").

Remove the call to walk_iomem_res() with "GART" type, its
callback function, and GART-specific variables set by the
callback.

Signed-off-by: Toshi Kani
Signed-off-by: Borislav Petkov
Reviewed-by: Dave Young
Cc: Andrew Morton
Cc: Andy Lutomirski
Cc: Andy Lutomirski
Cc: Borislav Petkov
Cc: Brian Gerst
Cc: Chun-Yi
Cc: Denys Vlasenko
Cc: H. Peter Anvin
Cc: Lee, Chun-Yi
Cc: Linus Torvalds
Cc: Luis R. Rodriguez
Cc: Minfei Huang
Cc: Peter Zijlstra (Intel)
Cc: Stephen Rothwell
Cc: Takao Indoh
Cc: Thomas Gleixner
Cc: Toshi Kani
Cc: Viresh Kumar
Cc: kexec@lists.infradead.org
Cc: linux-arch@vger.kernel.org
Cc: linux-mm
Link: http://lkml.kernel.org/r/1453841853-11383-16-git-send-email-bp@alien8.de
Signed-off-by: Ingo Molnar
-
Change the callers of walk_iomem_res() scanning for the
following resources by name to use walk_iomem_res_desc()
instead.

  "ACPI Tables"
  "ACPI Non-volatile Storage"
  "Persistent Memory (legacy)"
  "Crash kernel"

Note, the caller of walk_iomem_res() with "GART" will be removed
in a later patch.

Signed-off-by: Toshi Kani
Signed-off-by: Borislav Petkov
Reviewed-by: Dave Young
Cc: Andrew Morton
Cc: Andy Lutomirski
Cc: Andy Lutomirski
Cc: Borislav Petkov
Cc: Brian Gerst
Cc: Chun-Yi
Cc: Dan Williams
Cc: Denys Vlasenko
Cc: Don Zickus
Cc: H. Peter Anvin
Cc: Lee, Chun-Yi
Cc: Linus Torvalds
Cc: Luis R. Rodriguez
Cc: Minfei Huang
Cc: Peter Zijlstra (Intel)
Cc: Ross Zwisler
Cc: Stephen Rothwell
Cc: Takao Indoh
Cc: Thomas Gleixner
Cc: Toshi Kani
Cc: kexec@lists.infradead.org
Cc: linux-arch@vger.kernel.org
Cc: linux-mm
Cc: linux-nvdimm@lists.01.org
Link: http://lkml.kernel.org/r/1453841853-11383-15-git-send-email-bp@alien8.de
Signed-off-by: Ingo Molnar
23 Nov, 2015
1 commit
-
This patch stops Intel PT logging and saves its registers in memory
before kdump is started. This feature is needed to prevent Intel PT from
overwriting its log buffer after panic, and saved registers are needed to
find the last position where Intel PT wrote data.

After the crash dump is captured by kdump, users can retrieve the log buffer
from the vmcore and use it to investigate bad kernel behavior.

Signed-off-by: Takao Indoh
Signed-off-by: Peter Zijlstra (Intel)
Cc: Alexander Shishkin
Cc: Arnaldo Carvalho de Melo
Cc: Arnaldo Carvalho de Melo
Cc: H.Peter Anvin
Cc: Jiri Olsa
Cc: Linus Torvalds
Cc: Mike Galbraith
Cc: Peter Zijlstra
Cc: Stephane Eranian
Cc: Thomas Gleixner
Cc: Vince Weaver
Cc: Vivek Goyal
Link: http://lkml.kernel.org/r/1446614553-6072-3-git-send-email-indou.takao@jp.fujitsu.com
Signed-off-by: Ingo Molnar
12 Oct, 2015
1 commit
-
Previously, UV NMI used the 'in_crash_kexec' flag to determine whether
we are in a kdump kernel or not:

  5edd19af18a36a4 ("x86, UV: Make kdump avoid stack dumps")

But this flag was removed in the following commit:

  9c48f1c629ecfa1 ("x86, nmi: Wire up NMI handlers to new routines")

Since it isn't used any more, remove it.

Signed-off-by: Minfei Huang
Acked-by: Don Zickus
Cc: H. Peter Anvin
Cc: Linus Torvalds
Cc: Peter Zijlstra
Cc: Thomas Gleixner
Cc: cpw@sgi.com
Cc: kexec@lists.infradead.org
Cc: mhuang@redhat.com
Link: http://lkml.kernel.org/r/1444070155-17934-1-git-send-email-mhuang@redhat.com
Signed-off-by: Ingo Molnar
02 Oct, 2015
1 commit
-
The original bug is a page fault crash that sometimes happens
on big machines when preparing ELF headers:

  BUG: unable to handle kernel paging request at ffffc90613fc9000
  IP: [] prepare_elf64_ram_headers_callback+0x165/0x260

The bug is caused by us under-counting the number of memory ranges
and subsequently not allocating enough ELF header space for them.
The bug is typically masked on smaller systems, because the ELF header
allocation is rounded up to the next page.

This patch modifies the code in fill_up_crash_elf_data() by using
walk_system_ram_res() instead of walk_system_ram_range() to correctly
count the max number of crash memory ranges. That's because
walk_system_ram_range() filters out small memory regions that
reside in the same page, but walk_system_ram_res() does not.

Here's how I found the bug:

Tracing prepare_elf64_headers() and prepare_elf64_ram_headers_callback()
shows that the code uses walk_system_ram_res() to fill in crash memory region
information in the program header, so it counts those small memory regions
that reside in a page area.

But, when the kernel was using walk_system_ram_range() in
fill_up_crash_elf_data() to count the number of crash memory regions,
it filtered out small regions.

I printed those small memory regions, for example:

  kexec: Get nr_ram ranges. vaddr=0xffff880077592258 paddr=0x77592258, sz=0xdc0

Based on the code in walk_system_ram_range(), this memory region
will be filtered out:

  pfn = (0x77592258 + 0x1000 - 1) >> 12 = 0x77593
  end_pfn = (0x77592258 + 0xdc0 - 1 + 1) >> 12 = 0x77593
  end_pfn - pfn = 0x77593 - 0x77593 = 0, so the (end_pfn > pfn) check is FALSE

So, the max_nr_ranges that's counted by the kernel doesn't include
small memory regions - causing us to under-allocate the required space.
That causes the page fault crash that happens in a later code path
when preparing ELF headers.

This bug is not easy to reproduce on small machines that have few
CPUs, because the allocated page aligned ELF buffer has more free
space to cover those small memory regions' PT_LOAD headers.

Signed-off-by: Lee, Chun-Yi
Cc: Andy Lutomirski
Cc: Baoquan He
Cc: Jiang Liu
Cc: Linus Torvalds
Cc: Mike Galbraith
Cc: Peter Zijlstra
Cc: Stephen Rothwell
Cc: Takashi Iwai
Cc: Thomas Gleixner
Cc: Viresh Kumar
Cc: Vivek Goyal
Cc: kexec@lists.infradead.org
Cc: linux-kernel@vger.kernel.org
Cc:
Link: http://lkml.kernel.org/r/1443531537-29436-1-git-send-email-jlee@suse.com
Signed-off-by: Ingo Molnar
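
The page-frame arithmetic in the changelog above can be checked with a small stand-alone sketch. It assumes 4 KiB pages and only mirrors the rounding rule quoted in the message; the helper names are hypothetical and this is not the kernel's walk_system_ram_range() itself.

```c
#include <stdbool.h>
#include <stdint.h>

#define PAGE_SHIFT 12                          /* assumes 4 KiB pages */
#define PAGE_SIZE  (UINT64_C(1) << PAGE_SHIFT)

/* First page frame that starts at or after 'start' (round up). */
static uint64_t first_full_pfn(uint64_t start)
{
    return (start + PAGE_SIZE - 1) >> PAGE_SHIFT;
}

/*
 * One past the last page frame that ends at or before 'end' (round down).
 * 'end' is the inclusive address of the last byte in the region.
 */
static uint64_t end_full_pfn(uint64_t end)
{
    return (end + 1) >> PAGE_SHIFT;
}

/* True if a page-granular walk would report the region at all. */
static bool seen_by_pfn_walk(uint64_t paddr, uint64_t size)
{
    uint64_t end = paddr + size - 1;

    return end_full_pfn(end) > first_full_pfn(paddr);
}
```

For the region printed above, seen_by_pfn_walk(0x77592258, 0xdc0) is false because both bounds round to frame 0x77593, so a counter built on the page-granular walk misses it while a resource-granular walk still visits it.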
03 Jun, 2015
1 commit
-
Nothing in uses anything from , so
remove it from there and fix up the resulting build problems
triggered on x86 {64|32}-bit {def|allmod|allno}configs.

The breakages were triggering in places where x86 builds relied
on vmalloc() facilities but did not include
explicitly and relied on the implicit inclusion via .

Also add:

- to
- to

... which were two other implicit header file dependencies.

Suggested-by: David Miller
Signed-off-by: Stephen Rothwell
[ Tidied up the changelog. ]
Acked-by: David Miller
Acked-by: Takashi Iwai
Acked-by: Viresh Kumar
Acked-by: Vinod Koul
Cc: Andrew Morton
Cc: Anton Vorontsov
Cc: Boris Ostrovsky
Cc: Colin Cross
Cc: David Vrabel
Cc: H. Peter Anvin
Cc: Haiyang Zhang
Cc: James E.J. Bottomley
Cc: Jaroslav Kysela
Cc: K. Y. Srinivasan
Cc: Kees Cook
Cc: Konrad Rzeszutek Wilk
Cc: Kristen Carlson Accardi
Cc: Len Brown
Cc: Linus Torvalds
Cc: Peter Zijlstra
Cc: Rafael J. Wysocki
Cc: Suma Ramars
Cc: Thomas Gleixner
Cc: Tony Luck
Signed-off-by: Ingo Molnar
23 Mar, 2015
1 commit
-
user_mode_vm() and user_mode() are now the same. Change all callers
of user_mode_vm() to user_mode().

The next patch will remove the definition of user_mode_vm.

Signed-off-by: Andy Lutomirski
Cc: Borislav Petkov
Cc: Brad Spengler
Cc: Denys Vlasenko
Cc: H. Peter Anvin
Cc: Linus Torvalds
Cc: Thomas Gleixner
Link: http://lkml.kernel.org/r/43b1f57f3df70df5a08b0925897c660725015554.1426728647.git.luto@kernel.org
[ Merged to a more recent kernel. ]
Signed-off-by: Ingo Molnar
16 Dec, 2014
1 commit
-
Clean up code by moving IOAPIC related declarations from hw_irq.h into
io_apic.h.

Signed-off-by: Jiang Liu
Cc: Konrad Rzeszutek Wilk
Cc: Tony Luck
Cc: Joerg Roedel
Cc: Greg Kroah-Hartman
Cc: H. Peter Anvin
Cc: Benjamin Herrenschmidt
Cc: Rafael J. Wysocki
Cc: Bjorn Helgaas
Cc: Randy Dunlap
Cc: Yinghai Lu
Cc: Borislav Petkov
Cc: Prarit Bhargava
Cc: Grant Likely
Cc: Vivek Goyal
Cc: Baoquan He
Cc: Matt Fleming
Cc: Fenghua Yu
Cc: Christian Gmeiner
Cc: Aubrey
Cc: Ryan Desfosses
Cc: Quentin Lambert
Cc: Rafael J. Wysocki
Link: http://lkml.kernel.org/r/1414397531-28254-14-git-send-email-jiang.liu@linux.intel.com
Signed-off-by: Thomas Gleixner
14 Oct, 2014
1 commit
-
Add a check if crashk_low_res exists, just as is done for the GART region. If
crashk_low_res doesn't exist, calling exclude_mem_range is unnecessary.

Meanwhile, since crashk_low_res has been initialized at definition, it's
safe to just use "if (crashk_low_res.end)" to check whether it exists. And this
makes it consistent with the checks in other places.

Signed-off-by: Baoquan He
Acked-by: Vivek Goyal
Cc: Eric W. Biederman
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
30 Aug, 2014
1 commit
-
Currently the new system call kexec_file_load() and all the associated code
compiles if CONFIG_KEXEC=y. But the new syscall also compiles purgatory
code which currently uses gcc option -mcmodel=large. This option seems
to be available only from gcc 4.4 onwards.

Hiding new functionality behind a new config option will not break
existing users of old gcc. Those who wish to enable new functionality
will require new gcc. Having said that, I am trying to figure out how
I can move away from using -mcmodel=large but that can take a while.

I think there are other advantages of introducing this new config
option. As this option will be enabled only on x86_64, other arches
don't have to compile generic kexec code which will never be used. This
new code selects CRYPTO=y and CRYPTO_SHA256=y. And all other arches had
to do this for CONFIG_KEXEC. Now with introduction of new config
option, we can remove crypto dependency from other arches.

Now CONFIG_KEXEC_FILE is available only on x86_64. So wherever I had
CONFIG_X86_64 defined, I got rid of that.

For CONFIG_KEXEC_FILE, instead of doing select CRYPTO=y, I changed it to
"depends on CRYPTO=y". This should be safer as "select" is not
recursive.

Signed-off-by: Vivek Goyal
Cc: Eric Biederman
Cc: H. Peter Anvin
Tested-by: Shaun Ruffell
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
09 Aug, 2014
1 commit
-
This patch adds support for loading a kexec on panic (kdump) kernel using
the new system call.

It prepares ELF headers for memory areas to be dumped and for saved cpu
registers. It also prepares the memory map for the second kernel and limits its
boot to reserved areas only.

Signed-off-by: Vivek Goyal
Cc: Borislav Petkov
Cc: Michael Kerrisk
Cc: Yinghai Lu
Cc: Eric Biederman
Cc: H. Peter Anvin
Cc: Matthew Garrett
Cc: Greg Kroah-Hartman
Cc: Dave Young
Cc: WANG Chao
Cc: Baoquan He
Cc: Andy Lutomirski
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
14 Mar, 2014
1 commit
-
Merge two back-to-back CONFIG_X86_32 ifdefs into one.
Signed-off-by: Borislav Petkov
Link: http://lkml.kernel.org/r/1394633584-5509-3-git-send-email-bp@alien8.de
Signed-off-by: H. Peter Anvin
07 Jan, 2014
1 commit
-
None of these files are actually using any __init type directives
and hence don't need to include . Most are just a
left over from __devinit and __cpuinit removal, or simply due to
code getting copied from one driver to the next.

[ hpa: undid incorrect removal from arch/x86/kernel/head_32.S ]
Signed-off-by: Paul Gortmaker
Link: http://lkml.kernel.org/r/1389054026-12947-1-git-send-email-paul.gortmaker@windriver.com
Signed-off-by: H. Peter Anvin
07 Nov, 2013
1 commit
-
In the reboot and crash paths, when we shut down the local APIC, the I/O APIC is
still active. This may cause issues because external interrupts
can still come in and disturb the local APIC during the shutdown process.

To quiet external interrupts, disable the I/O APIC before shutting down the local APIC.

Signed-off-by: Fenghua Yu
Link: http://lkml.kernel.org/r/1382578212-4677-1-git-send-email-fenghua.yu@intel.com
Cc:
[ I suppose the 'issue' is a hang during shutdown. It's a fine change nevertheless. ]
Signed-off-by: Ingo Molnar
20 Aug, 2013
1 commit
-
Prevent crash_kexec() from deadlocking on ioapic_lock. When
crash_kexec() is executed on a CPU, the CPU will take ioapic_lock
in disable_IO_APIC(). So if the cpu gets an NMI while holding
ioapic_lock, a deadlock will happen.

In this patch, ioapic_lock is zapped/initialized before disable_IO_APIC().

You can reproduce this deadlock the following way:

1. Add mdelay(1000) after raw_spin_lock_irqsave() in
   native_ioapic_set_affinity()@arch/x86/kernel/apic/io_apic.c

   Although the deadlock can occur without this modification, it will increase
   the potential of the deadlock problem.

2. Build and install the kernel

3. Set up the OS which will run panic() and kexec when NMI is injected
   # echo "kernel.unknown_nmi_panic=1" >> /etc/sysctl.conf
   # vim /etc/default/grub
     add "nmi_watchdog=0 crashkernel=256M" in GRUB_CMDLINE_LINUX line
   # grub2-mkconfig

4. Reboot the OS

5. Run following command for each vcpu on the guest
   # while true; do echo > /proc/irq//smp_affinity; done;
   By running this command, cpus will get ioapic_lock for setting affinity.

6. Inject NMI (push a dump button or execute 'virsh inject-nmi ' if you
   use VM). After injecting NMI, panic() is called in an nmi-handler context.
   Then, kexec will normally run in panic(), but the operation will be stopped
   by deadlock on ioapic_lock in crash_kexec()->machine_crash_shutdown()->
   native_machine_crash_shutdown()->disable_IO_APIC()->clear_IO_APIC()->
   clear_IO_APIC_pin()->ioapic_read_entry().

Signed-off-by: Yoshihiro YUNOMAE
Cc: Andi Kleen
Cc: Gleb Natapov
Cc: Konrad Rzeszutek Wilk
Cc: Joerg Roedel
Cc: Marcelo Tosatti
Cc: Hidehiro Kawai
Cc: Sebastian Andrzej Siewior
Cc: Zhang Yanfei
Cc: Eric W. Biederman
Cc: yrl.pp-manager.tt@hitachi.com
Cc: Masami Hiramatsu
Cc: Seiji Aguchi
Link: http://lkml.kernel.org/r/20130820070107.28245.83806.stgit@yunodevel
Signed-off-by: Ingo Molnar
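
The idea of zapping a lock before the crash path takes it can be modelled in user space as below. This is only a sketch under stated assumptions: a C11 atomic_flag stands in for the raw spinlock, and all demo_* names are hypothetical rather than the kernel's own helpers.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Stand-in for ioapic_lock; set means "held". */
static atomic_flag demo_ioapic_lock = ATOMIC_FLAG_INIT;

/* Try to take the lock once; returns true on success. */
static bool demo_trylock(void)
{
    return !atomic_flag_test_and_set(&demo_ioapic_lock);
}

/*
 * Crash path only: force the lock back to its initial, unlocked state.
 * This is acceptable here because the interrupted holder will never run
 * again, so nobody is left to release the lock normally.
 */
static void demo_zap_lock(void)
{
    atomic_flag_clear(&demo_ioapic_lock);
}
```

If the interrupted context already holds the lock, a second demo_trylock() fails, which corresponds to the crash path spinning forever; after demo_zap_lock() the crash path can take the lock and continue.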
12 Dec, 2012
1 commit
-
This removes the sparse warning:

  arch/x86/kernel/crash.c:49:32: sparse: incompatible types in comparison expression (different address spaces)

Reported-by: kbuild test robot
Signed-off-by: Zhang Yanfei
Signed-off-by: Marcelo Tosatti
07 Dec, 2012
1 commit
-
This patch provides a way to VMCLEAR VMCSs related to guests
on all cpus before executing the VMXOFF when doing kdump. This
is used to ensure the VMCSs in the vmcore are updated and
non-corrupted.

Signed-off-by: Zhang Yanfei
Acked-by: Eric W. Biederman
Signed-off-by: Gleb Natapov
10 Oct, 2011
1 commit
-
Just convert all the files that have an nmi handler to the new routines.
Most of it is straightforward conversion. A couple of places needed some
tweaking like kgdb which separates the debug notifier from the nmi handler
and mce removes a call to notify_die.

[Thanks to Ying for finding out the history behind that mce call
https://lkml.org/lkml/2010/5/27/114
And Boris responding that he would like to remove that call because of it
https://lkml.org/lkml/2011/9/21/163]

The things that get converted are the registration/unregistration routines
and the nmi handler itself has its args changed along with code removal
to check which list it is on (most are on one NMI list except for kgdb
which has both an NMI routine and an NMI Unknown routine).

Signed-off-by: Don Zickus
Signed-off-by: Peter Zijlstra
Acked-by: Corey Minyard
Cc: Jason Wessel
Cc: Andi Kleen
Cc: Robert Richter
Cc: Huang Ying
Cc: Corey Minyard
Cc: Jack Steiner
Link: http://lkml.kernel.org/r/1317409584-23662-4-git-send-email-dzickus@redhat.com
Signed-off-by: Ingo Molnar
22 Jul, 2010
1 commit
-
UV NMI callbacks should not write stack dumps when a kdump is to be written.
When invoking the crash kernel to write a dump, kdump_nmi_shootdown_cpus()
uses NMIs to get all the CPUs to save their register context and halt.

But the NMI interrupt handler runs a callback list. This patch sets a flag
to prevent any of those callbacks from interfering with the halt of the cpu.

For UV, which currently has the only callback to which this is relevant, the
uv_handle_nmi() callback should not do dumping of stacks.

The 'in_crash_kexec' flag is defined as an extern in kdebug.h firstly
because x2apic_uv_x.c includes it. Secondly because some future callback
might need the flag to know that it should not enter the debugger.
(Such a scenario was in fact present in the 2.6.32 kernel, SuSE distribution,
where a call to kdb needed to be avoided.)

Signed-off-by: Cliff Wickman
LKML-Reference:
Signed-off-by: H. Peter Anvin
07 Apr, 2010
1 commit
-
This effectively reverts commit 61d047be99757fd9b0af900d7abce9a13a337488.

Disabling the IOMMU can potentially allow DMA transactions to
complete without being translated. Leave it enabled, and allow
crash kernel to do the IOMMU reinitialization properly.

Cc: stable@kernel.org
Cc: Joerg Roedel
Cc: Eric Biederman
Cc: Neil Horman
Cc: Vivek Goyal
Signed-off-by: Chris Wright
Signed-off-by: Joerg Roedel
08 Nov, 2009
1 commit
-
This patch cleans up pci_iommu_shutdown() a bit to use
x86_platform (similar to how IA64 initializes an IOMMU driver).

This adds iommu_shutdown() to x86_platform to avoid calling
every IOMMUs' shutdown functions in pci_iommu_shutdown() in
order. The IOMMU shutdown functions are platform specific (we
don't have multiple different IOMMU hardware) so the current way
is pointless.

An IOMMU driver sets x86_platform.iommu_shutdown to the shutdown
function if necessary.

Signed-off-by: FUJITA Tomonori
Cc: joerg.roedel@amd.com
LKML-Reference:
Signed-off-by: Ingo Molnar
15 Jun, 2009
1 commit
-
If the IOMMUs are still enabled when the kexec kernel boots, access to
the disk is not possible. This is bad for tools like kdump or anything
else which wants to use PCI devices.

Signed-off-by: Joerg Roedel
18 Feb, 2009
2 commits
-
Impact: cleanup
Signed-off-by: Ingo Molnar
-
Impact: cleanup
Remove genapic.h and remove all references to it.
Signed-off-by: Ingo Molnar
29 Jan, 2009
1 commit
-
Move mach_ipi.h definitions into genapic.h.
Signed-off-by: Ingo Molnar
08 Jan, 2009
1 commit
-
Impact: cleanup
Signed-off-by: Jaswinder Singh Rajput
Signed-off-by: Ingo Molnar
31 Dec, 2008
1 commit
-
We need to disable virtualization extensions on all CPUs before booting
the kdump kernel, otherwise the kdump kernel booting will fail, and
rebooting after the kdump kernel did its task may also fail.

We do it using cpu_emergency_vmxoff() and cpu_emergency_svm_disable(),
that should always work, because those functions check if the CPUs
support SVM or VMX before doing their tasks.

Signed-off-by: Eduardo Habkost
Signed-off-by: Avi Kivity
13 Nov, 2008
3 commits
-
Impact: make nmi_shootdown_cpus() available to the rest of the x86 platform

Now nmi_shootdown_cpus() is ready to be used by non-kdump code also.
Move it to reboot.c.

Signed-off-by: Eduardo Habkost
Signed-off-by: Ingo Molnar
-
Impact: make API available to the rest of x86 platform code

Add prototype to asm/reboot.h.

Signed-off-by: Eduardo Habkost
Signed-off-by: Ingo Molnar
-
Impact: extend nmi_shootdown_cpus() with a callback

The reboot code will use a different function in place of crash_nmi_callback().
Add a function pointer parameter to nmi_shootdown_cpus() for that.

Signed-off-by: Eduardo Habkost
Signed-off-by: Ingo Molnar
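
Passing the per-CPU action as a function pointer, as the last changelog describes, might look roughly like this in a user-space model. The sketch is illustrative only: it iterates in a plain loop instead of sending NMIs, and the demo_* names, the ctx argument and the CPU numbering are all hypothetical.

```c
/* Hypothetical callback type: invoked once for each CPU being shot down. */
typedef void (*demo_shootdown_cb)(int cpu, void *ctx);

/* Run 'cb' for every CPU except the one doing the shootdown. */
static void demo_shootdown_cpus(int nr_cpus, int self,
                                demo_shootdown_cb cb, void *ctx)
{
    for (int cpu = 0; cpu < nr_cpus; cpu++) {
        if (cpu == self)
            continue;          /* the initiating CPU keeps running */
        cb(cpu, ctx);
    }
}

/* Example callback: count how many CPUs were visited. */
static void demo_count_cb(int cpu, void *ctx)
{
    (void)cpu;
    (*(int *)ctx)++;
}
```

The same shootdown routine can then serve both the kdump path and the reboot path, with each caller supplying its own callback.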