17 Oct, 2020

1 commit

  • Fix multiple occurrences of duplicated words in kernel/.

    Fix one typo/spello on the same line as a duplicate word. Change one
    instance of "the the" to "that the". Otherwise just drop one of the
    repeated words.

    Signed-off-by: Randy Dunlap
    Signed-off-by: Andrew Morton
    Link: https://lkml.kernel.org/r/98202fa6-8919-ef63-9efe-c0fad5ca7af1@infradead.org
    Signed-off-by: Linus Torvalds

    Randy Dunlap
     

09 Jan, 2020

2 commits

  • It is the same as machine_kexec_prepare(), but is called after segments are
    loaded. This way, it can do processing work with the already loaded
    relocation segments. One such example is arm64: it has to have segments
    loaded in order to create a page table, but it cannot do that at kexec
    time, because at that point allocations are no longer possible.
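    The hook follows the kernel's weak-default pattern. A minimal standalone
    sketch of that pattern (the struct and the call site here are illustrative
    stubs, not the real kexec code):

```c
/* Illustrative stand-in for the real struct kimage. */
struct kimage { int nr_segments; };

/* Weak default: does nothing. An architecture such as arm64 can provide
 * a strong definition that builds page tables once segments are loaded. */
__attribute__((weak)) int machine_kexec_post_load(struct kimage *image)
{
    (void)image;
    return 0;
}

/* Hypothetical caller: runs the arch hook after all segments are in place. */
int kimage_load_segments_and_finish(struct kimage *image)
{
    /* ... segments would be loaded here ... */
    return machine_kexec_post_load(image);
}
```

    Because the default is weak, an architecture overrides it simply by
    defining a non-weak function with the same signature.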

    Signed-off-by: Pavel Tatashin
    Acked-by: Dave Young
    Signed-off-by: Will Deacon

    Pavel Tatashin
     
  • Here is a regular kexec command sequence and output:
    =====
    $ kexec --reuse-cmdline -i --load Image
    $ kexec -e
    [ 161.342002] kexec_core: Starting new kernel

    Welcome to Buildroot
    buildroot login:
    =====

    Even when "quiet" kernel parameter is specified, "kexec_core: Starting
    new kernel" is printed.

    This message is printed at KERN_EMERG level, but there is no emergency;
    it is a normal kexec operation, so quiet it down to the more appropriate
    KERN_NOTICE.

    Machines that have slow console baud rate benefit from less output.

    Signed-off-by: Pavel Tatashin
    Reviewed-by: Simon Horman
    Acked-by: Dave Young
    Signed-off-by: Will Deacon

    Pavel Tatashin
     

26 Sep, 2019

1 commit

  • syzbot found that a thread can stall for minutes inside kexec_load() after
    that thread was killed by SIGKILL [1]. It turned out that the reproducer
    was trying to allocate 2408MB of memory using kimage_alloc_page() from
    kimage_load_normal_segment(). Let's check for SIGKILL before doing memory
    allocation.

    [1] https://syzkaller.appspot.com/bug?id=a0e3436829698d5824231251fad9d8e998f94f5e
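    A standalone sketch of the shape of the fix, assuming a loop like
    kimage_load_normal_segment()'s; fatal_signal_pending() is stubbed here,
    whereas in the kernel it inspects the current task:

```c
#include <stddef.h>

/* Stub: in the kernel this would be fatal_signal_pending(current). */
static int fake_fatal_signal;
static int fatal_signal_pending(void) { return fake_fatal_signal; }

/* Returns the number of pages allocated before stopping; the kernel
 * version would return -EINTR when it bails out. */
size_t load_segment_pages(size_t npages)
{
    size_t done = 0;
    while (done < npages) {
        if (fatal_signal_pending())   /* check BEFORE each allocation */
            break;
        /* kimage_alloc_page(...) would run here */
        done++;
    }
    return done;
}
```

    The point is simply that the check happens before every allocation, so a
    SIGKILL'd task cannot stall for minutes inside a huge segment load.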

    Link: http://lkml.kernel.org/r/993c9185-d324-2640-d061-bed2dd18b1f7@I-love.SAKURA.ne.jp
    Signed-off-by: Tetsuo Handa
    Reported-by: syzbot
    Cc: Eric Biederman
    Reviewed-by: Andrew Morton
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Tetsuo Handa
     

19 Jun, 2019

1 commit

  • Based on 2 normalized pattern(s):

    this source code is licensed under the gnu general public license
    version 2 see the file copying for more details

    this source code is licensed under general public license version 2
    see

    extracted by the scancode license scanner the SPDX license identifier

    GPL-2.0-only

    has been chosen to replace the boilerplate/reference in 52 file(s).

    Signed-off-by: Thomas Gleixner
    Reviewed-by: Enrico Weigelt
    Reviewed-by: Allison Randal
    Reviewed-by: Alexios Zavras
    Cc: linux-spdx@vger.kernel.org
    Link: https://lkml.kernel.org/r/20190602204653.449021192@linutronix.de
    Signed-off-by: Greg Kroah-Hartman

    Thomas Gleixner
     

04 May, 2019

1 commit

  • This adds a function for suspend to disable the secondary CPUs, which
    are not necessarily the non-zero / non-boot CPUs. Platforms will be
    able to use this to suspend while running on a non-zero CPU.

    Signed-off-by: Nicholas Piggin
    Signed-off-by: Peter Zijlstra (Intel)
    Cc: Frederic Weisbecker
    Cc: Linus Torvalds
    Cc: Peter Zijlstra
    Cc: Rafael J . Wysocki
    Cc: Thomas Gleixner
    Cc: linuxppc-dev@lists.ozlabs.org
    Link: https://lkml.kernel.org/r/20190411033448.20842-3-npiggin@gmail.com
    Signed-off-by: Ingo Molnar

    Nicholas Piggin
     

29 Dec, 2018

2 commits

  • totalram_pages and totalhigh_pages are made static inline functions.

    The main motivation was that managed_page_count_lock handling was
    complicating things. It was discussed at length here,
    https://lore.kernel.org/patchwork/patch/995739/#1181785 So it seemed
    better to remove the lock and convert the variables to atomic, with
    preventing potential store-to-read tearing as a bonus.

    [akpm@linux-foundation.org: coding style fixes]
    Link: http://lkml.kernel.org/r/1542090790-21750-4-git-send-email-arunks@codeaurora.org
    Signed-off-by: Arun KS
    Suggested-by: Michal Hocko
    Suggested-by: Vlastimil Babka
    Reviewed-by: Konstantin Khlebnikov
    Reviewed-by: Pavel Tatashin
    Acked-by: Michal Hocko
    Acked-by: Vlastimil Babka
    Cc: David Hildenbrand
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Arun KS
     
  • Patch series "mm: convert totalram_pages, totalhigh_pages and managed
    pages to atomic", v5.

    This series converts totalram_pages, totalhigh_pages and
    zone->managed_pages to atomic variables.

    totalram_pages, zone->managed_pages and totalhigh_pages updates are
    protected by managed_page_count_lock, but readers never care about it.
    Convert these variables to atomic to avoid readers potentially seeing a
    store tear.

    The main motivation was that managed_page_count_lock handling was
    complicating things. It was discussed at length here,
    https://lore.kernel.org/patchwork/patch/995739/#1181785 It seemed better
    to remove the lock and convert the variables to atomic. With the change,
    preventing potential store-to-read tearing comes as a bonus.

    This patch (of 4):

    This is in preparation to a later patch which converts totalram_pages and
    zone->managed_pages to atomic variables. Please note that re-reading the
    value might lead to a different value and as such it could lead to
    unexpected behavior. There are no known bugs as a result of the current
    code but it is better to prevent from them in principle.
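    A minimal sketch of the conversion the series describes, using C11
    stdatomic in place of the kernel's atomic_long_t (names simplified):

```c
#include <stdatomic.h>

/* Was: unsigned long totalram_pages guarded by managed_page_count_lock. */
static atomic_long _totalram_pages;

/* Tear-free reader; readers never needed the lock, so an atomic load
 * is enough to rule out store-to-read tearing. */
static inline long totalram_pages(void)
{
    return atomic_load(&_totalram_pages);
}

/* Lock-free update replacing the spinlock-protected increment. */
static inline void totalram_pages_add(long count)
{
    atomic_fetch_add(&_totalram_pages, count);
}
```

    With the accessor in place, callers can no longer read the raw variable
    directly, which is what makes the later lock removal safe.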

    Link: http://lkml.kernel.org/r/1542090790-21750-2-git-send-email-arunks@codeaurora.org
    Signed-off-by: Arun KS
    Reviewed-by: Konstantin Khlebnikov
    Reviewed-by: David Hildenbrand
    Acked-by: Michal Hocko
    Acked-by: Vlastimil Babka
    Reviewed-by: Pavel Tatashin
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Arun KS
     

06 Oct, 2018

1 commit

  • When SME is enabled in the first kernel, it needs to allocate decrypted
    pages for kdump because when the kdump kernel boots, these pages need to
    be accessed decrypted in the initial boot stage, before SME is enabled.

    [ bp: clean up text. ]

    Signed-off-by: Lianbo Jiang
    Signed-off-by: Borislav Petkov
    Reviewed-by: Tom Lendacky
    Cc: kexec@lists.infradead.org
    Cc: tglx@linutronix.de
    Cc: mingo@redhat.com
    Cc: hpa@zytor.com
    Cc: akpm@linux-foundation.org
    Cc: dan.j.williams@intel.com
    Cc: bhelgaas@google.com
    Cc: baiyaowei@cmss.chinamobile.com
    Cc: tiwai@suse.de
    Cc: brijesh.singh@amd.com
    Cc: dyoung@redhat.com
    Cc: bhe@redhat.com
    Cc: jroedel@suse.de
    Link: https://lkml.kernel.org/r/20180930031033.22110-3-lijiang@redhat.com

    Lianbo Jiang
     

15 Jun, 2018

1 commit

  • Without yielding while loading kimage segments, a large initrd will
    block all other work on the CPU performing the load until it is
    completed. For example loading an initrd of 200MB on a low power single
    core system will lock up the system for a few seconds.

    To increase system responsiveness to other tasks at that time, call
    cond_resched() in both the crash kernel and normal kernel segment
    loading loops.

    I did run into a practical problem. Hardware watchdogs on embedded
    systems can have short timers on the order of seconds. If the system is
    locked up for a few seconds with only a single core available, the
    watchdog may not be pet in a timely fashion. If this happens, the
    hardware watchdog will fire and reset the system.

    This really only becomes a problem when you are working with a single
    core, a decently sized initrd, and have a constrained hardware watchdog.

    Link: http://lkml.kernel.org/r/1528738546-3328-1-git-send-email-jmf@amazon.com
    Signed-off-by: Jarrett Farnitano
    Reviewed-by: "Eric W. Biederman"
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Jarrett Farnitano
     

18 Jul, 2017

1 commit

  • Provide support so that kexec can be used to boot a kernel when SME is
    enabled.

    Support is needed to allocate pages for kexec without encryption. This
    is needed in order to be able to reboot in the kernel in the same manner
    as originally booted.

    Additionally, when shutting down all of the CPUs we need to be sure to
    flush the caches and then halt. This is needed when booting from a state
    where SME was not active into a state where SME is active (or vice-versa).
    Without these steps, it is possible for cache lines to exist for the same
    physical location but tagged both with and without the encryption bit. This
    can cause random memory corruption when caches are flushed depending on
    which cacheline is written last.

    Signed-off-by: Tom Lendacky
    Reviewed-by: Thomas Gleixner
    Reviewed-by: Borislav Petkov
    Cc:
    Cc: Alexander Potapenko
    Cc: Andrey Ryabinin
    Cc: Andy Lutomirski
    Cc: Arnd Bergmann
    Cc: Borislav Petkov
    Cc: Brijesh Singh
    Cc: Dave Young
    Cc: Dmitry Vyukov
    Cc: Jonathan Corbet
    Cc: Konrad Rzeszutek Wilk
    Cc: Larry Woodman
    Cc: Linus Torvalds
    Cc: Matt Fleming
    Cc: Michael S. Tsirkin
    Cc: Paolo Bonzini
    Cc: Peter Zijlstra
    Cc: Radim Krčmář
    Cc: Rik van Riel
    Cc: Toshimitsu Kani
    Cc: kasan-dev@googlegroups.com
    Cc: kvm@vger.kernel.org
    Cc: linux-arch@vger.kernel.org
    Cc: linux-doc@vger.kernel.org
    Cc: linux-efi@vger.kernel.org
    Cc: linux-mm@kvack.org
    Link: http://lkml.kernel.org/r/b95ff075db3e7cd545313f2fb609a49619a09625.1500319216.git.thomas.lendacky@amd.com
    Signed-off-by: Ingo Molnar

    Tom Lendacky
     

13 Jul, 2017

1 commit

  • Currently vmcoreinfo data is updated at boot time via subsys_initcall();
    it runs the risk of being modified by some errant code while the system
    is running.

    As a result, the vmcore dumped may contain wrong vmcoreinfo. Later on,
    when using the "crash", "makedumpfile", etc. utilities to parse this
    vmcore, we will probably get a "Segmentation fault" or other unexpected
    errors.

    E.g. 1) wrong code overwrites vmcoreinfo_data; 2) further crashes the
    system; 3) trigger kdump, then we obviously will fail to recognize the
    crash context correctly due to the corrupted vmcoreinfo.

    Now, except for vmcoreinfo, all the crash data is well protected
    (including the cpu note, which is fully updated in the crash path, so
    its correctness is guaranteed). Given that vmcoreinfo data is a large
    chunk prepared for kdump, we had better protect it as well.

    To solve this, we relocate and copy vmcoreinfo_data to the crash memory
    when kdump is loading via kexec syscalls. Because the whole crash
    memory will be protected by existing arch_kexec_protect_crashkres()
    mechanism, we naturally protect vmcoreinfo_data from write (even read)
    access under kernel direct mapping after kdump is loaded.

    Since kdump is usually loaded at the very early stage after boot, we can
    trust the correctness of the vmcoreinfo data copied.

    On the other hand, we still need a safe copy of vmcoreinfo to work on
    when a crash happens, in order to generate vmcoreinfo_note again; we
    rely on vmap() to map a new kernel virtual address and switch the
    following crash_save_vmcoreinfo() to use this new one instead.

    BTW, we do not touch vmcoreinfo_note, because it will be fully updated
    using the protected vmcoreinfo_data after crash which is surely correct
    just like the cpu crash note.

    Link: http://lkml.kernel.org/r/1493281021-20737-3-git-send-email-xlpang@redhat.com
    Signed-off-by: Xunlei Pang
    Tested-by: Michael Holzheu
    Cc: Benjamin Herrenschmidt
    Cc: Dave Young
    Cc: Eric Biederman
    Cc: Hari Bathini
    Cc: Juergen Gross
    Cc: Mahesh Salgaonkar
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Xunlei Pang
     

30 Jun, 2017

1 commit

  • In preparation for an objtool rewrite which will have broader checks,
    whitelist functions and files which cause problems because they do
    unusual things with the stack.

    These whitelists serve as a TODO list for which functions and files
    don't yet have undwarf unwinder coverage. Eventually most of the
    whitelists can be removed in favor of manual CFI hint annotations or
    objtool improvements.

    Signed-off-by: Josh Poimboeuf
    Cc: Andy Lutomirski
    Cc: Jiri Slaby
    Cc: Linus Torvalds
    Cc: Peter Zijlstra
    Cc: Thomas Gleixner
    Cc: live-patching@vger.kernel.org
    Link: http://lkml.kernel.org/r/7f934a5d707a574bda33ea282e9478e627fb1829.1498659915.git.jpoimboe@redhat.com
    Signed-off-by: Ingo Molnar

    Josh Poimboeuf
     

09 May, 2017

2 commits

  • Get rid of multiple definitions of the append_elf_note() & final_note()
    functions. Reuse these functions compiled under CONFIG_CRASH_CORE.
    Also, define Elf_Word and use it instead of the generic u32 or the more
    specific Elf64_Word.

    Link: http://lkml.kernel.org/r/149035342324.6881.11667840929850361402.stgit@hbathini.in.ibm.com
    Signed-off-by: Hari Bathini
    Acked-by: Dave Young
    Acked-by: Tony Luck
    Cc: Fenghua Yu
    Cc: Eric Biederman
    Cc: Mahesh Salgaonkar
    Cc: Vivek Goyal
    Cc: Michael Ellerman
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Hari Bathini
     
  • Patch series "kexec/fadump: remove dependency with CONFIG_KEXEC and
    reuse crashkernel parameter for fadump", v4.

    Traditionally, kdump is used to save vmcore in case of a crash. Some
    architectures like powerpc can save vmcore using architecture specific
    support instead of kexec/kdump mechanism. Such architecture specific
    support also needs to reserve memory, to be used by dump capture kernel.
    The crashkernel parameter can be reused, for memory reservation, by
    such architecture specific infrastructure.

    This patchset removes dependency with CONFIG_KEXEC for crashkernel
    parameter and vmcoreinfo related code as it can be reused without kexec
    support. Also, crashkernel parameter is reused instead of
    fadump_reserve_mem to reserve memory for fadump.

    The first patch moves crashkernel parameter parsing and vmcoreinfo
    related code under CONFIG_CRASH_CORE instead of CONFIG_KEXEC_CORE. The
    second patch reuses the definitions of append_elf_note() & final_note()
    functions under CONFIG_CRASH_CORE in IA64 arch code. The third patch
    removes dependency on CONFIG_KEXEC for firmware-assisted dump (fadump)
    in powerpc. The next patch reuses crashkernel parameter for reserving
    memory for fadump, instead of the fadump_reserve_mem parameter. This
    has the advantage of using all syntaxes crashkernel parameter supports,
    for fadump as well. The last patch updates fadump kernel documentation
    about use of crashkernel parameter.

    This patch (of 5):

    Traditionally, kdump is used to save vmcore in case of a crash. Some
    architectures like powerpc can save vmcore using architecture specific
    support instead of kexec/kdump mechanism. Such architecture specific
    support also needs to reserve memory, to be used by dump capture kernel.
    The crashkernel parameter can be reused, for memory reservation, by
    such architecture specific infrastructure.

    But currently, code related to vmcoreinfo and parsing of crashkernel
    parameter is built under CONFIG_KEXEC_CORE. This patch introduces
    CONFIG_CRASH_CORE and moves the above mentioned code under this config,
    allowing code reuse without dependency on CONFIG_KEXEC. There is no
    functional change with this patch.

    Link: http://lkml.kernel.org/r/149035338104.6881.4550894432615189948.stgit@hbathini.in.ibm.com
    Signed-off-by: Hari Bathini
    Acked-by: Dave Young
    Cc: Fenghua Yu
    Cc: Tony Luck
    Cc: Eric Biederman
    Cc: Mahesh Salgaonkar
    Cc: Vivek Goyal
    Cc: Michael Ellerman
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Hari Bathini
     

23 Feb, 2017

1 commit

  • Pull printk updates from Petr Mladek:

    - Add Petr Mladek, Sergey Senozhatsky as printk maintainers, and Steven
    Rostedt as the printk reviewer. This idea came up after the
    discussion about printk issues at Kernel Summit. It was formulated
    and discussed at lkml[1].

    - Extend a lock-less NMI per-cpu buffers idea to handle recursive
    printk() calls by Sergey Senozhatsky[2]. It is the first step in
    sanitizing printk as discussed at Kernel Summit.

    The change allows us to see messages that would normally get ignored
    or would cause a deadlock.

    It also allows enabling lockdep in printk(). This has already paid off.
    The testing in linux-next helped to discover two old problems that
    were hidden before[3][4].

    - Remove unused parameter by Sergey Senozhatsky. Clean up after a past
    change.

    [1] http://lkml.kernel.org/r/1481798878-31898-1-git-send-email-pmladek@suse.com
    [2] http://lkml.kernel.org/r/20161227141611.940-1-sergey.senozhatsky@gmail.com
    [3] http://lkml.kernel.org/r/20170215044332.30449-1-sergey.senozhatsky@gmail.com
    [4] http://lkml.kernel.org/r/20170217015932.11898-1-sergey.senozhatsky@gmail.com

    * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/pmladek/printk:
    printk: drop call_console_drivers() unused param
    printk: convert the rest to printk-safe
    printk: remove zap_locks() function
    printk: use printk_safe buffers in printk
    printk: report lost messages in printk safe/nmi contexts
    printk: always use deferred printk when flush printk_safe lines
    printk: introduce per-cpu safe_print seq buffer
    printk: rename nmi.c and exported api
    printk: use vprintk_func in vprintk()
    MAINTAINERS: Add printk maintainers

    Linus Torvalds
     

08 Feb, 2017

1 commit

  • A preparation patch for the printk_safe work. No functional change.
    - rename nmi.c to printk_safe.c
    - add a `printk_safe' prefix to some of the exported functions (those
    used by both printk-safe and printk-nmi).

    Link: http://lkml.kernel.org/r/20161227141611.940-3-sergey.senozhatsky@gmail.com
    Cc: Andrew Morton
    Cc: Linus Torvalds
    Cc: Jan Kara
    Cc: Tejun Heo
    Cc: Calvin Owens
    Cc: Steven Rostedt
    Cc: Ingo Molnar
    Cc: Peter Zijlstra
    Cc: Andy Lutomirski
    Cc: Peter Hurley
    Cc: linux-kernel@vger.kernel.org
    Signed-off-by: Sergey Senozhatsky
    Signed-off-by: Petr Mladek

    Sergey Senozhatsky
     

11 Jan, 2017

1 commit

  • __pa_symbol is the correct api to get the physical address of kernel
    symbols. Switch to it to allow for better debug checking.

    Reviewed-by: Mark Rutland
    Tested-by: Mark Rutland
    Acked-by: "Eric W. Biederman"
    Signed-off-by: Laura Abbott
    Signed-off-by: Will Deacon

    Laura Abbott
     

15 Dec, 2016

2 commits

  • A soft lockup will occur when I run trinity on the kexec_load syscall.
    The corresponding stack information is as follows.

    BUG: soft lockup - CPU#6 stuck for 22s! [trinity-c6:13859]
    Kernel panic - not syncing: softlockup: hung tasks
    CPU: 6 PID: 13859 Comm: trinity-c6 Tainted: G O L ----V------- 3.10.0-327.28.3.35.zhongjiang.x86_64 #1
    Hardware name: Huawei Technologies Co., Ltd. Tecal BH622 V2/BC01SRSA0, BIOS RMIBV386 06/30/2014
    Call Trace:
    dump_stack+0x19/0x1b
    panic+0xd8/0x214
    watchdog_timer_fn+0x1cc/0x1e0
    __hrtimer_run_queues+0xd2/0x260
    hrtimer_interrupt+0xb0/0x1e0
    ? call_softirq+0x1c/0x30
    local_apic_timer_interrupt+0x37/0x60
    smp_apic_timer_interrupt+0x3f/0x60
    apic_timer_interrupt+0x6d/0x80
    ? kimage_alloc_control_pages+0x80/0x270
    ? kmem_cache_alloc_trace+0x1ce/0x1f0
    ? do_kimage_alloc_init+0x1f/0x90
    kimage_alloc_init+0x12a/0x180
    SyS_kexec_load+0x20a/0x260
    system_call_fastpath+0x16/0x1b

    The first-time allocation of control pages may take too long because
    crash_res.end can be set to a high value. We need to add cond_resched()
    to avoid the issue.

    The patch has been tested and the above issue no longer appears.

    Link: http://lkml.kernel.org/r/1481164674-42775-1-git-send-email-zhongjiang@huawei.com
    Signed-off-by: zhong jiang
    Acked-by: "Eric W. Biederman"
    Cc: Xunlei Pang
    Cc: Dave Young
    Cc: Vivek Goyal
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    zhong jiang
     
  • Currently in x86_64, the symbol address of phys_base is exported to
    vmcoreinfo. Dave Anderson complained that this is really useless for his
    Crash implementation. In the user-space utilities Crash and
    Makedumpfile, for which the exported vmcore information is mainly
    intended, the value of phys_base is needed to convert the virtual
    address of an exported kernel symbol to a physical address. Take
    init_level4_pgt in particular: if we want to access it and walk the
    page table to look up the PA corresponding to a VA, we first need to
    calculate

    page_dir = SYMBOL(init_level4_pgt) - __START_KERNEL_map + phys_base;

    Currently Crash and Makedumpfile have to analyze the vmcore ELF program
    headers to get the value of phys_base. As Dave said, it would be
    preferable if it were readily available in vmcoreinfo rather than
    depending upon PT_LOAD semantics.

    Hence this patch changes the code to export the value of phys_base
    instead of its virtual address.

    People also complained that exporting KERNEL_IMAGE_SIZE is x86_64 only
    and should be moved into the arch-dependent function
    arch_crash_save_vmcoreinfo(). Do that move in this patch.
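    As a worked example of the formula above, a post-processing tool could
    convert a kernel-text virtual address to a physical one like this
    (addresses in the test are made up; START_KERNEL_MAP stands in for
    x86_64's __START_KERNEL_map):

```c
#include <stdint.h>

/* Stand-in for x86_64's __START_KERNEL_map; value matches the kernel's. */
#define START_KERNEL_MAP 0xffffffff80000000ULL

/* page_dir = SYMBOL(init_level4_pgt) - __START_KERNEL_map + phys_base */
uint64_t kernel_va_to_pa(uint64_t va, uint64_t phys_base)
{
    return va - START_KERNEL_MAP + phys_base;
}
```

    With phys_base exported directly in vmcoreinfo, a tool gets the second
    operand without parsing PT_LOAD headers at all.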

    Link: http://lkml.kernel.org/r/1478568596-30060-2-git-send-email-bhe@redhat.com
    Signed-off-by: Baoquan He
    Cc: Thomas Garnier
    Cc: Baoquan He
    Cc: Thomas Gleixner
    Cc: Ingo Molnar
    Cc: "H . Peter Anvin"
    Cc: Eric Biederman
    Cc: Xunlei Pang
    Cc: HATAYAMA Daisuke
    Cc: Kees Cook
    Cc: Eugene Surovegin
    Cc: Dave Young
    Cc: AKASHI Takahiro
    Cc: Atsushi Kumagai
    Cc: Dave Anderson
    Cc: Pratyush Anand
    Cc: Vivek Goyal
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Baoquan He
     

03 Aug, 2016

6 commits

  • I hit the following issue when running trinity on my system. The kernel
    is version 3.4, but mainline has the same issue.

    The root cause is that the segment size is too large, so the kernel
    spends too long trying to allocate a page. Other cases will block until
    the test case quits, and OOM conditions will occur.

    Call Trace:
    __alloc_pages_nodemask+0x14c/0x8f0
    alloc_pages_current+0xaf/0x120
    kimage_alloc_pages+0x10/0x60
    kimage_alloc_control_pages+0x5d/0x270
    machine_kexec_prepare+0xe5/0x6c0
    ? kimage_free_page_list+0x52/0x70
    sys_kexec_load+0x141/0x600
    ? vfs_write+0x100/0x180
    system_call_fastpath+0x16/0x1b

    The patch changes sanity_check_segment_list() to verify that the usage by
    all segments does not exceed half of memory.
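    A sketch of such a check, under the assumption that segment sizes are
    rounded up to pages and summed against half of total memory (the names
    and page size are illustrative, not the exact
    sanity_check_segment_list() code):

```c
#include <stddef.h>

#define PAGE_SIZE 4096UL   /* illustrative; the kernel's is per-arch */

/* Returns 1 if the segments together fit in at most half of the
 * total_pages available, 0 if the image should be rejected. */
int segments_fit_in_half_of_memory(const unsigned long *seg_bytes,
                                   size_t nr_segments,
                                   unsigned long total_pages)
{
    unsigned long total_segment_pages = 0;
    for (size_t i = 0; i < nr_segments; i++)
        total_segment_pages += (seg_bytes[i] + PAGE_SIZE - 1) / PAGE_SIZE;
    return total_segment_pages <= total_pages / 2;
}
```

    Rejecting oversized images up front avoids the long allocation stall and
    the OOM behaviour described in the report.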

    [akpm@linux-foundation.org: fix for kexec-return-error-number-directly.patch, update comment]
    Link: http://lkml.kernel.org/r/1469625474-53904-1-git-send-email-zhongjiang@huawei.com
    Signed-off-by: zhong jiang
    Suggested-by: Eric W. Biederman
    Cc: Vivek Goyal
    Cc: Dave Young
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    zhong jiang
     
  • Provide a wrapper function to be used by kernel code to check whether a
    crash kernel is loaded. It returns the same value that can be seen in
    /sys/kernel/kexec_crash_loaded by userspace programs.

    I'm exporting the function, because it will be used by Xen, and it is
    possible to compile Xen modules separately to enable the use of PV
    drivers with unmodified bare-metal kernels.
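    The wrapper itself can be sketched as a one-liner over the image pointer
    (stubbed here; in the kernel the state is kexec_crash_image):

```c
#include <stddef.h>

static void *kexec_crash_image;   /* NULL until a crash kernel is loaded */

/* Returns the same value userspace sees in /sys/kernel/kexec_crash_loaded. */
int kexec_crash_loaded(void)
{
    return kexec_crash_image != NULL;
}
```

    Exporting a function rather than the raw pointer lets separately built
    modules ask the question without knowing kexec internals.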

    Link: http://lkml.kernel.org/r/20160713121955.14969.69080.stgit@hananiah.suse.cz
    Signed-off-by: Petr Tesarik
    Cc: Juergen Gross
    Cc: Josh Triplett
    Cc: Ingo Molnar
    Cc: Thomas Gleixner
    Cc: Eric Biederman
    Cc: "H. Peter Anvin"
    Cc: Boris Ostrovsky
    Cc: "Paul E. McKenney"
    Cc: Dave Young
    Cc: David Vrabel
    Cc: Vivek Goyal
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Petr Tesarik
     
  • kexec physical addresses are the boot-time view of the system. For
    certain ARM systems (such as Keystone 2), the boot view of the system
    does not match the kernel's view of the system: the boot view uses a
    special alias in the lower 4GB of the physical address space.

    To cater for these kinds of setups, we need to translate between the
    boot view physical addresses and the normal kernel view physical
    addresses. This patch extracts the current translation points into
    linux/kexec.h, and allows an architecture to override the functions.

    Due to the translations required, we unfortunately end up with six
    translation functions, which are reduced down to four that the
    architecture can override.
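    A sketch of the override mechanism: identity translations as weak
    defaults that a platform with a boot-time alias can replace (the
    function names mirror the patch's intent, but the bodies are
    illustrative):

```c
#include <stdint.h>

typedef uint64_t phys_addr_t;

/* Weak default: on most systems the boot view and the kernel view of
 * physical memory are the same, so the translation is the identity. */
__attribute__((weak)) phys_addr_t phys_to_boot_phys(phys_addr_t phys)
{
    return phys;
}

/* Weak default inverse; a Keystone 2-like platform would override both
 * to add/remove its low-4GB boot alias offset. */
__attribute__((weak)) phys_addr_t boot_phys_to_phys(phys_addr_t boot)
{
    return boot;
}
```

    Whatever offsets an architecture uses, the two overrides must stay exact
    inverses of each other so that round-tripping an address is lossless.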

    [akpm@linux-foundation.org: kexec.h needs asm/io.h for phys_to_virt()]
    Link: http://lkml.kernel.org/r/E1b8koP-0004HZ-Vf@rmk-PC.armlinux.org.uk
    Signed-off-by: Russell King
    Cc: Keerthy
    Cc: Pratyush Anand
    Cc: Vitaly Andrianov
    Cc: Eric Biederman
    Cc: Dave Young
    Cc: Baoquan He
    Cc: Vivek Goyal
    Cc: Simon Horman
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Russell King
     
  • On PAE systems (eg, ARM LPAE) the vmcore note may be located above 4GB
    physical on 32-bit architectures, so we need a wider type than "unsigned
    long" here. Arrange for paddr_vmcoreinfo_note() to return a
    phys_addr_t, thereby allowing it to be located above 4GB.

    This makes no difference for kexec-tools, as they already assume a
    64-bit type when reading from this file.

    Link: http://lkml.kernel.org/r/E1b8koK-0004HS-K9@rmk-PC.armlinux.org.uk
    Signed-off-by: Russell King
    Reviewed-by: Pratyush Anand
    Acked-by: Baoquan He
    Cc: Keerthy
    Cc: Vitaly Andrianov
    Cc: Eric Biederman
    Cc: Dave Young
    Cc: Vivek Goyal
    Cc: Simon Horman
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Russell King
     
  • Ensure that user memory sizes do not wrap around when validating the
    user input, which can lead to the following input validation working
    incorrectly.
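    The kind of check involved can be sketched as follows; the names are
    illustrative, but the unsigned wrap-around test is the essence:

```c
#include <stdint.h>

/* Reject a user-supplied segment whose start + size wraps around:
 * after a wrap, mend < mstart and later range comparisons would be
 * performed on bogus values. */
int segment_range_ok(uint64_t mstart, uint64_t msize)
{
    uint64_t mend = mstart + msize;
    if (mend < mstart)        /* unsigned overflow occurred */
        return 0;
    return 1;
}
```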

    [akpm@linux-foundation.org: fix it for kexec-return-error-number-directly.patch]
    Link: http://lkml.kernel.org/r/E1b8koF-0004HM-5x@rmk-PC.armlinux.org.uk
    Signed-off-by: Russell King
    Reviewed-by: Pratyush Anand
    Acked-by: Baoquan He
    Cc: Keerthy
    Cc: Vitaly Andrianov
    Cc: Eric Biederman
    Cc: Dave Young
    Cc: Vivek Goyal
    Cc: Simon Horman
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Russell King
     
  • This is a cleanup patch that makes kexec clearer by returning the error
    number directly. The variable result is useless, because no other
    function's return value is assigned to it, so remove it.

    Link: http://lkml.kernel.org/r/1464179273-57668-1-git-send-email-mnghuan@gmail.com
    Signed-off-by: Minfei Huang
    Cc: Dave Young
    Cc: Baoquan He
    Cc: Borislav Petkov
    Cc: Xunlei Pang
    Cc: Atsushi Kumagai
    Cc: Vivek Goyal
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Minfei Huang
     

24 May, 2016

2 commits

  • …unprotect)_crashkres()

    Commit 3f625002581b ("kexec: introduce a protection mechanism for the
    crashkernel reserved memory") added a mechanism for protecting the
    crash kernel reserved memory similar to the previous
    crash_map/unmap_reserved_pages() implementation; the new one is more
    generic in name and cleaner in code (besides, some arches may not be
    allowed to unmap the pgtable).

    Therefore, this patch consolidates them, and uses the new
    arch_kexec_protect(unprotect)_crashkres() to replace the former
    crash_map/unmap_reserved_pages(), which by now has been used only by
    S390.

    The consolidation work needs the crash memory to be mapped initially;
    this is done in machine_kdump_pm_init(), which runs after
    reserve_crashkernel(). Once the kdump kernel is loaded, the new
    arch_kexec_protect_crashkres() implemented for S390 will actually
    unmap the pgtable like before.

    Signed-off-by: Xunlei Pang <xlpang@redhat.com>
    Signed-off-by: Michael Holzheu <holzheu@linux.vnet.ibm.com>
    Acked-by: Michael Holzheu <holzheu@linux.vnet.ibm.com>
    Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
    Cc: "Eric W. Biederman" <ebiederm@xmission.com>
    Cc: Minfei Huang <mhuang@redhat.com>
    Cc: Vivek Goyal <vgoyal@redhat.com>
    Cc: Dave Young <dyoung@redhat.com>
    Cc: Baoquan He <bhe@redhat.com>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

    Xunlei Pang
     
  • For cases where some kernel (module) code stamps on the crash reserved
    memory (already mapped by the kernel) into which the second kernel's
    data has been loaded, the kdump kernel will probably fail to boot when
    a panic happens (or even when one does not), leaving the culprit at
    large. This is unacceptable.

    The patch introduces a mechanism for detecting such cases:

    1) After each crash kexec loading, it simply marks the reserved memory
    regions readonly, since we no longer access them after that. When
    someone stamps on the region, the first kernel will panic and trigger
    the kdump. The weak arch_kexec_protect_crashkres() is introduced to do
    the actual protection.

    2) To allow multiple loadings, once 1) was done we also need to remark
    the reserved memory as readwrite each time a system call related to
    kdump is made. The weak arch_kexec_unprotect_crashkres() is introduced
    to do the actual unprotection.

    The architecture can make its specific implementation by overriding
    arch_kexec_protect_crashkres() and arch_kexec_unprotect_crashkres().
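    The calling discipline described in 1) and 2) can be sketched like
    this, with a flag standing in for the actual read-only mappings:

```c
/* Stub state: 1 means the crash reserved region is mapped read-only.
 * In a real arch implementation these would change page permissions. */
static int crashkres_readonly;

static void arch_kexec_protect_crashkres(void)   { crashkres_readonly = 1; }
static void arch_kexec_unprotect_crashkres(void) { crashkres_readonly = 0; }

/* Hypothetical load path: unprotect so this (re)load may write the
 * region, then re-protect once the segments are in place. */
int do_kexec_crash_load(void)
{
    arch_kexec_unprotect_crashkres();
    /* ... load segments into the crash reserved region ... */
    arch_kexec_protect_crashkres();
    return crashkres_readonly;   /* 1: region protected after the load */
}
```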

    Signed-off-by: Xunlei Pang
    Cc: Eric Biederman
    Cc: Dave Young
    Cc: Minfei Huang
    Cc: Vivek Goyal
    Cc: Baoquan He
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Xunlei Pang
     

21 May, 2016

1 commit

  • In NMI context, printk() messages are stored into per-CPU buffers to
    avoid a possible deadlock. They are normally flushed to the main ring
    buffer via an IRQ work. But the work is never called when the system
    calls panic() in the very same NMI handler.

    This patch tries to flush NMI buffers before the crash dump is
    generated. In this case it does not risk a double release and bails out
    when the logbuf_lock is already taken. The aim is to get the messages
    into the main ring buffer when possible. It makes them better
    accessible in the vmcore.

    Then the patch tries to flush the buffers a second time when the other
    CPUs are down. It might be more aggressive and reset logbuf_lock. The
    aim is to get the messages available for the subsequent kmsg_dump() and
    console_flush_on_panic() calls.

    The patch causes vprintk_emit() to be called even in NMI context again.
    But it is done via printk_deferred() so that the console handling is
    skipped. Consoles use internal locks and we could not prevent a
    deadlock easily. They are explicitly called later when the crash dump
    is not generated, see console_flush_on_panic().

    Signed-off-by: Petr Mladek
    Cc: Benjamin Herrenschmidt
    Cc: Daniel Thompson
    Cc: David Miller
    Cc: Ingo Molnar
    Cc: Jan Kara
    Cc: Jiri Kosina
    Cc: Martin Schwidefsky
    Cc: Peter Zijlstra
    Cc: Ralf Baechle
    Cc: Russell King
    Cc: Steven Rostedt
    Cc: Thomas Gleixner
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Petr Mladek
     

20 May, 2016

1 commit

  • Many developers already know that the reference count field of struct
    page is _count and of atomic type. They may try to handle it directly,
    and this could break the purpose of the page reference count
    tracepoints. To prevent direct _count modification, this patch renames
    it to _refcount and adds a warning comment to the code. After that,
    developers who need to handle the reference count will see that the
    field should not be accessed directly.

    [akpm@linux-foundation.org: fix comments, per Vlastimil]
    [akpm@linux-foundation.org: Documentation/vm/transhuge.txt too]
    [sfr@canb.auug.org.au: sync ethernet driver changes]
    Signed-off-by: Joonsoo Kim
    Signed-off-by: Stephen Rothwell
    Cc: Vlastimil Babka
    Cc: Hugh Dickins
    Cc: Johannes Berg
    Cc: "David S. Miller"
    Cc: Sunil Goutham
    Cc: Chris Metcalf
    Cc: Manish Chopra
    Cc: Yuval Mintz
    Cc: Tariq Toukan
    Cc: Saeed Mahameed
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Joonsoo Kim
     

29 Apr, 2016

2 commits

  • PageAnon() always looks at the head page to check PAGE_MAPPING_ANON,
    and a tail page's page->mapping contains only poisoned data since
    commit 1c290f642101 ("mm: sanitize page->mapping for tail pages").

    If makedumpfile checks page->mapping of a compound tail page to
    distinguish anonymous pages as usual, it must fail on newer kernels. So
    it is necessary to export OFFSET(page.compound_head) to avoid checking
    compound tail pages.

    The problem is that unnecessary hugepages won't be removed from a dump
    file in kernels 4.5.x and later. This means that extra disk space would
    be consumed. It's a problem, but not critical.

    Signed-off-by: Atsushi Kumagai
    Acked-by: Dave Young
    Cc: "Eric W. Biederman"
    Cc: Vivek Goyal
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Atsushi Kumagai
     
  • makedumpfile refers to page.lru.next to get the order of compound pages
    for page filtering.

    However, now the order is stored in page.compound_order, hence
    VMCOREINFO should be updated to export the offset of
    page.compound_order.

    In fact, page.compound_order was already introduced in kernel 4.0, but
    its offset was the same as page.lru.next's until kernel 4.3, so this
    was not an actual problem.

    The same applies to page.lru.prev and page.compound_dtor, which are
    necessary to detect hugetlbfs pages. Further, the content of
    compound_dtor was changed from a direct address to an ID denoting the
    destructor.

    The problem is that unnecessary hugepages won't be removed from a dump
    file in kernels 4.4.x and later. This means that extra disk space would
    be consumed. It's a problem, but not critical.

    Signed-off-by: Atsushi Kumagai
    Acked-by: Dave Young
    Cc: "Eric W. Biederman"
    Cc: Vivek Goyal
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Atsushi Kumagai
     

30 Jan, 2016

1 commit

  • Set proper ioresource flags and types for crash kernel
    reservation areas.

    Signed-off-by: Toshi Kani
    Signed-off-by: Borislav Petkov
    Reviewed-by: Dave Young
    Cc: Andrew Morton
    Cc: Andy Lutomirski
    Cc: Baoquan He
    Cc: Borislav Petkov
    Cc: Brian Gerst
    Cc: Denys Vlasenko
    Cc: H. Peter Anvin
    Cc: HATAYAMA Daisuke
    Cc: Linus Torvalds
    Cc: Luis R. Rodriguez
    Cc: Minfei Huang
    Cc: Peter Zijlstra
    Cc: Thomas Gleixner
    Cc: Toshi Kani
    Cc: Vivek Goyal
    Cc: kexec@lists.infradead.org
    Cc: linux-arch@vger.kernel.org
    Cc: linux-mm
    Link: http://lkml.kernel.org/r/1453841853-11383-8-git-send-email-bp@alien8.de
    Signed-off-by: Ingo Molnar

    Toshi Kani
     

21 Jan, 2016

1 commit


19 Dec, 2015

1 commit

  • Currently, panic() and crash_kexec() can be called at the same time.
    For example (x86 case):

    CPU 0:
    oops_end()
    crash_kexec()
    mutex_trylock() // acquired
    nmi_shootdown_cpus() // stop other CPUs

    CPU 1:
    panic()
    crash_kexec()
    mutex_trylock() // failed to acquire
    smp_send_stop() // stop other CPUs
    infinite loop

    If CPU 1 calls smp_send_stop() before nmi_shootdown_cpus(), kdump
    fails.

    In another case:

    CPU 0:
    oops_end()
    crash_kexec()
    mutex_trylock() // acquired

    io_check_error()
    panic()
    crash_kexec()
    mutex_trylock() // failed to acquire
    infinite loop

    Clearly, this is an undesirable result.

    To fix this problem, this patch changes crash_kexec() to exclude others
    by using the panic_cpu atomic.

    Signed-off-by: Hidehiro Kawai
    Acked-by: Michal Hocko
    Cc: Andrew Morton
    Cc: Baoquan He
    Cc: Dave Young
    Cc: "Eric W. Biederman"
    Cc: HATAYAMA Daisuke
    Cc: "H. Peter Anvin"
    Cc: Ingo Molnar
    Cc: Jonathan Corbet
    Cc: kexec@lists.infradead.org
    Cc: linux-doc@vger.kernel.org
    Cc: Martin Schwidefsky
    Cc: Masami Hiramatsu
    Cc: Minfei Huang
    Cc: Peter Zijlstra
    Cc: Seth Jennings
    Cc: Steven Rostedt
    Cc: Thomas Gleixner
    Cc: Vitaly Kuznetsov
    Cc: Vivek Goyal
    Cc: x86-ml
    Link: http://lkml.kernel.org/r/20151210014630.25437.94161.stgit@softrs
    Signed-off-by: Borislav Petkov
    Signed-off-by: Thomas Gleixner

    Hidehiro Kawai
     

07 Nov, 2015

1 commit

  • kexec output messages have missed the "kexec" prefix since Dave Young
    split the kexec code. Now, use the file name as the output message
    prefix.

    Currently, the format of output message:
    [ 140.290795] SYSC_kexec_load: hello, world
    [ 140.291534] kexec: sanity_check_segment_list: hello, world

    Ideally, the format of output message:
    [ 30.791503] kexec: SYSC_kexec_load, Hello, world
    [ 79.182752] kexec_core: sanity_check_segment_list, Hello, world

    Remove the custom "kexec" prefix from the output messages.

    Signed-off-by: Minfei Huang
    Acked-by: Dave Young
    Cc: "Eric W. Biederman"
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Minfei Huang
     

21 Oct, 2015

1 commit

  • It is helpful when the crashkernel cmdline parsing routines
    actually say which character is the unrecognized one. Make them
    do so.

    Signed-off-by: Borislav Petkov
    Reviewed-by: Dave Young
    Reviewed-by: Joerg Roedel
    Cc: Andrew Morton
    Cc: Andy Lutomirski
    Cc: Baoquan He
    Cc: H. Peter Anvin
    Cc: Jiri Kosina
    Cc: Juergen Gross
    Cc: Linus Torvalds
    Cc: Mark Salter
    Cc: Peter Zijlstra
    Cc: Thomas Gleixner
    Cc: Vivek Goyal
    Cc: WANG Chao
    Cc: jerry_hoemann@hp.com
    Link: http://lkml.kernel.org/r/1445246268-26285-8-git-send-email-bp@alien8.de
    Signed-off-by: Ingo Molnar

    Borislav Petkov
     

11 Sep, 2015

2 commits

  • On x86_64, KERNEL_IMAGE_SIZE was changed to 512M in v2.6.26, and
    MODULES_VADDR was changed accordingly to 0xffffffffa0000000. However,
    in v3.12 Kees Cook introduced KASLR to randomise the location of the
    kernel, and the kernel text mapping address space was enlarged from
    512M to 1G. That means KERNEL_IMAGE_SIZE is now variable: its value is
    512M when KASLR support is not compiled in and 1G when it is.
    Accordingly, MODULES_VADDR was changed too:

    #define MODULES_VADDR (__START_KERNEL_map + KERNEL_IMAGE_SIZE)

    So when KASLR is compiled in and enabled, the kernel text mapping
    address space and the modules vaddr space need to be adjusted.
    Otherwise makedumpfile will fail, since the addresses of some symbols
    are not correct.

    Hence KERNEL_IMAGE_SIZE needs to be exported in vmcoreinfo and read by
    makedumpfile to help calculate MODULES_VADDR.

    Signed-off-by: Baoquan He
    Acked-by: Kees Cook
    Acked-by: Vivek Goyal
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Baoquan He
     
  • People reported that crash_notes in /proc/vmcore were corrupted, and
    this caused kdump failures. With code debugging and logs we found the
    root cause: the percpu variable crash_notes is allocated across 2
    vmalloc pages. Percpu is currently based on vmalloc by default, and
    vmalloc can't guarantee that 2 consecutive vmalloc pages are also on 2
    consecutive physical pages. So the 1st kernel exports the starting
    address and size of crash_notes through sysfs like below:

    /sys/devices/system/cpu/cpux/crash_notes
    /sys/devices/system/cpu/cpux/crash_notes_size

    and the kdump kernel uses them to get the content of crash_notes.
    However, the 2nd part may not be in the next neighbouring physical page
    as expected if crash_notes is allocated across 2 vmalloc pages. That's
    why nhdr_ptr->n_namesz or nhdr_ptr->n_descsz could be very huge in
    update_note_header_size_elf64() and cause note header merging failures
    or warnings.

    This patch changes the allocation to __alloc_percpu(), passing an
    alignment obtained by rounding crash_notes_size up to the nearest power
    of two. This makes sure crash_notes is allocated inside one physical
    page, since sizeof(note_buf_t) is smaller than PAGE_SIZE on all
    architectures. Meanwhile, add a BUILD_BUG_ON to break the compile if
    the size is bigger than PAGE_SIZE, since crash_notes would then
    definitely span 2 pages. That needs to be avoided, and reported if it's
    unavoidable.

    [akpm@linux-foundation.org: use correct comment layout]
    Signed-off-by: Baoquan He
    Cc: Eric W. Biederman
    Cc: Vivek Goyal
    Cc: Dave Young
    Cc: Lisa Mitchell
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Baoquan He