02 Jul, 2021

1 commit

  • kernel.h has been used as a dumping ground for all kinds of stuff for a
    long time. Here is an attempt to start cleaning it up by splitting out
    the panic and oops helpers.

    There are several purposes in doing this:
    - drop the dependency on bug.h
    - break an include loop by moving panic_notifier.h out
    - unload from kernel.h something which has its own domain

    At the same time, convert users tree-wide to use the new headers. For
    the time being, the new headers are included back into kernel.h to
    avoid twisted indirect includes for existing users.

    [akpm@linux-foundation.org: thread_info.h needs limits.h]
    [andriy.shevchenko@linux.intel.com: ia64 fix]
    Link: https://lkml.kernel.org/r/20210520130557.55277-1-andriy.shevchenko@linux.intel.com

    Link: https://lkml.kernel.org/r/20210511074137.33666-1-andriy.shevchenko@linux.intel.com
    Signed-off-by: Andy Shevchenko
    Reviewed-by: Bjorn Andersson
    Co-developed-by: Andrew Morton
    Acked-by: Mike Rapoport
    Acked-by: Corey Minyard
    Acked-by: Christian Brauner
    Acked-by: Arnd Bergmann
    Acked-by: Kees Cook
    Acked-by: Wei Liu
    Acked-by: Rasmus Villemoes
    Signed-off-by: Andrew Morton
    Acked-by: Sebastian Reichel
    Acked-by: Luis Chamberlain
    Acked-by: Stephen Boyd
    Acked-by: Thomas Bogendoerfer
    Acked-by: Helge Deller # parisc
    Signed-off-by: Linus Torvalds

    Andy Shevchenko
     

07 May, 2021

2 commits

  • kmsg_dump(KMSG_DUMP_SHUTDOWN) is called before machine_restart(),
    machine_halt(), and machine_power_off(). The only one that is missing
    is machine_kexec().

    The dmesg output that it contains can be used to study the shutdown
    performance of both kernel and systemd during kexec reboot.

    Here is an example of the dmesg data collected after kexec:

    root@dplat-cp22:~# cat /sys/fs/pstore/dmesg-ramoops-0 | tail
    ...
    [ 70.914592] psci: CPU3 killed (polled 0 ms)
    [ 70.915705] CPU4: shutdown
    [ 70.916643] psci: CPU4 killed (polled 4 ms)
    [ 70.917715] CPU5: shutdown
    [ 70.918725] psci: CPU5 killed (polled 0 ms)
    [ 70.919704] CPU6: shutdown
    [ 70.920726] psci: CPU6 killed (polled 4 ms)
    [ 70.921642] CPU7: shutdown
    [ 70.922650] psci: CPU7 killed (polled 0 ms)

    Link: https://lkml.kernel.org/r/20210319192326.146000-2-pasha.tatashin@soleen.com
    Signed-off-by: Pavel Tatashin
    Reviewed-by: Kees Cook
    Reviewed-by: Petr Mladek
    Reviewed-by: Bhupesh Sharma
    Acked-by: Baoquan He
    Reviewed-by: Tyler Hicks
    Cc: James Morris
    Cc: Sasha Levin
    Cc: Eric W. Biederman
    Cc: Anton Vorontsov
    Cc: Colin Cross
    Cc: Tony Luck
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Pavel Tatashin
     
  • The purpose is to notify kernel modules of a fast reboot.

    Upstream a patch from the SONiC network operating system [1].

    [1]: https://github.com/Azure/sonic-linux-kernel/pull/46

    Link: https://lkml.kernel.org/r/20210304124626.13927-1-pmenzel@molgen.mpg.de
    Signed-off-by: Joe LeVeque
    Signed-off-by: Paul Menzel
    Acked-by: Baoquan He
    Cc: Guohan Lu
    Cc: Joe LeVeque
    Cc: Paul Menzel
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Joe LeVeque
     

22 Feb, 2021

1 commit

  • Pull ELF compat updates from Al Viro:
    "Sanitizing ELF compat support, especially for triarch architectures:

    - X32 handling cleaned up

    - MIPS64 uses compat_binfmt_elf.c both for O32 and N32 now

    - Kconfig side of things regularized

    Eventually I hope to have compat_binfmt_elf.c killed, with both native
    and compat built from fs/binfmt_elf.c, with -DELF_BITS={64,32} passed
    by kbuild, but that's a separate story - not included here"

    * 'work.elf-compat' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
    get rid of COMPAT_ELF_EXEC_PAGESIZE
    compat_binfmt_elf: don't bother with undef of ELF_ARCH
    Kconfig: regularize selection of CONFIG_BINFMT_ELF
    mips compat: switch to compat_binfmt_elf.c
    mips: don't bother with ELF_CORE_EFLAGS
    mips compat: don't bother with ELF_ET_DYN_BASE
    mips: KVM_GUEST makes no sense for 64bit builds...
    mips: kill unused definitions in binfmt_elf[on]32.c
    mips binfmt_elf*32.c: use elfcore-compat.h
    x32: make X32, !IA32_EMULATION setups able to execute x32 binaries
    [amd64] clean PRSTATUS_SIZE/SET_PR_FPVALID up properly
    elf_prstatus: collect the common part (everything before pr_reg) into a struct
    binfmt_elf: partially sanitize PRSTATUS_SIZE and SET_PR_FPVALID

    Linus Torvalds
     

26 Jan, 2021

1 commit

  • Function kernel_kexec() is called from the reboot system call with
    system_transition_mutex held. While inside kernel_kexec(), it acquires
    system_transition_mutex again, which leads to a deadlock.

    The deadlock should be easy to trigger; it has not caused any failure
    reports only because the 'kexec jump' feature is hardly used by anyone
    as far as I know. An inquiry can be made about who uses 'kexec jump'
    and where. Until then, let's simply remove the lock operation inside
    the CONFIG_KEXEC_JUMP ifdeffery scope.

    Fixes: 55f2503c3b69 ("PM / reboot: Eliminate race between reboot and suspend")
    Signed-off-by: Baoquan He
    Reported-by: Dan Carpenter
    Reviewed-by: Pingfan Liu
    Cc: 4.19+ # 4.19+
    Signed-off-by: Rafael J. Wysocki

    Baoquan He
     

06 Jan, 2021

1 commit

  • Preparation for handling i386 compat elf_prstatus sanely: rather than
    duplicating the beginning of compat_elf_prstatus, gather those fields
    into a separate structure (compat_elf_prstatus_common) so that it can
    be reused. Due to the incestuous relationship between binfmt_elf.c and
    compat_binfmt_elf.c, the same shape change is needed for the native
    struct elf_prstatus, gathering the fields prior to pr_reg into a new
    structure (struct elf_prstatus_common).

    Fortunately, the offset of pr_reg is always a multiple of 16 with no
    padding right before it, so it is possible to turn all the fields
    prior to it into a single member without disturbing the layout.

    [build fix from Geert Uytterhoeven folded in]

    Signed-off-by: Al Viro

    Al Viro
     

20 Nov, 2020

1 commit

  • Currently <crypto/sha.h> contains declarations for both SHA-1 and
    SHA-2, and <crypto/sha3.h> contains declarations for SHA-3.

    This organization is inconsistent, but more importantly SHA-1 is no
    longer considered to be cryptographically secure. So to the extent
    possible, SHA-1 shouldn't be grouped together with any of the other SHA
    versions, and usage of it should be phased out.

    Therefore, split <crypto/sha.h> into two headers, <crypto/sha1.h>
    and <crypto/sha2.h>, and make everyone explicitly specify whether
    they want the declarations for SHA-1, SHA-2, or both.

    This avoids making the SHA-1 declarations visible to files that don't
    want anything to do with SHA-1. It also prepares for potentially moving
    sha1.h into a new insecure/ or dangerous/ directory.

    Signed-off-by: Eric Biggers
    Acked-by: Ard Biesheuvel
    Acked-by: Jason A. Donenfeld
    Signed-off-by: Herbert Xu

    Eric Biggers
     

17 Oct, 2020

1 commit

  • Fix multiple occurrences of duplicated words in kernel/.

    Fix one typo/spello on the same line as a duplicate word. Change one
    instance of "the the" to "that the". Otherwise just drop one of the
    repeated words.

    Signed-off-by: Randy Dunlap
    Signed-off-by: Andrew Morton
    Link: https://lkml.kernel.org/r/98202fa6-8919-ef63-9efe-c0fad5ca7af1@infradead.org
    Signed-off-by: Linus Torvalds

    Randy Dunlap
     

10 Sep, 2020

1 commit


09 Jan, 2020

2 commits

  • It is the same as machine_kexec_prepare(), but is called after the
    segments are loaded. This way, an architecture can do processing work
    with the relocation segments already loaded. One such example is
    arm64: it has to have the segments loaded in order to create a page
    table, but it cannot do that at kexec time, because by then
    allocations are no longer possible.

    Signed-off-by: Pavel Tatashin
    Acked-by: Dave Young
    Signed-off-by: Will Deacon

    Pavel Tatashin
     
  • Here is a regular kexec command sequence and output:
    =====
    $ kexec --reuse-cmdline -i --load Image
    $ kexec -e
    [ 161.342002] kexec_core: Starting new kernel

    Welcome to Buildroot
    buildroot login:
    =====

    Even when the "quiet" kernel parameter is specified, "kexec_core:
    Starting new kernel" is printed.

    This message has KERN_EMERG level, but there is no emergency, it is a
    normal kexec operation, so quiet it down to appropriate KERN_NOTICE.

    Machines with slow console baud rates benefit from less output.

    Signed-off-by: Pavel Tatashin
    Reviewed-by: Simon Horman
    Acked-by: Dave Young
    Signed-off-by: Will Deacon

    Pavel Tatashin
     

26 Sep, 2019

1 commit

  • syzbot found that a thread can stall for minutes inside kexec_load() after
    that thread was killed by SIGKILL [1]. It turned out that the reproducer
    was trying to allocate 2408MB of memory using kimage_alloc_page() from
    kimage_load_normal_segment(). Let's check for SIGKILL before doing memory
    allocation.

    [1] https://syzkaller.appspot.com/bug?id=a0e3436829698d5824231251fad9d8e998f94f5e

    Link: http://lkml.kernel.org/r/993c9185-d324-2640-d061-bed2dd18b1f7@I-love.SAKURA.ne.jp
    Signed-off-by: Tetsuo Handa
    Reported-by: syzbot
    Cc: Eric Biederman
    Reviewed-by: Andrew Morton
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Tetsuo Handa
     

19 Jun, 2019

1 commit

  • Based on 2 normalized pattern(s):

    this source code is licensed under the gnu general public license
    version 2 see the file copying for more details

    this source code is licensed under general public license version 2
    see

    extracted by the scancode license scanner the SPDX license identifier

    GPL-2.0-only

    has been chosen to replace the boilerplate/reference in 52 file(s).

    Signed-off-by: Thomas Gleixner
    Reviewed-by: Enrico Weigelt
    Reviewed-by: Allison Randal
    Reviewed-by: Alexios Zavras
    Cc: linux-spdx@vger.kernel.org
    Link: https://lkml.kernel.org/r/20190602204653.449021192@linutronix.de
    Signed-off-by: Greg Kroah-Hartman

    Thomas Gleixner
     

04 May, 2019

1 commit

  • This adds a function to disable secondary CPUs for suspend, where the
    secondary CPUs are not necessarily the non-zero / non-boot CPUs.
    Platforms will be able to use this to suspend using non-zero CPUs.

    Signed-off-by: Nicholas Piggin
    Signed-off-by: Peter Zijlstra (Intel)
    Cc: Frederic Weisbecker
    Cc: Linus Torvalds
    Cc: Peter Zijlstra
    Cc: Rafael J . Wysocki
    Cc: Thomas Gleixner
    Cc: linuxppc-dev@lists.ozlabs.org
    Link: https://lkml.kernel.org/r/20190411033448.20842-3-npiggin@gmail.com
    Signed-off-by: Ingo Molnar

    Nicholas Piggin
     

29 Dec, 2018

2 commits

  • totalram_pages and totalhigh_pages are made static inline functions.

    The main motivation was that managed_page_count_lock handling was
    complicating things. It was discussed at length here:
    https://lore.kernel.org/patchwork/patch/995739/#1181785 So it seemed
    better to remove the lock and convert the variables to atomic, with
    preventing potential store-to-read tearing as a bonus.

    [akpm@linux-foundation.org: coding style fixes]
    Link: http://lkml.kernel.org/r/1542090790-21750-4-git-send-email-arunks@codeaurora.org
    Signed-off-by: Arun KS
    Suggested-by: Michal Hocko
    Suggested-by: Vlastimil Babka
    Reviewed-by: Konstantin Khlebnikov
    Reviewed-by: Pavel Tatashin
    Acked-by: Michal Hocko
    Acked-by: Vlastimil Babka
    Cc: David Hildenbrand
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Arun KS
     
  • Patch series "mm: convert totalram_pages, totalhigh_pages and managed
    pages to atomic", v5.

    This series converts totalram_pages, totalhigh_pages and
    zone->managed_pages to atomic variables.

    totalram_pages, zone->managed_pages and totalhigh_pages updates are
    protected by managed_page_count_lock, but readers never care about it.
    Convert these variables to atomic to avoid readers potentially seeing a
    store tear.

    The main motivation was that managed_page_count_lock handling was
    complicating things. It was discussed at length here:
    https://lore.kernel.org/patchwork/patch/995739/#1181785 It seemed
    better to remove the lock and convert the variables to atomic. With
    the change, preventing potential store-to-read tearing comes as a
    bonus.

    This patch (of 4):

    This is in preparation for a later patch which converts totalram_pages
    and zone->managed_pages to atomic variables. Please note that
    re-reading the value might yield a different result, and as such could
    lead to unexpected behavior. There are no known bugs as a result of
    the current code, but it is better to prevent them in principle.

    Link: http://lkml.kernel.org/r/1542090790-21750-2-git-send-email-arunks@codeaurora.org
    Signed-off-by: Arun KS
    Reviewed-by: Konstantin Khlebnikov
    Reviewed-by: David Hildenbrand
    Acked-by: Michal Hocko
    Acked-by: Vlastimil Babka
    Reviewed-by: Pavel Tatashin
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Arun KS
     

06 Oct, 2018

1 commit

  • When SME is enabled in the first kernel, it needs to allocate decrypted
    pages for kdump because when the kdump kernel boots, these pages need to
    be accessed decrypted in the initial boot stage, before SME is enabled.

    [ bp: clean up text. ]

    Signed-off-by: Lianbo Jiang
    Signed-off-by: Borislav Petkov
    Reviewed-by: Tom Lendacky
    Cc: kexec@lists.infradead.org
    Cc: tglx@linutronix.de
    Cc: mingo@redhat.com
    Cc: hpa@zytor.com
    Cc: akpm@linux-foundation.org
    Cc: dan.j.williams@intel.com
    Cc: bhelgaas@google.com
    Cc: baiyaowei@cmss.chinamobile.com
    Cc: tiwai@suse.de
    Cc: brijesh.singh@amd.com
    Cc: dyoung@redhat.com
    Cc: bhe@redhat.com
    Cc: jroedel@suse.de
    Link: https://lkml.kernel.org/r/20180930031033.22110-3-lijiang@redhat.com

    Lianbo Jiang
     

15 Jun, 2018

1 commit

  • Without yielding while loading kimage segments, a large initrd will
    block all other work on the CPU performing the load until it is
    completed. For example loading an initrd of 200MB on a low power single
    core system will lock up the system for a few seconds.

    To increase system responsiveness to other tasks at that time, call
    cond_resched() in both the crash kernel and normal kernel segment
    loading loops.

    I did run into a practical problem. Hardware watchdogs on embedded
    systems can have short timeouts, on the order of seconds. If the
    system is locked up for a few seconds with only a single core
    available, the watchdog may not be petted in a timely fashion. If this
    happens, the hardware watchdog will fire and reset the system.

    This really only becomes a problem when you are working with a single
    core, a decently sized initrd, and have a constrained hardware watchdog.

    Link: http://lkml.kernel.org/r/1528738546-3328-1-git-send-email-jmf@amazon.com
    Signed-off-by: Jarrett Farnitano
    Reviewed-by: "Eric W. Biederman"
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Jarrett Farnitano
     

18 Jul, 2017

1 commit

  • Provide support so that kexec can be used to boot a kernel when SME is
    enabled.

    Support is needed to allocate pages for kexec without encryption. This
    is needed in order to be able to reboot in the kernel in the same manner
    as originally booted.

    Additionally, when shutting down all of the CPUs we need to be sure to
    flush the caches and then halt. This is needed when booting from a state
    where SME was not active into a state where SME is active (or vice-versa).
    Without these steps, it is possible for cache lines to exist for the same
    physical location but tagged both with and without the encryption bit. This
    can cause random memory corruption when caches are flushed depending on
    which cacheline is written last.

    Signed-off-by: Tom Lendacky
    Reviewed-by: Thomas Gleixner
    Reviewed-by: Borislav Petkov
    Cc:
    Cc: Alexander Potapenko
    Cc: Andrey Ryabinin
    Cc: Andy Lutomirski
    Cc: Arnd Bergmann
    Cc: Borislav Petkov
    Cc: Brijesh Singh
    Cc: Dave Young
    Cc: Dmitry Vyukov
    Cc: Jonathan Corbet
    Cc: Konrad Rzeszutek Wilk
    Cc: Larry Woodman
    Cc: Linus Torvalds
    Cc: Matt Fleming
    Cc: Michael S. Tsirkin
    Cc: Paolo Bonzini
    Cc: Peter Zijlstra
    Cc: Radim Krčmář
    Cc: Rik van Riel
    Cc: Toshimitsu Kani
    Cc: kasan-dev@googlegroups.com
    Cc: kvm@vger.kernel.org
    Cc: linux-arch@vger.kernel.org
    Cc: linux-doc@vger.kernel.org
    Cc: linux-efi@vger.kernel.org
    Cc: linux-mm@kvack.org
    Link: http://lkml.kernel.org/r/b95ff075db3e7cd545313f2fb609a49619a09625.1500319216.git.thomas.lendacky@amd.com
    Signed-off-by: Ingo Molnar

    Tom Lendacky
     

13 Jul, 2017

1 commit

  • Currently vmcoreinfo data is updated at boot time in subsys_initcall(),
    so it risks being modified by buggy code while the system is running.

    As a result, the dumped vmcore may contain wrong vmcoreinfo. Later on,
    when using the "crash", "makedumpfile", etc. utilities to parse this
    vmcore, we will probably get a "Segmentation fault" or other
    unexpected errors.

    E.g.: 1) wrong code overwrites vmcoreinfo_data; 2) the system later
    crashes; 3) kdump is triggered, and we obviously fail to recognize the
    crash context correctly due to the corrupted vmcoreinfo.

    Now, except for vmcoreinfo, all the crash data is well protected
    (including the cpu note, which is fully updated in the crash path, so
    its correctness is guaranteed). Given that vmcoreinfo data is a large
    chunk prepared for kdump, we had better protect it as well.

    To solve this, we relocate and copy vmcoreinfo_data into the crash
    memory when kdump is loaded via the kexec syscalls. Because the whole
    crash memory is protected by the existing
    arch_kexec_protect_crashkres() mechanism, we naturally protect
    vmcoreinfo_data from write (even read) access under the kernel direct
    mapping after kdump is loaded.

    Since kdump is usually loaded at the very early stage after boot, we can
    trust the correctness of the vmcoreinfo data copied.

    On the other hand, we still need to update the vmcoreinfo safe copy
    when a crash happens, to generate vmcoreinfo_note again; for that we
    rely on vmap() to map a new kernel virtual address and use it instead
    in the subsequent crash_save_vmcoreinfo().

    BTW, we do not touch vmcoreinfo_note, because it will be fully updated
    using the protected vmcoreinfo_data after the crash, which is surely
    correct, just like the cpu crash note.

    Link: http://lkml.kernel.org/r/1493281021-20737-3-git-send-email-xlpang@redhat.com
    Signed-off-by: Xunlei Pang
    Tested-by: Michael Holzheu
    Cc: Benjamin Herrenschmidt
    Cc: Dave Young
    Cc: Eric Biederman
    Cc: Hari Bathini
    Cc: Juergen Gross
    Cc: Mahesh Salgaonkar
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Xunlei Pang
     

30 Jun, 2017

1 commit

  • In preparation for an objtool rewrite which will have broader checks,
    whitelist functions and files which cause problems because they do
    unusual things with the stack.

    These whitelists serve as a TODO list for which functions and files
    don't yet have undwarf unwinder coverage. Eventually most of the
    whitelists can be removed in favor of manual CFI hint annotations or
    objtool improvements.

    Signed-off-by: Josh Poimboeuf
    Cc: Andy Lutomirski
    Cc: Jiri Slaby
    Cc: Linus Torvalds
    Cc: Peter Zijlstra
    Cc: Thomas Gleixner
    Cc: live-patching@vger.kernel.org
    Link: http://lkml.kernel.org/r/7f934a5d707a574bda33ea282e9478e627fb1829.1498659915.git.jpoimboe@redhat.com
    Signed-off-by: Ingo Molnar

    Josh Poimboeuf
     

09 May, 2017

2 commits

  • Get rid of multiple definitions of the append_elf_note() and
    final_note() functions, and reuse the ones compiled under
    CONFIG_CRASH_CORE. Also, define Elf_Word and use it instead of the
    generic u32 or the more specific Elf64_Word.

    Link: http://lkml.kernel.org/r/149035342324.6881.11667840929850361402.stgit@hbathini.in.ibm.com
    Signed-off-by: Hari Bathini
    Acked-by: Dave Young
    Acked-by: Tony Luck
    Cc: Fenghua Yu
    Cc: Eric Biederman
    Cc: Mahesh Salgaonkar
    Cc: Vivek Goyal
    Cc: Michael Ellerman
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Hari Bathini
     
  • Patch series "kexec/fadump: remove dependency with CONFIG_KEXEC and
    reuse crashkernel parameter for fadump", v4.

    Traditionally, kdump is used to save the vmcore in case of a crash.
    Some architectures, like powerpc, can save the vmcore using
    architecture-specific support instead of the kexec/kdump mechanism.
    Such architecture-specific support also needs to reserve memory to be
    used by the dump capture kernel. The crashkernel parameter can be
    reused for memory reservation by such architecture-specific
    infrastructure.

    This patchset removes the dependency on CONFIG_KEXEC for the
    crashkernel parameter and the vmcoreinfo-related code, as they can be
    reused without kexec support. Also, the crashkernel parameter is
    reused instead of fadump_reserve_mem to reserve memory for fadump.

    The first patch moves crashkernel parameter parsing and vmcoreinfo
    related code under CONFIG_CRASH_CORE instead of CONFIG_KEXEC_CORE. The
    second patch reuses the definitions of append_elf_note() & final_note()
    functions under CONFIG_CRASH_CORE in IA64 arch code. The third patch
    removes dependency on CONFIG_KEXEC for firmware-assisted dump (fadump)
    in powerpc. The next patch reuses crashkernel parameter for reserving
    memory for fadump, instead of the fadump_reserve_mem parameter. This
    has the advantage of using all syntaxes crashkernel parameter supports,
    for fadump as well. The last patch updates fadump kernel documentation
    about use of crashkernel parameter.

    This patch (of 5):

    Traditionally, kdump is used to save the vmcore in case of a crash.
    Some architectures, like powerpc, can save the vmcore using
    architecture-specific support instead of the kexec/kdump mechanism.
    Such architecture-specific support also needs to reserve memory to be
    used by the dump capture kernel. The crashkernel parameter can be
    reused for memory reservation by such architecture-specific
    infrastructure.

    But currently, the code related to vmcoreinfo and the parsing of the
    crashkernel parameter is built under CONFIG_KEXEC_CORE. This patch
    introduces CONFIG_CRASH_CORE and moves the above-mentioned code under
    it, allowing code reuse without a dependency on CONFIG_KEXEC. There is
    no functional change with this patch.

    Link: http://lkml.kernel.org/r/149035338104.6881.4550894432615189948.stgit@hbathini.in.ibm.com
    Signed-off-by: Hari Bathini
    Acked-by: Dave Young
    Cc: Fenghua Yu
    Cc: Tony Luck
    Cc: Eric Biederman
    Cc: Mahesh Salgaonkar
    Cc: Vivek Goyal
    Cc: Michael Ellerman
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Hari Bathini
     

23 Feb, 2017

1 commit

  • Pull printk updates from Petr Mladek:

    - Add Petr Mladek and Sergey Senozhatsky as printk maintainers, and
    Steven Rostedt as the printk reviewer. This idea came up after the
    discussion about printk issues at Kernel Summit. It was formulated
    and discussed on lkml[1].

    - Extend the lock-less NMI per-cpu buffers idea to handle recursive
    printk() calls, by Sergey Senozhatsky[2]. It is the first step in
    sanitizing printk as discussed at Kernel Summit.

    The change makes it possible to see messages that would normally get
    ignored or would cause a deadlock.

    It also makes it possible to enable lockdep in printk(). This has
    already paid off: the testing in linux-next helped to discover two
    old problems that were hidden before[3][4].

    - Remove an unused parameter, by Sergey Senozhatsky. A clean-up after
    a past change.

    [1] http://lkml.kernel.org/r/1481798878-31898-1-git-send-email-pmladek@suse.com
    [2] http://lkml.kernel.org/r/20161227141611.940-1-sergey.senozhatsky@gmail.com
    [3] http://lkml.kernel.org/r/20170215044332.30449-1-sergey.senozhatsky@gmail.com
    [4] http://lkml.kernel.org/r/20170217015932.11898-1-sergey.senozhatsky@gmail.com

    * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/pmladek/printk:
    printk: drop call_console_drivers() unused param
    printk: convert the rest to printk-safe
    printk: remove zap_locks() function
    printk: use printk_safe buffers in printk
    printk: report lost messages in printk safe/nmi contexts
    printk: always use deferred printk when flush printk_safe lines
    printk: introduce per-cpu safe_print seq buffer
    printk: rename nmi.c and exported api
    printk: use vprintk_func in vprintk()
    MAINTAINERS: Add printk maintainers

    Linus Torvalds
     

08 Feb, 2017

1 commit

  • A preparation patch for the printk_safe work. No functional change.
    - rename nmi.c to printk_safe.c
    - add a `printk_safe' prefix to some of the exported functions (those
    used by both printk-safe and printk-nmi).

    Link: http://lkml.kernel.org/r/20161227141611.940-3-sergey.senozhatsky@gmail.com
    Cc: Andrew Morton
    Cc: Linus Torvalds
    Cc: Jan Kara
    Cc: Tejun Heo
    Cc: Calvin Owens
    Cc: Steven Rostedt
    Cc: Ingo Molnar
    Cc: Peter Zijlstra
    Cc: Andy Lutomirski
    Cc: Peter Hurley
    Cc: linux-kernel@vger.kernel.org
    Signed-off-by: Sergey Senozhatsky
    Signed-off-by: Petr Mladek

    Sergey Senozhatsky
     

11 Jan, 2017

1 commit

  • __pa_symbol is the correct api to get the physical address of kernel
    symbols. Switch to it to allow for better debug checking.

    Reviewed-by: Mark Rutland
    Tested-by: Mark Rutland
    Acked-by: "Eric W. Biederman"
    Signed-off-by: Laura Abbott
    Signed-off-by: Will Deacon

    Laura Abbott
     

15 Dec, 2016

2 commits

  • A soft lockup occurs when I run trinity on the kexec_load syscall. The
    corresponding stack information is as follows.

    BUG: soft lockup - CPU#6 stuck for 22s! [trinity-c6:13859]
    Kernel panic - not syncing: softlockup: hung tasks
    CPU: 6 PID: 13859 Comm: trinity-c6 Tainted: G O L ----V------- 3.10.0-327.28.3.35.zhongjiang.x86_64 #1
    Hardware name: Huawei Technologies Co., Ltd. Tecal BH622 V2/BC01SRSA0, BIOS RMIBV386 06/30/2014
    Call Trace:
    dump_stack+0x19/0x1b
    panic+0xd8/0x214
    watchdog_timer_fn+0x1cc/0x1e0
    __hrtimer_run_queues+0xd2/0x260
    hrtimer_interrupt+0xb0/0x1e0
    ? call_softirq+0x1c/0x30
    local_apic_timer_interrupt+0x37/0x60
    smp_apic_timer_interrupt+0x3f/0x60
    apic_timer_interrupt+0x6d/0x80
    ? kimage_alloc_control_pages+0x80/0x270
    ? kmem_cache_alloc_trace+0x1ce/0x1f0
    ? do_kimage_alloc_init+0x1f/0x90
    kimage_alloc_init+0x12a/0x180
    SyS_kexec_load+0x20a/0x260
    system_call_fastpath+0x16/0x1b

    The first allocation of control pages may take too much time because
    crash_res.end can be set to a high value. We need to add
    cond_resched() to avoid the issue.

    The patch has been tested and the above issue no longer appears.

    Link: http://lkml.kernel.org/r/1481164674-42775-1-git-send-email-zhongjiang@huawei.com
    Signed-off-by: zhong jiang
    Acked-by: "Eric W. Biederman"
    Cc: Xunlei Pang
    Cc: Dave Young
    Cc: Vivek Goyal
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    zhong jiang
     
  • Currently on x86_64, the symbol address of phys_base is exported to
    vmcoreinfo. Dave Anderson complained that this is really useless for
    his Crash implementation, because in the user-space utilities Crash
    and Makedumpfile, which the exported vmcore information is mainly used
    for, the value of phys_base is needed to convert the virtual address
    of an exported kernel symbol to a physical address. Especially for
    init_level4_pgt: if we want to access and walk the page table to look
    up the PA corresponding to a VA, we first need to calculate

    page_dir = SYMBOL(init_level4_pgt) - __START_KERNEL_map + phys_base;

    Now in Crash and Makedumpfile, we have to analyze the vmcore ELF
    program headers to get the value of phys_base. As Dave said, it would
    be preferable if it were readily available in vmcoreinfo rather than
    depending upon the PT_LOAD semantics.

    Hence, this patch changes the code to export the value of phys_base
    instead of its virtual address.

    People also complained that the KERNEL_IMAGE_SIZE export is
    x86_64-only and should be moved into the arch-dependent function
    arch_crash_save_vmcoreinfo(). That move is done in this patch as well.

    Link: http://lkml.kernel.org/r/1478568596-30060-2-git-send-email-bhe@redhat.com
    Signed-off-by: Baoquan He
    Cc: Thomas Garnier
    Cc: Baoquan He
    Cc: Thomas Gleixner
    Cc: Ingo Molnar
    Cc: "H . Peter Anvin"
    Cc: Eric Biederman
    Cc: Xunlei Pang
    Cc: HATAYAMA Daisuke
    Cc: Kees Cook
    Cc: Eugene Surovegin
    Cc: Dave Young
    Cc: AKASHI Takahiro
    Cc: Atsushi Kumagai
    Cc: Dave Anderson
    Cc: Pratyush Anand
    Cc: Vivek Goyal
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Baoquan He
     

03 Aug, 2016

6 commits

  • I hit the following issue when running trinity on my system. The
    kernel is version 3.4, but mainline has the same issue.

    The root cause is that the segment size is too large, so the kernel
    spends too long trying to allocate a page. Other cases will block
    until the test case quits. Also, OOM conditions will occur.

    Call Trace:
    __alloc_pages_nodemask+0x14c/0x8f0
    alloc_pages_current+0xaf/0x120
    kimage_alloc_pages+0x10/0x60
    kimage_alloc_control_pages+0x5d/0x270
    machine_kexec_prepare+0xe5/0x6c0
    ? kimage_free_page_list+0x52/0x70
    sys_kexec_load+0x141/0x600
    ? vfs_write+0x100/0x180
    system_call_fastpath+0x16/0x1b

    The patch changes sanity_check_segment_list() to verify that the usage by
    all segments does not exceed half of memory.

    [akpm@linux-foundation.org: fix for kexec-return-error-number-directly.patch, update comment]
    Link: http://lkml.kernel.org/r/1469625474-53904-1-git-send-email-zhongjiang@huawei.com
    Signed-off-by: zhong jiang
    Suggested-by: Eric W. Biederman
    Cc: Vivek Goyal
    Cc: Dave Young
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    zhong jiang
     
  • Provide a wrapper function to be used by kernel code to check whether a
    crash kernel is loaded. It returns the same value that can be seen in
    /sys/kernel/kexec_crash_loaded by userspace programs.

    I'm exporting the function, because it will be used by Xen, and it is
    possible to compile Xen modules separately to enable the use of PV
    drivers with unmodified bare-metal kernels.

    Link: http://lkml.kernel.org/r/20160713121955.14969.69080.stgit@hananiah.suse.cz
    Signed-off-by: Petr Tesarik
    Cc: Juergen Gross
    Cc: Josh Triplett
    Cc: Ingo Molnar
    Cc: Thomas Gleixner
    Cc: Eric Biederman
    Cc: "H. Peter Anvin"
    Cc: Boris Ostrovsky
    Cc: "Paul E. McKenney"
    Cc: Dave Young
    Cc: David Vrabel
    Cc: Vivek Goyal
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Petr Tesarik
     
  • kexec physical addresses are the boot-time view of the system. For
    certain ARM systems (such as Keystone 2), the boot view of the system
    does not match the kernel's view of the system: the boot view uses a
    special alias in the lower 4GB of the physical address space.

    To cater for these kinds of setups, we need to translate between the
    boot view physical addresses and the normal kernel view physical
    addresses. This patch extracts the current translation points into
    linux/kexec.h, and allows an architecture to override the functions.

    Due to the translations required, we unfortunately end up with six
    translation functions, which are reduced down to four that the
    architecture can override.

    [akpm@linux-foundation.org: kexec.h needs asm/io.h for phys_to_virt()]
    Link: http://lkml.kernel.org/r/E1b8koP-0004HZ-Vf@rmk-PC.armlinux.org.uk
    Signed-off-by: Russell King
    Cc: Keerthy
    Cc: Pratyush Anand
    Cc: Vitaly Andrianov
    Cc: Eric Biederman
    Cc: Dave Young
    Cc: Baoquan He
    Cc: Vivek Goyal
    Cc: Simon Horman
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Russell King
     
  • On PAE systems (eg, ARM LPAE) the vmcore note may be located above 4GB
    physical on 32-bit architectures, so we need a wider type than "unsigned
    long" here. Arrange for paddr_vmcoreinfo_note() to return a
    phys_addr_t, thereby allowing it to be located above 4GB.

    This makes no difference for kexec-tools, as they already assume a
    64-bit type when reading from this file.

    Link: http://lkml.kernel.org/r/E1b8koK-0004HS-K9@rmk-PC.armlinux.org.uk
    Signed-off-by: Russell King
    Reviewed-by: Pratyush Anand
    Acked-by: Baoquan He
    Cc: Keerthy
    Cc: Vitaly Andrianov
    Cc: Eric Biederman
    Cc: Dave Young
    Cc: Vivek Goyal
    Cc: Simon Horman
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Russell King
     
  • Ensure that user memory sizes do not wrap around when validating the
    user input, since a wrap-around can cause the subsequent input
    validation to work incorrectly.

    [akpm@linux-foundation.org: fix it for kexec-return-error-number-directly.patch]
    Link: http://lkml.kernel.org/r/E1b8koF-0004HM-5x@rmk-PC.armlinux.org.uk
    Signed-off-by: Russell King
    Reviewed-by: Pratyush Anand
    Acked-by: Baoquan He
    Cc: Keerthy
    Cc: Vitaly Andrianov
    Cc: Eric Biederman
    Cc: Dave Young
    Cc: Vivek Goyal
    Cc: Simon Horman
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Russell King
     
  • This is a cleanup patch that makes kexec clearer by returning error
    numbers directly. The variable result is useless, because no other
    function's return value is assigned to it, so remove it.

    Link: http://lkml.kernel.org/r/1464179273-57668-1-git-send-email-mnghuan@gmail.com
    Signed-off-by: Minfei Huang
    Cc: Dave Young
    Cc: Baoquan He
    Cc: Borislav Petkov
    Cc: Xunlei Pang
    Cc: Atsushi Kumagai
    Cc: Vivek Goyal
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Minfei Huang
     

24 May, 2016

2 commits

  • …unprotect)_crashkres()

    Commit 3f625002581b ("kexec: introduce a protection mechanism for the
    crashkernel reserved memory") added a mechanism for protecting the
    crash kernel reserved memory similar to the previous
    crash_map/unmap_reserved_pages() implementation; the new one is more
    generic in name and cleaner in code (besides, some arches may not be
    allowed to unmap the pgtable).

    Therefore, this patch consolidates them, using the new
    arch_kexec_protect(unprotect)_crashkres() to replace the former
    crash_map/unmap_reserved_pages(), which by now is used only by
    S390.

    The consolidation work needs the crash memory to be mapped initially;
    this is done in machine_kdump_pm_init(), which runs after
    reserve_crashkernel(). Once the kdump kernel is loaded, the new
    arch_kexec_protect_crashkres() implemented for S390 will actually
    unmap the pgtable as before.

    Signed-off-by: Xunlei Pang <xlpang@redhat.com>
    Signed-off-by: Michael Holzheu <holzheu@linux.vnet.ibm.com>
    Acked-by: Michael Holzheu <holzheu@linux.vnet.ibm.com>
    Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
    Cc: "Eric W. Biederman" <ebiederm@xmission.com>
    Cc: Minfei Huang <mhuang@redhat.com>
    Cc: Vivek Goyal <vgoyal@redhat.com>
    Cc: Dave Young <dyoung@redhat.com>
    Cc: Baoquan He <bhe@redhat.com>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

    Xunlei Pang
     
  • If some kernel (module) code path stamps on the crash reserved
    memory (already mapped by the kernel) where the second kernel's data
    has been loaded, the kdump kernel will probably fail to boot when a
    panic happens (or even when it does not), leaving the culprit at
    large. This is unacceptable.

    The patch introduces a mechanism for detecting such cases:

    1) After each crash kexec loading, it simply marks the reserved memory
    regions readonly, since we no longer access them after that. When
    someone stamps the region, the first kernel will panic and trigger
    the kdump. The weak arch_kexec_protect_crashkres() is introduced to
    do the actual protection.

    2) To allow multiple loading, once 1) is done we also need to remark
    the reserved memory to readwrite each time a system call related to
    kdump is made. The weak arch_kexec_unprotect_crashkres() is introduced
    to do the actual unprotection.

    The architecture can make its specific implementation by overriding
    arch_kexec_protect_crashkres() and arch_kexec_unprotect_crashkres().

    Signed-off-by: Xunlei Pang
    Cc: Eric Biederman
    Cc: Dave Young
    Cc: Minfei Huang
    Cc: Vivek Goyal
    Cc: Baoquan He
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Xunlei Pang
     

21 May, 2016

1 commit

  • In NMI context, printk() messages are stored into per-CPU buffers to
    avoid a possible deadlock. They are normally flushed to the main ring
    buffer via an IRQ work. But the work is never called when the system
    calls panic() in the very same NMI handler.

    This patch tries to flush NMI buffers before the crash dump is
    generated. In this case it does not risk a double release and bails out
    when the logbuf_lock is already taken. The aim is to get the messages
    into the main ring buffer when possible. It makes them better
    accessible in the vmcore.

    Then the patch tries to flush the buffers a second time when other CPUs
    are down. It might be more aggressive and reset logbuf_lock. The aim
    is to get the messages available for the consequent kmsg_dump() and
    console_flush_on_panic() calls.

    The patch causes vprintk_emit() to be called even in NMI context again.
    But it is done via printk_deferred() so that the console handling is
    skipped. Consoles use internal locks and we could not prevent a
    deadlock easily. They are explicitly called later when the crash dump
    is not generated, see console_flush_on_panic().

    Signed-off-by: Petr Mladek
    Cc: Benjamin Herrenschmidt
    Cc: Daniel Thompson
    Cc: David Miller
    Cc: Ingo Molnar
    Cc: Jan Kara
    Cc: Jiri Kosina
    Cc: Martin Schwidefsky
    Cc: Peter Zijlstra
    Cc: Ralf Baechle
    Cc: Russell King
    Cc: Steven Rostedt
    Cc: Thomas Gleixner
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Petr Mladek
     

20 May, 2016

1 commit

  • Many developers already know that the field for the reference count
    of struct page is _count and of atomic type. They may try to handle
    it directly, and this could break the purpose of the page reference
    count tracepoint. To prevent direct _count modification, this patch
    renames it to _refcount and adds a warning message in the code.
    After that, developers who need to handle the reference count will
    find that the field should not be accessed directly.

    [akpm@linux-foundation.org: fix comments, per Vlastimil]
    [akpm@linux-foundation.org: Documentation/vm/transhuge.txt too]
    [sfr@canb.auug.org.au: sync ethernet driver changes]
    Signed-off-by: Joonsoo Kim
    Signed-off-by: Stephen Rothwell
    Cc: Vlastimil Babka
    Cc: Hugh Dickins
    Cc: Johannes Berg
    Cc: "David S. Miller"
    Cc: Sunil Goutham
    Cc: Chris Metcalf
    Cc: Manish Chopra
    Cc: Yuval Mintz
    Cc: Tariq Toukan
    Cc: Saeed Mahameed
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Joonsoo Kim
     

29 Apr, 2016

2 commits

  • PageAnon() always looks at the head page to check PAGE_MAPPING_ANON,
    and a tail page's page->mapping contains just poisoned data since
    commit 1c290f642101 ("mm: sanitize page->mapping for tail pages").

    If makedumpfile checks page->mapping of a compound tail page to
    distinguish anonymous pages as usual, it will fail on newer kernels.
    So it's necessary to export OFFSET(page.compound_head) to avoid
    checking compound tail pages.

    The problem is that unnecessary hugepages won't be removed from a dump
    file in kernels 4.5.x and later. This means that extra disk space would
    be consumed. It's a problem, but not critical.

    Signed-off-by: Atsushi Kumagai
    Acked-by: Dave Young
    Cc: "Eric W. Biederman"
    Cc: Vivek Goyal
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Atsushi Kumagai
     
  • makedumpfile refers to page.lru.next to get the order of compound
    pages for page filtering.

    However, the order is now stored in page.compound_order, hence
    VMCOREINFO should be updated to export the offset of
    page.compound_order.

    In fact, page.compound_order was introduced already in kernel 4.0,
    but its offset was the same as page.lru.next until kernel 4.3, so
    this was not an actual problem.

    The above also applies to page.lru.prev and page.compound_dtor,
    which are necessary to detect hugetlbfs pages. Further, the content
    was changed from a direct address to an ID which identifies the
    dtor.

    The problem is that unnecessary hugepages won't be removed from a dump
    file in kernels 4.4.x and later. This means that extra disk space would
    be consumed. It's a problem, but not critical.

    Signed-off-by: Atsushi Kumagai
    Acked-by: Dave Young
    Cc: "Eric W. Biederman"
    Cc: Vivek Goyal
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Atsushi Kumagai