09 Aug, 2019

1 commit

  • Secure Encrypted Virtualization is an x86-specific feature, so it shouldn't
    appear in generic kernel code because it forces non-x86 architectures to
    define the sev_active() function, which doesn't make a lot of sense.

    To solve this problem, add an x86 elfcorehdr_read() function to override
    the generic weak implementation. To do that, it's necessary to make
    read_from_oldmem() public so that it can be used outside of vmcore.c.

    Also, remove the export for sev_active() since it's only used in files that
    won't be built as modules.
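
    A sketch of the x86 override described above (the final
    read_from_oldmem() argument is assumed to select encrypted access, per
    this series):

    ssize_t elfcorehdr_read(char *buf, size_t count, u64 *ppos)
    {
        return read_from_oldmem(buf, count, ppos, 0, sev_active());
    }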

    Signed-off-by: Thiago Jung Bauermann
    Reviewed-by: Christoph Hellwig
    Reviewed-by: Lianbo Jiang
    Signed-off-by: Michael Ellerman
    Link: https://lore.kernel.org/r/20190806044919.10622-6-bauerman@linux.ibm.com

    Thiago Jung Bauermann
     

17 Jul, 2019

1 commit

  • Since commit 2724273e8fd0 ("vmcore: add API to collect hardware dump in
    second kernel"), drivers are allowed to add device related dump data to
    vmcore as they want by using the device dump API. This has a potential
    issue: the data is stored in memory, and drivers may append too much
    data and use too much memory. The vmcore is typically used in a kdump
    kernel, which runs in a small pre-reserved chunk of memory, so as a
    result kdump can become unusable due to OOM issues.

    So introduce a new 'novmcoredd' command line option with which the
    user can disable device dump to reduce memory usage. This is helpful
    if device dump is using too much memory: disabling it ensures that a
    regular vmcore, without device dump data, is still available.
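
    A minimal sketch of how such a boot option can be wired up (hence the
    moduleparam.h note below); the vmcoredd_disabled flag name is
    illustrative, not taken from the patch itself:

    #include <linux/moduleparam.h>

    /* core_param() registers a bare "novmcoredd" kernel parameter that
     * sets a flag checked before any device dump data is accepted. */
    static bool vmcoredd_disabled;
    core_param(novmcoredd, vmcoredd_disabled, bool, 0);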

    [akpm@linux-foundation.org: tweak documentation]
    [akpm@linux-foundation.org: vmcore.c needs moduleparam.h]
    Link: http://lkml.kernel.org/r/20190528111856.7276-1-kasong@redhat.com
    Signed-off-by: Kairui Song
    Acked-by: Dave Young
    Reviewed-by: Bhupesh Sharma
    Cc: Rahul Lakkireddy
    Cc: "David S . Miller"
    Cc: Eric Biederman
    Cc: Alexey Dobriyan
    Cc: Baoquan He
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Kairui Song
     

20 Jun, 2019

1 commit

  • In the kdump kernel, the memory of the first kernel needs to be dumped
    into a vmcore file.

    Similarly to SME kdump, if SEV was enabled in the first kernel, the old
    memory has to be remapped encrypted in order to access it properly.

    Commit

    992b649a3f01 ("kdump, proc/vmcore: Enable kdumping encrypted memory with SME enabled")

    took care of the SME case but it uses sme_active() which checks for SME
    only. Use mem_encrypt_active() instead, which returns true when either
    SME or SEV is active.

    Unlike with SME, when SEV is active the second kernel's images (kernel
    and initrd) are loaded into encrypted memory, hence the kernel ELF
    header must also be remapped as encrypted in order to access it
    properly.
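
    A sketch of the resulting call site when reading old memory (the flag
    argument is assumed to follow read_from_oldmem()'s 'encrypted'
    parameter):

    /* mem_encrypt_active() covers both SME and SEV. */
    tmp = read_from_oldmem(buffer, tsz, &start, userbuf,
                           mem_encrypt_active());   /* was: sme_active() */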

    [ bp: Massage commit message. ]

    Co-developed-by: Brijesh Singh
    Signed-off-by: Brijesh Singh
    Signed-off-by: Lianbo Jiang
    Signed-off-by: Borislav Petkov
    Cc: Alexey Dobriyan
    Cc: Andrew Morton
    Cc: Arnd Bergmann
    Cc: bhe@redhat.com
    Cc: dyoung@redhat.com
    Cc: Ganesh Goudar
    Cc: H. Peter Anvin
    Cc: kexec@lists.infradead.org
    Cc: linux-fsdevel@vger.kernel.org
    Cc: Matthew Wilcox
    Cc: Mike Rapoport
    Cc: mingo@redhat.com
    Cc: Rahul Lakkireddy
    Cc: Souptick Joarder
    Cc: Thomas Gleixner
    Cc: Tom Lendacky
    Cc: x86-ml
    Link: https://lkml.kernel.org/r/20190430074421.7852-4-lijiang@redhat.com

    Lianbo Jiang
     

21 May, 2019

1 commit

  • Add SPDX license identifiers to all files which:

    - Have no license information of any form

    - Have EXPORT_.*_SYMBOL_GPL inside which was used in the
    initial scan/conversion to ignore the file

    These files fall under the project license, GPL v2 only. The resulting SPDX
    license identifier is:

    GPL-2.0-only
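
    For reference, the resulting tag at the top of each affected C file
    takes this form:

    // SPDX-License-Identifier: GPL-2.0-only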

    Signed-off-by: Thomas Gleixner
    Signed-off-by: Greg Kroah-Hartman

    Thomas Gleixner
     

31 Oct, 2018

2 commits

  • Move remaining definitions and declarations from include/linux/bootmem.h
    into include/linux/memblock.h and remove the redundant header.

    The includes were replaced with the semantic patch below and then
    semi-automated removal of duplicated '#include <linux/memblock.h>':

    @@
    @@
    - #include <linux/bootmem.h>
    + #include <linux/memblock.h>

    [sfr@canb.auug.org.au: dma-direct: fix up for the removal of linux/bootmem.h]
    Link: http://lkml.kernel.org/r/20181002185342.133d1680@canb.auug.org.au
    [sfr@canb.auug.org.au: powerpc: fix up for removal of linux/bootmem.h]
    Link: http://lkml.kernel.org/r/20181005161406.73ef8727@canb.auug.org.au
    [sfr@canb.auug.org.au: x86/kaslr, ACPI/NUMA: fix for linux/bootmem.h removal]
    Link: http://lkml.kernel.org/r/20181008190341.5e396491@canb.auug.org.au
    Link: http://lkml.kernel.org/r/1536927045-23536-30-git-send-email-rppt@linux.vnet.ibm.com
    Signed-off-by: Mike Rapoport
    Signed-off-by: Stephen Rothwell
    Acked-by: Michal Hocko
    Cc: Catalin Marinas
    Cc: Chris Zankel
    Cc: "David S. Miller"
    Cc: Geert Uytterhoeven
    Cc: Greentime Hu
    Cc: Greg Kroah-Hartman
    Cc: Guan Xuetao
    Cc: Ingo Molnar
    Cc: "James E.J. Bottomley"
    Cc: Jonas Bonn
    Cc: Jonathan Corbet
    Cc: Ley Foon Tan
    Cc: Mark Salter
    Cc: Martin Schwidefsky
    Cc: Matt Turner
    Cc: Michael Ellerman
    Cc: Michal Simek
    Cc: Palmer Dabbelt
    Cc: Paul Burton
    Cc: Richard Kuo
    Cc: Richard Weinberger
    Cc: Rich Felker
    Cc: Russell King
    Cc: Serge Semin
    Cc: Thomas Gleixner
    Cc: Tony Luck
    Cc: Vineet Gupta
    Cc: Yoshinori Sato
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Mike Rapoport
     
  • This code can be replaced with the vmf_error() inline function.
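
    In vmcore.c's fault handler the conversion takes roughly this shape (a
    before/after sketch):

    /* Before: open-coded errno-to-VM_FAULT mapping. */
    if (rc < 0)
        return (rc == -ENOMEM) ? VM_FAULT_OOM : VM_FAULT_SIGBUS;

    /* After: vmf_error() performs the same mapping. */
    if (rc < 0)
        return vmf_error(rc);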

    Link: http://lkml.kernel.org/r/20180918145945.GA11392@jordon-HP-15-Notebook-PC
    Signed-off-by: Souptick Joarder
    Reviewed-by: Matthew Wilcox
    Cc: Alexey Dobriyan
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Souptick Joarder
     

09 Oct, 2018

1 commit

  • Lianbo reported a build error with a particular 32-bit config, see Link
    below for details.

    Provide a weak copy_oldmem_page_encrypted() function which architectures
    can override, in the same manner other functionality in that file is
    supplied.
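
    The weak default simply falls back to the unencrypted copy; a sketch:

    /*
     * Architectures which support encrypted memory override this.
     */
    ssize_t __weak
    copy_oldmem_page_encrypted(unsigned long pfn, char *buf, size_t csize,
                               unsigned long offset, int userbuf)
    {
        return copy_oldmem_page(pfn, buf, csize, offset, userbuf);
    }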

    Reported-by: Lianbo Jiang
    Signed-off-by: Borislav Petkov
    CC: x86@kernel.org
    Link: http://lkml.kernel.org/r/710b9d95-2f70-eadf-c4a1-c3dc80ee4ebb@redhat.com

    Borislav Petkov
     

06 Oct, 2018

1 commit

  • In the kdump kernel, the memory of the first kernel needs to be dumped
    into the vmcore file.

    If SME is enabled in the first kernel, the old memory has to be remapped
    with the memory encryption mask in order to access it properly.

    Split copy_oldmem_page() functionality to handle encrypted memory
    properly.
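
    A sketch of the split on x86, assuming an ioremap_encrypted() helper
    that maps with the encryption mask (the common helper shown here is
    illustrative):

    static ssize_t __copy_oldmem_page(unsigned long pfn, char *buf,
                                      size_t csize, unsigned long offset,
                                      int userbuf, bool encrypted)
    {
        void *vaddr;

        if (!csize)
            return 0;

        /* Pick the mapping primitive according to the encryption state. */
        if (encrypted)
            vaddr = (__force void *)ioremap_encrypted(pfn << PAGE_SHIFT,
                                                      PAGE_SIZE);
        else
            vaddr = (__force void *)ioremap_cache(pfn << PAGE_SHIFT,
                                                  PAGE_SIZE);
        if (!vaddr)
            return -ENOMEM;

        if (userbuf) {
            if (copy_to_user((void __user *)buf, vaddr + offset, csize)) {
                iounmap((void __iomem *)vaddr);
                return -EFAULT;
            }
        } else {
            memcpy(buf, vaddr + offset, csize);
        }

        iounmap((void __iomem *)vaddr);
        return csize;
    }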

    [ bp: Heavily massage everything. ]

    Signed-off-by: Lianbo Jiang
    Signed-off-by: Borislav Petkov
    Cc: kexec@lists.infradead.org
    Cc: tglx@linutronix.de
    Cc: mingo@redhat.com
    Cc: hpa@zytor.com
    Cc: akpm@linux-foundation.org
    Cc: dan.j.williams@intel.com
    Cc: bhelgaas@google.com
    Cc: baiyaowei@cmss.chinamobile.com
    Cc: tiwai@suse.de
    Cc: brijesh.singh@amd.com
    Cc: dyoung@redhat.com
    Cc: bhe@redhat.com
    Cc: jroedel@suse.de
    Link: https://lkml.kernel.org/r/be7b47f9-6be6-e0d1-2c2a-9125bc74b818@redhat.com

    Lianbo Jiang
     

24 Aug, 2018

1 commit

  • Without CONFIG_MMU, we get a build warning:

    fs/proc/vmcore.c:228:12: error: 'vmcoredd_mmap_dumps' defined but not used [-Werror=unused-function]
    static int vmcoredd_mmap_dumps(struct vm_area_struct *vma, unsigned long dst,

    The function is only referenced from an #ifdef'ed caller, so wrap it
    in the same #ifdef.

    Link: http://lkml.kernel.org/r/20180525213526.2117790-1-arnd@arndb.de
    Fixes: 7efe48df8a3d ("vmcore: append device dumps to vmcore as elf notes")
    Signed-off-by: Arnd Bergmann
    Cc: Ganesh Goudar
    Cc: "David S. Miller"
    Cc: Rahul Lakkireddy
    Cc: Alexey Dobriyan
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Arnd Bergmann
     

23 Aug, 2018

1 commit

  • Use new return type vm_fault_t for fault handler in struct
    vm_operations_struct. For now, this is just documenting that the function
    returns a VM_FAULT value rather than an errno. Once all instances are
    converted, vm_fault_t will become a distinct type.

    See 1c8f422059ae ("mm: change return type to vm_fault_t") for reference.
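
    Applied to this file, the change is purely a type annotation; an
    illustrative sketch of the nommu stub:

    static vm_fault_t mmap_vmcore_fault(struct vm_fault *vmf) /* was: int */
    {
        return VM_FAULT_SIGBUS;  /* stub behaviour unchanged */
    }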

    Link: http://lkml.kernel.org/r/20180702153325.GA3875@jordon-HP-15-Notebook-PC
    Signed-off-by: Souptick Joarder
    Reviewed-by: Matthew Wilcox
    Cc: Ganesh Goudar
    Cc: Rahul Lakkireddy
    Cc: David S. Miller
    Cc: Alexey Dobriyan
    Cc: Matthew Wilcox
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Souptick Joarder
     

22 May, 2018

1 commit

  • Fix the build warning below:

    WARNING: vmlinux.o(.text+0x422bb8): Section mismatch in reference from
    the function vmcore_add_device_dump() to the function
    .init.text:get_vmcore_size.constprop.5()

    The function vmcore_add_device_dump() references
    the function __init get_vmcore_size.constprop.5().
    This is often because vmcore_add_device_dump lacks a __init
    annotation or the annotation of get_vmcore_size.constprop.5 is wrong.

    Fixes: 7efe48df8a3d ("vmcore: append device dumps to vmcore as elf notes")
    Signed-off-by: Rahul Lakkireddy
    Signed-off-by: Ganesh Goudar
    Signed-off-by: David S. Miller

    Rahul Lakkireddy
     

15 May, 2018

2 commits

  • Update read and mmap logic to append device dumps as additional notes
    before the other elf notes. We add device dumps before other elf notes
    because the other elf notes may not fill the elf notes buffer
    completely and we will end up with zero-filled data between the elf
    notes and the device dumps. Tools will then try to decode this
    zero-filled data as valid notes, and we don't want that. Hence, adding
    device dumps before the other elf notes ensures that zero-filled data
    is avoided. This also ensures that the device dumps and the
    other elf notes can be properly mmapped at page-aligned addresses.

    Incorporate device dump size into the total vmcore size. Also update
    offsets for other program headers after the device dumps are added.

    Suggested-by: Eric Biederman
    Signed-off-by: Rahul Lakkireddy
    Signed-off-by: Ganesh Goudar
    Signed-off-by: David S. Miller

    Rahul Lakkireddy
     
  • The sequence of actions taken by device drivers to append their
    device-specific hardware/firmware logs to /proc/vmcore is as follows:

    1. During probe (before hardware is initialized), device drivers
    register with the vmcore module (via vmcore_add_device_dump()) a
    callback function, along with the buffer size and log name needed for
    firmware/hardware log collection.

    2. vmcore module allocates the buffer with requested size. It adds
    an Elf note and invokes the device driver's registered callback
    function.

    3. Device driver collects all hardware/firmware logs into the buffer
    and returns control back to vmcore module.

    Ensure that the device dump buffer size is always aligned to page size
    so that it can be mmaped.

    Also, rename alloc_elfnotes_buf() to vmcore_alloc_buf() to make it more
    generic and reserve NT_VMCOREDD note type to indicate vmcore device
    dump.
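
    A hedged sketch of the driver-side flow from steps 1 and 3 above
    (names and sizes are illustrative; the struct follows the
    vmcoredd_data interface this series adds):

    #include <linux/crash_dump.h>

    /* Step 3: invoked by vmcore to fill the preallocated buffer. */
    static int my_drv_collect_dump(struct vmcoredd_data *data, void *buf)
    {
        /* Copy up to data->size bytes of hardware/firmware state to buf. */
        return 0;
    }

    static struct vmcoredd_data my_dump = {
        .dump_name = "MY_HW_DUMP",          /* log name for the ELF note */
        .size      = 2 * 1024 * 1024,       /* requested buffer size */
        .vmcoredd_callback = my_drv_collect_dump,
    };

    /* Step 1: register during probe, before the hardware is initialized. */
    static int my_drv_register_dump(void)
    {
        return vmcore_add_device_dump(&my_dump);
    }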

    Suggested-by: Eric Biederman
    Signed-off-by: Rahul Lakkireddy
    Signed-off-by: Ganesh Goudar
    Signed-off-by: David S. Miller

    Rahul Lakkireddy
     

25 Feb, 2017

2 commits

  • When a non-cooperative userfaultfd monitor copies pages in the
    background, it may encounter regions that were already unmapped. The
    addition of UFFD_EVENT_UNMAP allows the uffd monitor to precisely
    track changes in the virtual memory layout.

    Since there might be different uffd contexts for the affected VMAs, we
    should first create a temporary representation of the unmap event for
    each uffd context and then deliver them one by one to the appropriate
    userfault file descriptors.

    The event notification occurs after the mmap_sem has been released.
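
    On the monitor side the new event arrives through the usual read() on
    the userfault file descriptor; a hedged userspace sketch (the
    arg.remove fields carry the unmapped range, as for UFFD_EVENT_REMOVE):

    #include <linux/userfaultfd.h>
    #include <stdio.h>
    #include <unistd.h>

    /* Read one event from the userfault fd and report unmap events. */
    static void handle_one_event(int uffd)
    {
        struct uffd_msg msg;

        if (read(uffd, &msg, sizeof(msg)) != sizeof(msg))
            return;
        if (msg.event == UFFD_EVENT_UNMAP)
            printf("unmapped: [0x%llx, 0x%llx)\n",
                   (unsigned long long)msg.arg.remove.start,
                   (unsigned long long)msg.arg.remove.end);
    }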

    [arnd@arndb.de: fix nommu build]
    Link: http://lkml.kernel.org/r/20170203165141.3665284-1-arnd@arndb.de
    [mhocko@suse.com: fix nommu build]
    Link: http://lkml.kernel.org/r/20170202091503.GA22823@dhcp22.suse.cz
    Link: http://lkml.kernel.org/r/1485542673-24387-3-git-send-email-rppt@linux.vnet.ibm.com
    Signed-off-by: Mike Rapoport
    Signed-off-by: Michal Hocko
    Signed-off-by: Arnd Bergmann
    Acked-by: Hillf Danton
    Cc: Andrea Arcangeli
    Cc: "Dr. David Alan Gilbert"
    Cc: Mike Kravetz
    Cc: Pavel Emelyanov
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Mike Rapoport
     
  • ->fault(), ->page_mkwrite(), and ->pfn_mkwrite() calls do not need to
    take a vma and vmf parameter when the vma already resides in vmf.

    Remove the vma parameter to simplify things.

    [arnd@arndb.de: fix ARM build]
    Link: http://lkml.kernel.org/r/20170125223558.1451224-1-arnd@arndb.de
    Link: http://lkml.kernel.org/r/148521301778.19116.10840599906674778980.stgit@djiang5-desk3.ch.intel.com
    Signed-off-by: Dave Jiang
    Signed-off-by: Arnd Bergmann
    Reviewed-by: Ross Zwisler
    Cc: Theodore Ts'o
    Cc: Darrick J. Wong
    Cc: Matthew Wilcox
    Cc: Dave Hansen
    Cc: Christoph Hellwig
    Cc: Jan Kara
    Cc: Dan Williams
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Dave Jiang
     

13 May, 2016

1 commit

  • parse_crash_elf{32|64}_headers will check the headers via the
    elf_check_arch and vmcore_elf64_check_arch macros, respectively.

    The MIPS architecture implements those two macros differently. In
    order to make the differentiation more explicit, let's introduce a
    vmcore_elf32_check_arch macro to allow the archs to override it.
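
    A sketch of the resulting override point in the generic header (the
    fallback shape; an architecture defines its own macro to take over):

    /* Fall back to the ELF loader's check unless the arch overrides it. */
    #ifndef vmcore_elf32_check_arch
    #define vmcore_elf32_check_arch(x) elf_check_arch(x)
    #endif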

    Signed-off-by: Daniel Wagner
    Suggested-by: Maciej W. Rozycki
    Reviewed-by: Maciej W. Rozycki
    Cc: linux-kernel@vger.kernel.org
    Cc: linux-mips@linux-mips.org
    Patchwork: https://patchwork.linux-mips.org/patch/12535/
    Signed-off-by: Ralf Baechle

    Daniel Wagner
     

05 Apr, 2016

1 commit

  • The PAGE_CACHE_{SIZE,SHIFT,MASK,ALIGN} macros were introduced a *long*
    time ago with the promise that one day it would be possible to
    implement the page cache with bigger chunks than PAGE_SIZE.

    This promise never materialized, and it is unlikely it ever will.

    We have many places where PAGE_CACHE_SIZE is assumed to be equal to
    PAGE_SIZE, and it's a constant source of confusion whether the
    PAGE_CACHE_* or PAGE_* constant should be used in a particular case,
    especially on the border between fs and mm.

    Global switching to PAGE_CACHE_SIZE != PAGE_SIZE would cause too much
    breakage to be doable.

    Let's stop pretending that pages in the page cache are special. They
    are not.

    The changes are pretty straightforward:

    - <foo> << (PAGE_CACHE_SHIFT - PAGE_SHIFT) -> <foo>;

    - <foo> >> (PAGE_CACHE_SHIFT - PAGE_SHIFT) -> <foo>;

    - PAGE_CACHE_{SIZE,SHIFT,MASK,ALIGN} -> PAGE_{SIZE,SHIFT,MASK,ALIGN};

    - page_cache_get() -> get_page();

    - page_cache_release() -> put_page();

    This patch contains automated changes generated with coccinelle using
    script below. For some reason, coccinelle doesn't patch header files.
    I've called spatch for them manually.

    The only adjustment after coccinelle is reverting the changes to the
    PAGE_CACHE_ALIGN definition: we are going to drop it later.

    There are a few places in the code that coccinelle didn't reach. I'll
    fix them manually in a separate patch. Comments and documentation will
    also be addressed in a separate patch.

    virtual patch

    @@
    expression E;
    @@
    - E << (PAGE_CACHE_SHIFT - PAGE_SHIFT)
    + E

    @@
    expression E;
    @@
    - E >> (PAGE_CACHE_SHIFT - PAGE_SHIFT)
    + E

    @@
    @@
    - PAGE_CACHE_SHIFT
    + PAGE_SHIFT

    @@
    @@
    - PAGE_CACHE_SIZE
    + PAGE_SIZE

    @@
    @@
    - PAGE_CACHE_MASK
    + PAGE_MASK

    @@
    expression E;
    @@
    - PAGE_CACHE_ALIGN(E)
    + PAGE_ALIGN(E)

    @@
    expression E;
    @@
    - page_cache_get(E)
    + get_page(E)

    @@
    expression E;
    @@
    - page_cache_release(E)
    + put_page(E)

    Signed-off-by: Kirill A. Shutemov
    Acked-by: Michal Hocko
    Signed-off-by: Linus Torvalds

    Kirill A. Shutemov
     

18 Mar, 2016

1 commit

  • On an i686 PAE-enabled machine the contiguous physical area can be
    large, and it can cause variables to be truncated in the calculation
    below in read_vmcore() and mmap_vmcore():

    tsz = min_t(size_t, m->offset + m->size - *fpos, buflen);

    That is, the types being used are as follows on i686:
    m->offset: unsigned long long int
    m->size: unsigned long long int
    *fpos: loff_t (long long int)
    buflen: size_t (unsigned int)

    So casting (m->offset + m->size - *fpos) to size_t means truncating
    the value modulo 4GB.

    Suppose (m->offset + m->size - *fpos) is truncated to 0 while
    buflen > 0; then we get tsz = 0, which is of course not the expected
    result. Similarly, we could also get other truncated values less than
    buflen, and then the real size passed down is no longer correct.

    If (m->offset + m->size - *fpos) is above 4GB, read_vmcore or
    mmap_vmcore uses the min_t result with truncated values compared to
    buflen. Then fpos proceeds with the wrong value and we reach the bugs
    below:

    1) read_vmcore will refuse to continue so makedumpfile fails.
    2) mmap_vmcore will trigger BUG_ON() in remap_pfn_range().

    Use unsigned long long in min_t instead so that the values are not
    truncated.
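
    The truncation is easy to reproduce in userspace; a minimal sketch
    (the kernel's min_t() re-implemented as a stand-in, with uint32_t
    simulating i686's 32-bit size_t):

    #include <stdint.h>
    #include <stdio.h>

    /* Stand-in for the kernel macro: compare after casting to 'type'. */
    #define min_t(type, x, y) \
        ({ type _x = (x); type _y = (y); _x < _y ? _x : _y; })

    int main(void)
    {
        unsigned long long remaining = 0x100000000ULL; /* 4GB left */
        uint32_t buflen = 4096;

        /* i686 behaviour: the cast to a 32-bit size_t wraps 4GB to 0. */
        uint32_t bad = min_t(uint32_t, remaining, buflen);
        /* The fix: compare at full width. */
        unsigned long long good = min_t(unsigned long long, remaining,
                                        buflen);

        printf("bad tsz = %u, good tsz = %llu\n", bad, good); /* 0, 4096 */
        return 0;
    }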

    Signed-off-by: Baoquan He
    Signed-off-by: Dave Young
    Cc: HATAYAMA Daisuke
    Cc: Vivek Goyal
    Cc: Jianyu Zhan
    Cc: Minfei Huang
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Dave Young
     

18 Feb, 2015

1 commit

  • When updating PT_NOTE header size (ie. p_memsz), an overflow issue
    happens with the following bogus note entry:

    n_namesz = 0xFFFFFFFF
    n_descsz = 0x0
    n_type = 0x0

    This kind of note entry should be dropped while updating p_memsz. But
    because n_namesz is 32-bit, (n_namesz + 3) & (~3) overflows to 0x0, so
    the note entry size looks sane and the entry is kept.

    When userspace (eg. the crash utility) tries to access such a bogus
    note, it can lead to unexpected behavior (eg. the crash utility
    segfaults because it reads a bogus address).

    The source of the bogus note hasn't been identified yet. At least we
    can drop the bogus note so user space isn't surprised.
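
    The wrap-around is plain 32-bit arithmetic; a minimal demonstration:

    #include <stdint.h>
    #include <stdio.h>

    int main(void)
    {
        uint32_t n_namesz = 0xFFFFFFFFu;

        /* Rounding up to a 4-byte boundary overflows to 0, so the bogus
         * entry contributes a "sane" size of zero and survives. */
        uint32_t rounded = (n_namesz + 3) & ~(uint32_t)3;

        printf("rounded = 0x%x\n", rounded); /* prints 0x0 */
        return 0;
    }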

    Signed-off-by: WANG Chao
    Cc: Dave Anderson
    Cc: Baoquan He
    Cc: Randy Wright
    Cc: Vivek Goyal
    Cc: Paul Gortmaker
    Cc: Fabian Frederick
    Cc: Vitaly Kuznetsov
    Cc: Rashika Kheria
    Cc: Greg Pearson
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    WANG Chao
     

09 Aug, 2014

1 commit

  • We have a special check in the read_vmcore() handler to check whether
    the page was reported as ram or not by the hypervisor (pfn_is_ram()).
    However, when vmcore is read with mmap() no such check is performed.
    That can lead to unpredictable results; e.g. in a Xen PVHVM guest, a
    memcpy() after mmap() on /proc/vmcore will hang processing
    HVMMEM_mmio_dm pages, creating enormous load in both DomU and Dom0.

    Fix the issue by mapping each non-ram page to the zero page. Keep direct
    path with remap_oldmem_pfn_range() to avoid looping through all pages on
    bare metal.

    The issue could also be solved by overriding remap_oldmem_pfn_range()
    in Xen-specific code, which is what remap_oldmem_pfn_range() was
    designed for. That, however, would involve a non-obvious Xen code path
    for all x86 builds with CONFIG_XEN_PVHVM=y and would prevent any other
    hypervisor-specific code on x86 from doing the same override.

    [fengguang.wu@intel.com: remap_oldmem_pfn_checked() can be static]
    [akpm@linux-foundation.org: clean up layout]
    Signed-off-by: Vitaly Kuznetsov
    Reviewed-by: Andrew Jones
    Cc: Michael Holzheu
    Acked-by: Vivek Goyal
    Cc: David Vrabel
    Signed-off-by: Fengguang Wu
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Vitaly Kuznetsov
     

08 Apr, 2014

2 commits

  • Currently, when an empty PT_NOTE is detected, vmcore initialization
    fails. That is too harsh, because a PT_NOTE can legitimately be empty:
    for example, one offlined a cpu but never restarted the kdump service,
    and after the crash the PT_NOTE program header is there but contains
    no data. It's better to warn about the empty PT_NOTE and continue to
    initialise vmcore.

    Ultimately the multiple PT_NOTEs are merged into a single one, and all
    empty PT_NOTEs are discarded naturally during the merge. So an empty
    PT_NOTE is not visible to user space and vmcore is as good as expected.

    Signed-off-by: WANG Chao
    Cc: Vivek Goyal
    Cc: HATAYAMA Daisuke
    Cc: Greg Pearson
    Cc: Baoquan He
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    WANG Chao
     
  • Eliminate the following warning in proc/vmcore.c:

    fs/proc/vmcore.c:1088:6: warning: no previous prototype for `vmcore_cleanup' [-Wmissing-prototypes]

    [akpm@linux-foundation.org: clean up powerpc, remove unneeded EXPORT_SYMBOL]
    Signed-off-by: Rashika Kheria
    Reviewed-by: Josh Triplett
    Cc: Benjamin Herrenschmidt
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Rashika Kheria
     

11 Feb, 2014

1 commit

  • Currently, update_note_header_size_elf64() and
    update_note_header_size_elf32() will add the size of a PT_NOTE entry
    to real_sz even if that causes real_sz to exceed max_sz. This patch
    corrects the while-loop logic in those routines to ensure that this
    does not happen and prints a warning if a PT_NOTE entry is dropped. If
    zero PT_NOTE entries are found, or this condition is encountered
    because the only entry was dropped, a warning is printed and an error
    is returned.

    One possible negative side effect of exceeding the max_sz limit is an
    allocation failure in merge_note_headers_elf64() or
    merge_note_headers_elf32() which would produce console output such as
    the following while booting the crash kernel.

    vmalloc: allocation failure: 14076997632 bytes
    swapper/0: page allocation failure: order:0, mode:0x80d2
    CPU: 0 PID: 1 Comm: swapper/0 Not tainted 3.10.0-gbp1 #7
    Call Trace:
    dump_stack+0x19/0x1b
    warn_alloc_failed+0xf0/0x160
    __vmalloc_node_range+0x19e/0x250
    vmalloc_user+0x4c/0x70
    merge_note_headers_elf64.constprop.9+0x116/0x24a
    vmcore_init+0x2d4/0x76c
    do_one_initcall+0xe2/0x190
    kernel_init_freeable+0x17c/0x207
    kernel_init+0xe/0x180
    ret_from_fork+0x7c/0xb0

    Kdump: vmcore not initialized

    kdump: dump target is /dev/sda4
    kdump: saving to /sysroot//var/crash/127.0.0.1-2014.01.28-13:58:52/
    kdump: saving vmcore-dmesg.txt
    Cannot open /proc/vmcore: No such file or directory
    kdump: saving vmcore-dmesg.txt failed
    kdump: saving vmcore
    kdump: saving vmcore failed

    This type of failure has been seen on a four socket prototype system
    with certain memory configurations. Most PT_NOTE sections have a single
    entry similar to:

    n_namesz = 0x5
    n_descsz = 0x150
    n_type = 0x1

    Occasionally, a second entry is encountered with very large n_namesz and
    n_descsz sizes:

    n_namesz = 0x80000008
    n_descsz = 0x510ae163
    n_type = 0x80000008

    We are not yet sure of the source of these extra entries; they seem
    bogus, but they shouldn't cause the crash dump to fail.

    Signed-off-by: Greg Pearson
    Acked-by: Vivek Goyal
    Cc: HATAYAMA Daisuke
    Cc: Michael Holzheu
    Cc: "Eric W. Biederman"
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Greg Pearson
     

24 Jan, 2014

1 commit

  • PROC_FS is a bool, so this code is either present or absent. It will
    never be modular, so using module_init as an alias for __initcall is
    rather misleading.

    Fix this up now, so that we can relocate module_init from init.h into
    module.h in the future. If we don't do this, we'd have to add module.h to
    obviously non-modular code, and that would be ugly at best.

    Note that direct use of __initcall is discouraged, vs. one of the
    priority categorized subgroups. As __initcall gets mapped onto
    device_initcall, our use of fs_initcall (which makes sense for fs code)
    will thus change these registrations from level 6-device to level 5-fs
    (i.e. slightly earlier). However no observable impact of that small
    difference has been observed during testing, or is expected.

    Also note that this change uncovers a missing semicolon bug in the
    registration of vmcore_init as an initcall.
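
    The resulting registration, which also gains the previously missing
    semicolon:

    /* fs-level (level 5) initcall instead of the module_init alias. */
    fs_initcall(vmcore_init);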

    Signed-off-by: Paul Gortmaker
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Paul Gortmaker
     

12 Sep, 2013

3 commits

  • The patch "s390/vmcore: Implement remap_oldmem_pfn_range for s390" now
    allows mmap to be used on s390 as well.

    So enable mmap for s390 again.

    Signed-off-by: Michael Holzheu
    Cc: HATAYAMA Daisuke
    Cc: Jan Willeke
    Cc: Vivek Goyal
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Michael Holzheu
     
  • For zfcpdump we can't map the HSA storage because it is only available
    via a read interface. Therefore, for the new vmcore mmap feature we
    have to introduce a new mechanism to create mappings on demand.

    This patch introduces a new architecture function,
    remap_oldmem_pfn_range(), that should be used to create mappings with
    remap_pfn_range() for oldmem areas that can be directly mapped. For
    zfcpdump this is everything except the HSA memory. For the areas that
    are not mapped by remap_oldmem_pfn_range(), a new generic vmcore fault
    handler, mmap_vmcore_fault(), is called.

    This handler works as follows:

    * Get already available or new page from page cache (find_or_create_page)
    * Check if /proc/vmcore page is filled with data (PageUptodate)
    * If yes:
      Return that page
    * If no:
      Fill the page using __vmcore_read(), set PageUptodate, and return
      the page
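
    Condensed into code, the flow above looks roughly as follows (a sketch
    assuming the __vmcore_read() helper named above, written against the
    modern fault-handler signature for brevity):

    static vm_fault_t mmap_vmcore_fault(struct vm_fault *vmf)
    {
        struct address_space *mapping = vmf->vma->vm_file->f_mapping;
        pgoff_t index = vmf->pgoff;
        struct page *page;
        loff_t offset;
        char *buf;
        int rc;

        /* Get an already available or new page from the page cache. */
        page = find_or_create_page(mapping, index, GFP_KERNEL);
        if (!page)
            return VM_FAULT_OOM;

        /* Fill the page from the dump if it is not up to date yet. */
        if (!PageUptodate(page)) {
            offset = (loff_t)index << PAGE_SHIFT;
            buf = __va(page_to_pfn(page) << PAGE_SHIFT);
            rc = __vmcore_read(buf, PAGE_SIZE, &offset, 0);
            if (rc < 0) {
                unlock_page(page);
                put_page(page);
                return VM_FAULT_SIGBUS;
            }
            SetPageUptodate(page);
        }
        unlock_page(page);
        vmf->page = page;
        return 0;
    }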

    Signed-off-by: Michael Holzheu
    Acked-by: Vivek Goyal
    Cc: HATAYAMA Daisuke
    Cc: Jan Willeke
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Michael Holzheu
     
  • For s390 we want to use /proc/vmcore for our SCSI stand-alone dump
    (zfcpdump). We have support where the first HSA_SIZE bytes are saved into
    a hypervisor owned memory area (HSA) before the kdump kernel is booted.
    When the kdump kernel starts, it is restricted to use only HSA_SIZE bytes.

    The advantages of this mechanism are:

    * No crashkernel memory has to be defined in the old kernel.
    * Early boot problems (before kexec_load has been done) can be dumped
    * Non-Linux systems can be dumped.

    We modify the s390 copy_oldmem_page() function to read from the HSA memory
    if memory below HSA_SIZE bytes is requested.

    Since we cannot use the kexec tool to load the kernel in this scenario,
    we have to build the ELF header in the 2nd (kdump/new) kernel.

    So with the following patch set we would like to introduce new
    functions so that the ELF header for /proc/vmcore can be created in
    2nd kernel memory.

    The following steps are done during zfcpdump execution:

    1. Production system crashes
    2. User boots a SCSI disk that has been prepared with the zfcpdump tool
    3. Hypervisor saves CPU state of boot CPU and HSA_SIZE bytes of memory into HSA
    4. Boot loader loads kernel into low memory area
    5. Kernel boots and uses only HSA_SIZE bytes of memory
    6. Kernel saves registers of non-boot CPUs
    7. Kernel does memory detection for dump memory map
    8. Kernel creates ELF header for /proc/vmcore
    9. /proc/vmcore uses this header for initialization
    10. The zfcpdump user space reads /proc/vmcore to write dump to SCSI disk
    - copy_oldmem_page() copies from HSA for memory below HSA_SIZE
    - copy_oldmem_page() copies from real memory for memory above HSA_SIZE

    Currently for s390 we create the ELF core header in the 2nd kernel
    with a small trick: we relocate the addresses in the ELF header so
    that, to the /proc/vmcore code, they seem to be in 1st kernel (old)
    memory and read_from_oldmem() returns the correct data. This allows
    the /proc/vmcore code to use the ELF header in the 2nd kernel.

    This patch:

    Exchange the old mechanism for the new and much cleaner function-call
    override feature that now officially allows the ELF core header to be
    created in the 2nd kernel.

    To use the new feature, the following functions have to be defined by
    the architecture backend code to read from new memory:

    * elfcorehdr_alloc: Allocate ELF header
    * elfcorehdr_free: Free the memory of the ELF header
    * elfcorehdr_read: Read from ELF header
    * elfcorehdr_read_notes: Read from ELF notes
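
    The generic weak defaults keep the old behaviour of reading the header
    from 1st kernel (old) memory; a sketch of two of them:

    /* Architectures override these to read from 2nd kernel memory. */
    ssize_t __weak elfcorehdr_read(char *buf, size_t count, u64 *ppos)
    {
        return read_from_oldmem(buf, count, ppos, 0);
    }

    ssize_t __weak elfcorehdr_read_notes(char *buf, size_t count, u64 *ppos)
    {
        return read_from_oldmem(buf, count, ppos, 0);
    }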

    Signed-off-by: Michael Holzheu
    Acked-by: Vivek Goyal
    Cc: HATAYAMA Daisuke
    Cc: Jan Willeke
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Michael Holzheu
     

18 Jul, 2013

1 commit

  • The kdump mmap patch series (git commit 83086978c63afd7c73e1c) directly
    maps the PT_LOADs to memory. On s390 this does not work because the
    copy_from_oldmem() function swaps [0, crashkernel size] with
    [crashkernel base, crashkernel base + crashkernel size]. The swap in
    copy_from_oldmem() was done in order to correctly implement /dev/oldmem.

    See: http://marc.info/?l=kexec&m=136940802511603&w=2

    Signed-off-by: Michael Holzheu
    Signed-off-by: Martin Schwidefsky

    Michael Holzheu
     

04 Jul, 2013

7 commits

  • This patch introduces mmap_vmcore().

    Don't permit writable or executable mappings even with mprotect(),
    because this mmap() is aimed at reading crash dump memory. A
    non-writable mapping is also a requirement of remap_pfn_range() when
    mapping linear pages onto non-consecutive physical pages; see
    is_cow_mapping().

    Set the VM_MIXEDMAP flag to remap memory both by remap_pfn_range() and
    by remap_vmalloc_range_partial() at the same time for a single vma.
    do_munmap() can correctly clean up a partially remapped vma with the
    two functions in the abnormal case. See zap_pte_range(),
    vm_normal_page() and their comments for details.

    On x86-32 PAE kernels, mmap() supports at most 16TB of memory. This
    limitation comes from the fact that the third argument of
    remap_pfn_range(), pfn, is of 32-bit length on x86-32: unsigned long.
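
    (For reference, the limit follows directly from the pfn width: with
    4 KiB pages, 2^32 page frames x 2^12 bytes/page = 2^44 bytes = 16 TiB.)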

    [akpm@linux-foundation.org: use min(), switch to conventional error-unwinding approach]
    Signed-off-by: HATAYAMA Daisuke
    Acked-by: Vivek Goyal
    Cc: KOSAKI Motohiro
    Cc: Atsushi Kumagai
    Cc: Lisa Mitchell
    Cc: Zhang Yanfei
    Tested-by: Maxim Uvarov
    Cc: Arnd Bergmann
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    HATAYAMA Daisuke
     
  • The previous patches newly added holes before each chunk of memory and
    the holes need to be counted in the vmcore file size. There are two
    ways to count the file size in such a scheme:

    1) suppose m is a pointer to the last vmcore object in vmcore_list;
    then the file size is (m->offset + m->size), or

    2) calculate the sum of the sizes of the buffers for the ELF header,
    program headers, ELF note segments and objects in vmcore_list.

    Although 1) is more direct and simpler than 2), 2) seems better in
    that it reflects the internal object structure of /proc/vmcore. Thus,
    this patch changes get_vmcore_size_elf{64, 32} so that they calculate
    the size in way 2).

    As a result, both get_vmcore_size_elf{64, 32} have the same definition.
    Merge them as get_vmcore_size.
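
    The merged function then computes way 2) directly; a sketch:

    /* ELF header buffer + note segment + every chunk in vmcore_list. */
    static u64 get_vmcore_size(size_t elfsz, size_t elfnotesegsz,
                               struct list_head *vc_list)
    {
        struct vmcore *m;
        u64 size;

        size = elfsz + elfnotesegsz;
        list_for_each_entry(m, vc_list, list)
            size += m->size;
        return size;
    }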

    Signed-off-by: HATAYAMA Daisuke
    Acked-by: Vivek Goyal
    Cc: KOSAKI Motohiro
    Cc: Atsushi Kumagai
    Cc: Lisa Mitchell
    Cc: Zhang Yanfei
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    HATAYAMA Daisuke
     
  • Now the ELF note segment is copied into a buffer in vmalloc memory. To
    allow a user process to remap the ELF note segment buffer with
    remap_vmalloc_range(), the corresponding VM area object has to have
    the VM_USERMAP flag set.
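
    One way to get that flag set is to allocate the buffer with
    vmalloc_user(), which returns zeroed memory whose VM area already
    carries VM_USERMAP; an illustrative fragment:

    /* Zeroed vmalloc memory, marked VM_USERMAP, hence valid later for
     * remap_vmalloc_range(). */
    elfnotes_buf = vmalloc_user(elfnotes_sz);
    if (!elfnotes_buf)
        return -ENOMEM;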

    [akpm@linux-foundation.org: use the conventional comment layout]
    Signed-off-by: HATAYAMA Daisuke
    Acked-by: Vivek Goyal
    Cc: KOSAKI Motohiro
    Cc: Atsushi Kumagai
    Cc: Lisa Mitchell
    Cc: Zhang Yanfei
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    HATAYAMA Daisuke
     
  • The reasons why we don't allocate the ELF note segment in the 1st
    kernel (old memory) on a page boundary are to keep backward
    compatibility for old kernels, and that doing so would waste no small
    amount of memory due to the round-up to a page boundary, since most of
    the buffers are in per-cpu areas.

    ELF notes are per-cpu, so the total size of the ELF note segments
    depends on the number of CPUs. The current maximum number of CPUs on
    x86_64 is 5192, and there's already a system with 4192 CPUs from SGI,
    where the total size amounts to 1MB. This can be larger in the near
    future, or possibly even now on another architecture that has a larger
    note size per CPU. Thus, to avoid the case where a memory allocation
    for a large block fails, we allocate vmcore objects in vmalloc memory.

    This patch adds the elfnotes_buf and elfnotes_sz variables to keep a
    pointer to the ELF note segment buffer and its size. There's no longer
    a vmcore object corresponding to the ELF note segment in vmcore_list.
    Accordingly, read_vmcore() has a new case for the ELF note segment,
    and set_vmcore_list_offsets_elf{64,32}() and other helper functions
    start calculating offsets from the sum of the size of the ELF headers
    and the size of the ELF note segment.

    [akpm@linux-foundation.org: use min(), fix error-path vzalloc() leaks]
    Signed-off-by: HATAYAMA Daisuke
    Acked-by: Vivek Goyal
    Cc: KOSAKI Motohiro
    Cc: Atsushi Kumagai
    Cc: Lisa Mitchell
    Cc: Zhang Yanfei
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    HATAYAMA Daisuke
     
  • …-size boundary in vmcore_list

    Treat memory chunks referenced by PT_LOAD program header entries in
    page-size boundary in vmcore_list. Formally, for each range [start,
    end], we set up the corresponding vmcore object in vmcore_list to
    [rounddown(start, PAGE_SIZE), roundup(end, PAGE_SIZE)].

    This change affects layout of /proc/vmcore. The gaps generated by the
    rearrangement are newly made visible to applications as holes.
    Concretely, they are two ranges [rounddown(start, PAGE_SIZE), start] and
    [end, roundup(end, PAGE_SIZE)].

    Suppose variable m points at a vmcore object in vmcore_list, and
    variable phdr points at the program header of PT_LOAD type the variable
    m corresponds to. Then, pictorially:

    m->offset                      +---------------+
                                   |     hole      |
    phdr->p_offset =               +---------------+
    m->offset + (paddr - start)    |               |\
                                   | kernel memory | phdr->p_memsz
                                   |               |/
                                   +---------------+
                                   |     hole      |
    m->offset + m->size            +---------------+

    where m->offset and m->offset + m->size are always page-size aligned.

    Signed-off-by: HATAYAMA Daisuke <d.hatayama@jp.fujitsu.com>
    Acked-by: Vivek Goyal <vgoyal@redhat.com>
    Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
    Cc: Atsushi Kumagai <kumagai-atsushi@mxc.nes.nec.co.jp>
    Cc: Lisa Mitchell <lisa.mitchell@hp.com>
    Cc: Zhang Yanfei <zhangyanfei@cn.fujitsu.com>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

    HATAYAMA Daisuke
     
  • Allocate ELF headers on a page-size boundary using __get_free_pages()
    instead of kmalloc().

    A later patch will merge the PT_NOTE entries into a single unique one
    and decrease the buffer size actually used. Keep the original buffer
    size in the variable elfcorebuf_sz_orig so the buffer can be freed
    later, and keep the actually used buffer size, rounded up to a
    page-size boundary, in the variable elfcorebuf_sz.

    The part of the ELF buffer exported from /proc/vmcore has size
    elfcorebuf_sz.

    The range left by the merged and removed PT_NOTE entries, i.e.
    [elfcorebuf_sz, elfcorebuf_sz_orig], is filled with 0.
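
    A sketch of the allocation described above (64-bit variant):

    /* Page-aligned, zeroed buffer sized for the ELF header plus phdrs. */
    elfcorebuf_sz_orig = sizeof(Elf64_Ehdr) +
                         ehdr.e_phnum * sizeof(Elf64_Phdr);
    elfcorebuf_sz = elfcorebuf_sz_orig;
    elfcorebuf = (void *)__get_free_pages(GFP_KERNEL | __GFP_ZERO,
                                          get_order(elfcorebuf_sz_orig));
    if (!elfcorebuf)
        return -ENOMEM;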

    Use size of the ELF headers as an initial offset value in
    set_vmcore_list_offsets_elf{64,32} and
    process_ptload_program_headers_elf{64,32} in order to indicate that the
    offset includes the holes towards the page boundary.

    As a result, both set_vmcore_list_offsets_elf{64,32} have the same
    definition. Merge them as set_vmcore_list_offsets.

    [akpm@linux-foundation.org: add free_elfcorebuf(), cleanups]
    Signed-off-by: HATAYAMA Daisuke
    Acked-by: Vivek Goyal
    Cc: KOSAKI Motohiro
    Cc: Atsushi Kumagai
    Cc: Lisa Mitchell
    Cc: Zhang Yanfei
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    HATAYAMA Daisuke
     
  • Rewrite the part of read_vmcore() that reads objects in vmcore_list in
    the same way as the part reading the ELF headers, removing some
    duplicated and redundant code.

    Signed-off-by: HATAYAMA Daisuke
    Acked-by: Vivek Goyal
    Cc: KOSAKI Motohiro
    Cc: Atsushi Kumagai
    Cc: Lisa Mitchell
    Cc: Zhang Yanfei
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    HATAYAMA Daisuke
     

02 May, 2013

1 commit

  • Supply a function (proc_remove()) to remove a proc entry (and any subtree
    rooted there) by proc_dir_entry pointer rather than by name and (optionally)
    root dir entry pointer. This allows us to eliminate all remaining pde->name
    accesses outside of procfs.
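
    For vmcore this lets the cleanup path drop the name-based lookup; an
    illustrative before/after:

    /* Before: remove by name under the proc root. */
    remove_proc_entry("vmcore", NULL);

    /* After: remove directly via the saved proc_dir_entry pointer. */
    proc_remove(proc_vmcore);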

    Signed-off-by: David Howells
    Acked-by: Grant Likely
    cc: linux-acpi@vger.kernel.org
    cc: openipmi-developer@lists.sourceforge.net
    cc: devicetree-discuss@lists.ozlabs.org
    cc: linux-pci@vger.kernel.org
    cc: netdev@vger.kernel.org
    cc: netfilter-devel@vger.kernel.org
    cc: alsa-devel@alsa-project.org
    Signed-off-by: Al Viro

    David Howells
     
