03 Aug, 2016

1 commit

  • kexec physical addresses are the boot-time view of the system. For
    certain ARM systems (such as Keystone 2), the boot view of the system
    does not match the kernel's view of the system: the boot view uses a
    special alias in the lower 4GB of the physical address space.

    To cater for these kinds of setups, we need to translate between the
    boot view physical addresses and the normal kernel view physical
    addresses. This patch extracts the current translation points into
    linux/kexec.h, and allows an architecture to override the functions.

    Due to the translations required, we unfortunately end up with six
    translation functions, which are reduced down to four that the
    architecture can override.
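
    As an illustration only, the override pattern could be sketched in user
    space as below. The helper names and the ALIAS_OFFSET constant are
    assumptions made for this example, not values taken from the patch.

```c
#include <stdint.h>

typedef uint64_t phys_addr_t;

/* Hypothetical offset between the kernel view and the boot-view alias. */
#define ALIAS_OFFSET 0x780000000ULL

/*
 * An "architecture" overrides a hook by defining it first; the macro of
 * the same name tells the generic code that an override exists.
 */
static inline phys_addr_t phys_to_boot_phys(phys_addr_t phys)
{
	return phys - ALIAS_OFFSET;
}
#define phys_to_boot_phys phys_to_boot_phys

static inline phys_addr_t boot_phys_to_phys(phys_addr_t boot)
{
	return boot + ALIAS_OFFSET;
}
#define boot_phys_to_phys boot_phys_to_phys

/* Generic fallbacks: identity mapping when nothing was overridden. */
#ifndef phys_to_boot_phys
static inline phys_addr_t phys_to_boot_phys(phys_addr_t phys)
{
	return phys;
}
#endif

#ifndef boot_phys_to_phys
static inline phys_addr_t boot_phys_to_phys(phys_addr_t boot)
{
	return boot;
}
#endif
```

    With these assumed values, a kernel-view address of 0x800000000 maps to
    the boot-view alias 0x80000000, inside the lower 4GB.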

    [akpm@linux-foundation.org: kexec.h needs asm/io.h for phys_to_virt()]
    Link: http://lkml.kernel.org/r/E1b8koP-0004HZ-Vf@rmk-PC.armlinux.org.uk
    Signed-off-by: Russell King
    Cc: Keerthy
    Cc: Pratyush Anand
    Cc: Vitaly Andrianov
    Cc: Eric Biederman
    Cc: Dave Young
    Cc: Baoquan He
    Cc: Vivek Goyal
    Cc: Simon Horman
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Russell King
     

24 May, 2016

4 commits

  • …unprotect)_crashkres()

    Commit 3f625002581b ("kexec: introduce a protection mechanism for the
    crashkernel reserved memory") introduced a mechanism for protecting the
    crash kernel reserved memory that is similar to the previous
    crash_map/unmap_reserved_pages() implementation. The new one is more
    generic in name and cleaner in code (besides, some architectures may not
    be allowed to unmap the pgtable).

    Therefore, this patch consolidates them, and uses the new
    arch_kexec_protect(unprotect)_crashkres() to replace former
    crash_map/unmap_reserved_pages() which by now has been only used by
    S390.

    The consolidation work needs the crash memory to be mapped initially;
    this is done in machine_kdump_pm_init() which is after
    reserve_crashkernel(). Once kdump kernel is loaded, the new
    arch_kexec_protect_crashkres() implemented for S390 will actually
    unmap the pgtable like before.

    Signed-off-by: Xunlei Pang <xlpang@redhat.com>
    Signed-off-by: Michael Holzheu <holzheu@linux.vnet.ibm.com>
    Acked-by: Michael Holzheu <holzheu@linux.vnet.ibm.com>
    Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
    Cc: "Eric W. Biederman" <ebiederm@xmission.com>
    Cc: Minfei Huang <mhuang@redhat.com>
    Cc: Vivek Goyal <vgoyal@redhat.com>
    Cc: Dave Young <dyoung@redhat.com>
    Cc: Baoquan He <bhe@redhat.com>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

    Xunlei Pang
     
  • There is a lot of work to be done in the function kexec_load, not only
    allocating structs and loading the initram, but also some miscellaneous
    pre-work.

    To make this clearer, introduce a new function do_kexec_load which
    allocates the structs and loads the initram. The pre-work stays in
    kexec_load.

    Signed-off-by: Minfei Huang
    Cc: Vivek Goyal
    Cc: "Eric W. Biederman"
    Cc: Xunlei Pang
    Cc: Baoquan He
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Minfei Huang
     
  • For some arch, kexec shall map the reserved pages, then use them, when
    we try to start the kdump service.

    kexec may return directly, without unmapping the reserved pages, if it
    fails during starting service. To fix it, we make a pair of map/unmap
    reserved pages both in generic path and error path.
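
    The pairing can be sketched as follows. This is a user-space
    illustration with made-up helper names (a counter stands in for the
    real page-table work), not the code from the patch.

```c
#include <errno.h>

static int mapped_depth;	/* > 0 while the reserved pages are mapped */

static void map_reserved_pages(void)
{
	mapped_depth++;
}

static void unmap_reserved_pages(void)
{
	mapped_depth--;
}

/* Returns 0 on success or a negative errno; fail_load simulates an error. */
static int load_crash_image(int fail_load)
{
	int ret = 0;

	map_reserved_pages();
	if (fail_load) {
		ret = -EINVAL;
		goto out;	/* the error path still reaches the unmap */
	}
	/* ... copy the segments into the reserved region here ... */
out:
	unmap_reserved_pages();
	return ret;
}

static int reserved_pages_mapped(void)
{
	return mapped_depth > 0;
}
```

    Routing both the normal path and the error path through the same exit
    label keeps the map and unmap calls balanced.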

    This patch only affects s390. Other architectures don't implement the
    interface of crash_unmap_reserved_pages and crash_map_reserved_pages.

    It isn't an urgent patch. The kernel can work well without any risk,
    although the reserved pages are not unmapped before returning in error
    path.

    Signed-off-by: Minfei Huang
    Cc: Vivek Goyal
    Cc: "Eric W. Biederman"
    Cc: Xunlei Pang
    Cc: Baoquan He
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Minfei Huang
     
  • If some kernel (or module) code path stomps on the crash reserved
    memory (already mapped by the kernel) into which the second kernel's
    data has been loaded, the kdump kernel will probably fail to boot when
    a panic happens (or even when none happens), leaving the culprit at
    large. This is unacceptable.

    The patch introduces a mechanism for detecting such cases:

    1) After each crash kexec loading, it simply marks the reserved memory
    regions read-only, since we no longer access them after that. When
    someone stomps on the region, the first kernel will panic and trigger
    the kdump. The weak arch_kexec_protect_crashkres() is introduced to do
    the actual protection.

    2) To allow multiple loading, once 1) is done we also need to re-mark
    the reserved memory as read-write each time a system call related to
    kdump is made. The weak arch_kexec_unprotect_crashkres() is introduced
    to do the actual unprotection.

    The architecture can make its specific implementation by overriding
    arch_kexec_protect_crashkres() and arch_kexec_unprotect_crashkres().
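
    A rough user-space analogue of the protect/unprotect cycle, using
    mprotect() on an anonymous mapping in place of the crash kernel
    reservation. The function names and REGION_SIZE are assumptions for
    this sketch; the kernel hooks operate on page tables, not on mmap().

```c
#define _DEFAULT_SOURCE
#include <stddef.h>
#include <string.h>
#include <sys/mman.h>

#define REGION_SIZE 4096	/* hypothetical size of the reserved region */

static void *region;		/* stand-in for the crash kernel reservation */

/* Analogues of the two hooks; both return 0 on success. */
static int protect_region(void)
{
	return mprotect(region, REGION_SIZE, PROT_READ);
}

static int unprotect_region(void)
{
	return mprotect(region, REGION_SIZE, PROT_READ | PROT_WRITE);
}

/* Every load unprotects first, so repeated loads keep working. */
static int load_image(const void *data, size_t len)
{
	if (len > REGION_SIZE)
		return -1;
	if (!region) {
		region = mmap(NULL, REGION_SIZE, PROT_READ | PROT_WRITE,
			      MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
		if (region == MAP_FAILED) {
			region = NULL;
			return -1;
		}
	}
	if (unprotect_region())
		return -1;
	memcpy(region, data, len);
	return protect_region();	/* stray writes fault from here on */
}
```

    After load_image() returns, the region stays readable, but a write to
    it raises a fault, which is the user-space counterpart of the first
    kernel panicking.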

    Signed-off-by: Xunlei Pang
    Cc: Eric Biederman
    Cc: Dave Young
    Cc: Minfei Huang
    Cc: Vivek Goyal
    Cc: Baoquan He
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Xunlei Pang
     

21 Jan, 2016

1 commit

  • sanity_check_segment_list() checks KEXEC_TYPE_CRASH flag to ensure all the
    segments of the loaded crash kernel are within the kernel crash resource
    limits, so set the flag beforehand.

    Signed-off-by: Xunlei Pang
    Acked-by: Dave Young
    Cc: Eric Biederman
    Cc: Vivek Goyal
    Acked-by: Baoquan He
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Xunlei Pang
     

07 Nov, 2015

1 commit

  • The kexec output messages lost the "kexec" prefix when Dave Young split
    the kexec code. Now we use the file name as the output message prefix.

    Currently, the format of output message:
    [ 140.290795] SYSC_kexec_load: hello, world
    [ 140.291534] kexec: sanity_check_segment_list: hello, world

    Ideally, the format of output message:
    [ 30.791503] kexec: SYSC_kexec_load, Hello, world
    [ 79.182752] kexec_core: sanity_check_segment_list, Hello, world

    Remove the custom prefix "kexec" in output message.
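
    The file-name prefix can be sketched with a pr_fmt-style macro. This
    is a user-space illustration: MODNAME is a stand-in for the name the
    build system would supply, and format_message() is a hypothetical
    helper that writes to a buffer instead of the kernel log.

```c
#include <stdio.h>

/* Stand-in for the per-file name normally provided by the build system. */
#define MODNAME "kexec_core"

/* Every format string passed through pr_fmt() gains the file-name prefix. */
#define pr_fmt(fmt) MODNAME ": " fmt

/* Formats one message into a static buffer rather than the kernel log. */
static const char *format_message(const char *func, const char *text)
{
	static char buf[128];

	snprintf(buf, sizeof(buf), pr_fmt("%s, %s"), func, text);
	return buf;
}
```

    Because the prefix is attached in one macro, individual messages no
    longer need to spell out "kexec" themselves.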

    Signed-off-by: Minfei Huang
    Acked-by: Dave Young
    Cc: "Eric W. Biederman"
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Minfei Huang
     

11 Sep, 2015

2 commits

  • There are two kexec load syscalls, kexec_load and kexec_file_load.
    kexec_file_load has been split out as kernel/kexec_file.c. In this patch
    I split the kexec_load syscall code out to kernel/kexec.c.

    And add a new kconfig option KEXEC_CORE, so we can disable kexec_load and
    use kexec_file_load only, or vice versa.

    The original requirement is from Ted Ts'o: he wants the kexec kernel
    signature to be checked with CONFIG_KEXEC_VERIFY_SIG enabled. But
    kexec-tools, which uses the kexec_load syscall, can bypass the check.

    Vivek Goyal proposed to create a common kconfig option so user can compile
    in only one syscall for loading kexec kernel. KEXEC/KEXEC_FILE selects
    KEXEC_CORE so that old config files still work.

    Because there is general code that needs CONFIG_KEXEC_CORE, I updated all
    the architecture Kconfig files with a new option KEXEC_CORE, and let
    KEXEC select KEXEC_CORE in the arch Kconfig. Also updated the general
    kernel code related to the kexec_load syscall.

    [akpm@linux-foundation.org: coding-style fixes]
    Signed-off-by: Dave Young
    Cc: Eric W. Biederman
    Cc: Vivek Goyal
    Cc: Petr Tesarik
    Cc: Theodore Ts'o
    Cc: Josh Boyer
    Cc: David Howells
    Cc: Geert Uytterhoeven
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Dave Young
     
  • Split kexec_file syscall related code to another file kernel/kexec_file.c
    so that the #ifdef CONFIG_KEXEC_FILE in kexec.c can be dropped.

    Sharing variables and functions are moved to kernel/kexec_internal.h per
    suggestion from Vivek and Petr.

    [akpm@linux-foundation.org: fix bisectability]
    [akpm@linux-foundation.org: declare the various arch_kexec functions]
    [akpm@linux-foundation.org: fix build]
    Signed-off-by: Dave Young
    Cc: Eric W. Biederman
    Cc: Vivek Goyal
    Cc: Petr Tesarik
    Cc: Theodore Ts'o
    Cc: Josh Boyer
    Cc: David Howells
    Cc: Geert Uytterhoeven
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Dave Young
     

01 Jul, 2015

1 commit

  • Commit f06e5153f4ae2e ("kernel/panic.c: add "crash_kexec_post_notifiers"
    option for kdump after panic_notifers") introduced
    "crash_kexec_post_notifiers" kernel boot option, which toggles whether
    panic() calls crash_kexec() before panic_notifiers and dump kmsg or after.

    The problem is that the commit overlooks panic_on_oops kernel boot option.
    If it is enabled, crash_kexec() is called directly without going through
    panic() in oops path.

    To fix this issue, this patch adds a check to "crash_kexec_post_notifiers"
    in the condition of kexec_should_crash().

    Also, put a comment in kexec_should_crash() to explain the non-obvious
    aspects of this patch.
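
    The resulting decision can be sketched as a pure function. This is a
    simplified illustration: the parameters are hypothetical stand-ins for
    the boot options and for the other conditions the real check looks at.

```c
#include <stdbool.h>

/*
 * Decides whether the oops path should call crash_kexec() directly.
 * fatal_context approximates the other conditions that warrant a crash.
 */
static bool should_crash_directly(bool post_notifiers, bool panic_on_oops,
				  bool fatal_context)
{
	/* Defer to panic() so the notifiers run before the crash kernel. */
	if (post_notifiers)
		return false;

	return fatal_context || panic_on_oops;
}
```

    With post_notifiers set, the oops path no longer short-circuits into
    crash_kexec(), even when panic_on_oops is enabled.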

    Signed-off-by: HATAYAMA Daisuke
    Acked-by: Baoquan He
    Tested-by: Hidehiro Kawai
    Reviewed-by: Masami Hiramatsu
    Cc: Vivek Goyal
    Cc: Ingo Molnar
    Cc: Hidehiro Kawai
    Cc: Baoquan He
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    HATAYAMA Daisuke
     

23 Apr, 2015

1 commit

  • Introduce KEXEC_CONTROL_MEMORY_GFP to allow the architecture code
    to override the gfp flags of the allocation for the kexec control
    page. The loop in kimage_alloc_normal_control_pages allocates pages
    with GFP_KERNEL until a page is found that happens to have an
    address smaller than the KEXEC_CONTROL_MEMORY_LIMIT. On systems
    with a large memory size but a small KEXEC_CONTROL_MEMORY_LIMIT
    the loop will keep allocating memory until the oom killer steps in.

    Signed-off-by: Martin Schwidefsky

    Martin Schwidefsky
     

18 Feb, 2015

3 commits

  • Simplify the code around one of the conditionals in the kexec_load syscall
    routine.

    The original code was confusing with a redundant check on KEXEC_ON_CRASH
    and comments outside of the conditional block. This change switches the
    order of the conditional check, and cleans up the comments for the
    conditional. There is no functional change to the code.

    Signed-off-by: Geoff Levand
    Acked-by: Vivek Goyal
    Cc: Arnd Bergmann
    Cc: Benjamin Herrenschmidt
    Cc: H. Peter Anvin
    Cc: Maximilian Attems
    Cc: Michal Marek
    Cc: Paul Bolle
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Geoff Levand
     
  • Signed-off-by: Alexander Kuleshov
    Acked-by: "Eric W. Biederman"
    Acked-by: Vivek Goyal
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Alexander Kuleshov
     
  • struct kimage has a member destination which is used to store the real
    destination address of each page when load segment from user space buffer
    to kernel. But we never retrieve the value stored in kimage->destination,
    so this member variable in kimage and its assignment operation are
    redundant code.

    I guess for_each_kimage_entry just does the work that kimage->destination
    is expected to do.

    So in this patch just make a cleanup to remove it.

    Signed-off-by: Baoquan He
    Cc: "Eric W. Biederman"
    Cc: Vivek Goyal
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Baoquan He
     

11 Feb, 2015

1 commit

  • Pull trivial tree changes from Jiri Kosina:
    "Patches from trivial.git that keep the world turning around.

    Mostly documentation and comment fixes, and a two corner-case code
    fixes from Alan Cox"

    * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial:
    kexec, Kconfig: spell "architecture" properly
    mm: fix cleancache debugfs directory path
    blackfin: mach-common: ints-priority: remove unused function
    doubletalk: probe failure causes OOPS
    ARM: cache-l2x0.c: Make it clear that cache-l2x0 handles L310 cache controller
    msdos_fs.h: fix 'fields' in comment
    scsi: aic7xxx: fix comment
    ARM: l2c: fix comment
    ibmraid: fix writeable attribute with no store method
    dynamic_debug: fix comment
    doc: usbmon: fix spelling s/unpriviledged/unprivileged/
    x86: init_mem_mapping(): use capital BIOS in comment

    Linus Torvalds
     

26 Jan, 2015

1 commit


14 Dec, 2014

1 commit


14 Oct, 2014

2 commits

  • This is a cleanup. In function parse_crashkernel_suffix, the parameter
    crash_base is not used. So here remove it.

    Signed-off-by: Baoquan He
    Acked-by: Vivek Goyal
    Cc: Eric W. Biederman
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Baoquan He
     
  • In the locate_mem_hole functions, a memory hole is located and added as
    a kexec_segment. But judging from its name, locate_mem_hole should only
    take responsibility for searching for an available memory hole that can
    contain data of a specified size.

    So this patch adds a new field 'mem' to kexec_buf, then takes the
    kexec segment adding code out of locate_mem_hole_top_down and
    locate_mem_hole_bottom_up. This makes the functionality of
    locate_mem_hole clear, just as its name declares. With this change,
    locate_mem_hole_callback could be used later if anyone wants to locate a
    memory hole for other uses.

    Meanwhile Vivek suggested open-coding the function __kexec_add_segment(),
    so that we have to retrieve the ksegment pointer only once and the code
    is easy to read. So just do it in this patch and remove
    __kexec_add_segment() since no one uses it anymore.
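
    The separation of searching from segment bookkeeping might look like
    the sketch below. The struct layouts, function names and address range
    are simplified assumptions for illustration, not the kernel's
    definitions.

```c
#include <stddef.h>

struct buf_request {
	unsigned long memsz;	/* size wanted */
	unsigned long mem;	/* filled in by the search: start of the hole */
};

struct segment {
	unsigned long mem;
	unsigned long memsz;
};

/* Only searches [start, end] for room and records the result in req->mem. */
static int locate_hole_bottom_up(struct buf_request *req,
				 unsigned long start, unsigned long end)
{
	if (end < start || end - start + 1 < req->memsz)
		return 0;
	req->mem = start;
	return 1;
}

/* The caller, not the search function, appends the segment. */
static int add_buffer(struct segment *segs, size_t *nr,
		      struct buf_request *req,
		      unsigned long start, unsigned long end)
{
	if (!locate_hole_bottom_up(req, start, end))
		return -1;
	segs[*nr].mem = req->mem;
	segs[*nr].memsz = req->memsz;
	(*nr)++;
	return 0;
}

/* Places one buffer in a made-up 64 KiB range; returns its address or 0. */
static unsigned long demo_place(unsigned long memsz)
{
	struct segment segs[1];
	struct buf_request req = { .memsz = memsz, .mem = 0 };
	size_t nr = 0;

	if (add_buffer(segs, &nr, &req, 0x100000UL, 0x10ffffUL))
		return 0;
	return segs[0].mem;
}
```

    Since the search only fills in the 'mem' field, it can be reused by
    callers that want a hole without adding a segment.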

    Signed-off-by: Baoquan He
    Acked-by: Vivek Goyal
    Cc: Eric W. Biederman
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Baoquan He
     

30 Aug, 2014

1 commit

  • Currently the new system call kexec_file_load() and all the associated
    code compile if CONFIG_KEXEC=y. But the new syscall also compiles
    purgatory code, which currently uses the gcc option -mcmodel=large. This
    option seems to be available only from gcc 4.4 onwards.

    Hiding new functionality behind a new config option will not break
    existing users of old gcc. Those who wish to enable new functionality
    will require new gcc. Having said that, I am trying to figure out how
    can I move away from using -mcmodel=large but that can take a while.

    I think there are other advantages of introducing this new config
    option. As this option will be enabled only on x86_64, other arches
    don't have to compile generic kexec code which will never be used. This
    new code selects CRYPTO=y and CRYPTO_SHA256=y. And all other arches had
    to do this for CONFIG_KEXEC. Now with introduction of new config
    option, we can remove crypto dependency from other arches.

    Now CONFIG_KEXEC_FILE is available only on x86_64. So wherever I had
    CONFIG_X86_64 defined, I got rid of that.

    For CONFIG_KEXEC_FILE, instead of doing select CRYPTO=y, I changed it to
    "depends on CRYPTO=y". This should be safer as "select" is not
    recursive.

    Signed-off-by: Vivek Goyal
    Cc: Eric Biederman
    Cc: H. Peter Anvin
    Tested-by: Shaun Ruffell
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Vivek Goyal
     

09 Aug, 2014

9 commits

  • This is the final piece of the puzzle of verifying kernel image signature
    during kexec_file_load() syscall.

    This patch calls into PE file routines to verify signature of bzImage. If
    the signature is valid, kexec_file_load() succeeds; otherwise it fails.

    Two new config options have been introduced. First one is
    CONFIG_KEXEC_VERIFY_SIG. This option enforces that kernel has to be
    validly signed otherwise kernel load will fail. If this option is not
    set, no signature verification will be done. Only exception will be when
    secureboot is enabled. In that case signature verification should be
    automatically enforced when secureboot is enabled. But that will happen
    when secureboot patches are merged.

    Second config option is CONFIG_KEXEC_BZIMAGE_VERIFY_SIG. This option
    enables signature verification support on bzImage. If this option is not
    set and previous one is set, kernel image loading will fail because kernel
    does not have support to verify signature of bzImage.
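
    How the two options interact can be sketched as a small decision
    function. The boolean parameters stand in for the config options, and
    the error codes are illustrative choices rather than the ones the
    kernel returns.

```c
#include <errno.h>
#include <stdbool.h>

/* Returns 0 if loading may proceed, or a negative errno otherwise. */
static int check_image_signature(bool verify_sig, bool bzimage_verify_sig,
				 bool signature_valid)
{
	if (!verify_sig)
		return 0;		/* enforcement off: nothing is checked */
	if (!bzimage_verify_sig)
		return -ENOSYS;		/* enforced, but no bzImage verifier */
	if (!signature_valid)
		return -EPERM;		/* verifier present, signature bad */
	return 0;
}
```

    The second branch is the case called out above: enforcement enabled
    without bzImage support makes every load fail.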

    I tested these patches with both "pesign" and "sbsign" signed bzImages.

    I used signing_key.priv key and signing_key.x509 cert for signing as
    generated during kernel build process (if module signing is enabled).

    Used following method to sign bzImage.

    pesign
    ======
    - Convert DER format cert to PEM format cert
    openssl x509 -in signing_key.x509 -inform DER -out signing_key.x509.PEM \
        -outform PEM

    - Generate a .p12 file from existing cert and private key file
    openssl pkcs12 -export -out kernel-key.p12 -inkey signing_key.priv \
        -in signing_key.x509.PEM

    - Import .p12 file into pesign db
    pk12util -i /tmp/kernel-key.p12 -d /etc/pki/pesign

    - Sign bzImage
    pesign -i /boot/vmlinuz-3.16.0-rc3+ \
        -o /boot/vmlinuz-3.16.0-rc3+.signed.pesign \
        -c "Glacier signing key - Magrathea" -s

    sbsign
    ======
    sbsign --key signing_key.priv --cert signing_key.x509.PEM \
        --output /boot/vmlinuz-3.16.0-rc3+.signed.sbsign \
        /boot/vmlinuz-3.16.0-rc3+

    Patch details:

    Well, all the hard work is done in the previous patches. Now the bzImage
    loader just has to call into that code and verify whether the bzImage
    signature is valid or not.

    Signed-off-by: Vivek Goyal
    Cc: Borislav Petkov
    Cc: Michael Kerrisk
    Cc: Yinghai Lu
    Cc: Eric Biederman
    Cc: H. Peter Anvin
    Cc: Matthew Garrett
    Cc: Greg Kroah-Hartman
    Cc: Dave Young
    Cc: WANG Chao
    Cc: Baoquan He
    Cc: Andy Lutomirski
    Cc: Matt Fleming
    Cc: David Howells
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Vivek Goyal
     
  • This patch adds support for loading a kexec on panic (kdump) kernel using
    the new system call.

    It prepares ELF headers for memory areas to be dumped and for saved cpu
    registers. Also prepares the memory map for second kernel and limits its
    boot to reserved areas only.

    Signed-off-by: Vivek Goyal
    Cc: Borislav Petkov
    Cc: Michael Kerrisk
    Cc: Yinghai Lu
    Cc: Eric Biederman
    Cc: H. Peter Anvin
    Cc: Matthew Garrett
    Cc: Greg Kroah-Hartman
    Cc: Dave Young
    Cc: WANG Chao
    Cc: Baoquan He
    Cc: Andy Lutomirski
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Vivek Goyal
     
  • This is loader specific code which can load bzImage and set it up for
    64bit entry. This does not take care of 32bit entry or real mode entry.

    32bit mode entry can be implemented if somebody needs it.

    Signed-off-by: Vivek Goyal
    Cc: Borislav Petkov
    Cc: Michael Kerrisk
    Cc: Yinghai Lu
    Cc: Eric Biederman
    Cc: H. Peter Anvin
    Cc: Matthew Garrett
    Cc: Greg Kroah-Hartman
    Cc: Dave Young
    Cc: WANG Chao
    Cc: Baoquan He
    Cc: Andy Lutomirski
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Vivek Goyal
     
  • Load purgatory code in RAM and relocate it based on the location.
    Relocation code has been inspired by module relocation code and purgatory
    relocation code in kexec-tools.

    Also compute the checksums of loaded kexec segments and store them in
    purgatory.

    Arch independent code provides this functionality so that arch dependent
    bootloaders can make use of it.

    Helper functions are provided to get/set symbol values in purgatory which
    are used by bootloaders later to set things like stack and entry point of
    second kernel etc.

    Signed-off-by: Vivek Goyal
    Cc: Borislav Petkov
    Cc: Michael Kerrisk
    Cc: Yinghai Lu
    Cc: Eric Biederman
    Cc: H. Peter Anvin
    Cc: Matthew Garrett
    Cc: Greg Kroah-Hartman
    Cc: Dave Young
    Cc: WANG Chao
    Cc: Baoquan He
    Cc: Andy Lutomirski
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Vivek Goyal
     
  • The previous patch provided the interface definition and this patch
    provides the implementation of the new syscall.

    Previously segment list was prepared in user space. Now user space just
    passes kernel fd, initrd fd and command line and kernel will create a
    segment list internally.

    This patch contains the generic part of the code. Actual segment
    preparation and loading is done by the arch and image specific loader,
    which comes in the next patch.
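
    From user space, the new interface might be exercised as in the sketch
    below. The wrapper name is hypothetical, and the call is made through
    syscall(2) only when the headers define SYS_kexec_file_load; loading
    normally needs suitable privileges and valid descriptors.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <string.h>
#include <sys/syscall.h>
#include <unistd.h>

/*
 * Thin wrapper: passes two descriptors and a command line, with no
 * segment list. Returns 0 on success, or -1 with errno set on failure.
 */
static long load_kernel_from_fds(int kernel_fd, int initrd_fd,
				 const char *cmdline, unsigned long flags)
{
#ifdef SYS_kexec_file_load
	/* The length passed to the kernel counts the terminating NUL. */
	return syscall(SYS_kexec_file_load, kernel_fd, initrd_fd,
		       (unsigned long)strlen(cmdline) + 1, cmdline, flags);
#else
	(void)kernel_fd;
	(void)initrd_fd;
	(void)cmdline;
	(void)flags;
	errno = ENOSYS;
	return -1;
#endif
}
```

    The caller hands over open files only; building the segment list is
    left to the kernel.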

    [akpm@linux-foundation.org: coding-style fixes]
    Signed-off-by: Vivek Goyal
    Cc: Borislav Petkov
    Cc: Michael Kerrisk
    Cc: Yinghai Lu
    Cc: Eric Biederman
    Cc: H. Peter Anvin
    Cc: Matthew Garrett
    Cc: Greg Kroah-Hartman
    Cc: Dave Young
    Cc: WANG Chao
    Cc: Baoquan He
    Cc: Andy Lutomirski
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Vivek Goyal
     
  • This is the new syscall kexec_file_load() declaration/interface. I have
    reserved the syscall number only for x86_64 so far. Other architectures
    (including i386) can reserve syscall number when they enable the support
    for this new syscall.

    Signed-off-by: Vivek Goyal
    Cc: Michael Kerrisk
    Cc: Borislav Petkov
    Cc: Yinghai Lu
    Cc: Eric Biederman
    Cc: H. Peter Anvin
    Cc: Matthew Garrett
    Cc: Greg Kroah-Hartman
    Cc: Dave Young
    Cc: WANG Chao
    Cc: Baoquan He
    Cc: Andy Lutomirski
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Vivek Goyal
     
  • kimage_normal_alloc() and kimage_crash_alloc() do a lot of similar
    things and differ only a little. So instead of having two separate
    functions, create a common function kimage_alloc_init() and pass it the
    "flags" argument, which tells whether it is normal kexec or
    kexec_on_panic. This function should be able to deal with both cases.

    This consolidation also helps later where we can use a common function
    kimage_file_alloc_init() to handle normal and crash cases for new file
    based kexec syscall.

    Signed-off-by: Vivek Goyal
    Cc: Borislav Petkov
    Cc: Michael Kerrisk
    Cc: Yinghai Lu
    Cc: Eric Biederman
    Cc: H. Peter Anvin
    Cc: Matthew Garrett
    Cc: Greg Kroah-Hartman
    Cc: Dave Young
    Cc: WANG Chao
    Cc: Baoquan He
    Cc: Andy Lutomirski
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Vivek Goyal
     
  • Previously do_kimage_alloc() will allocate a kimage structure, copy
    segment list from user space and then do the segment list sanity
    verification.

    Break down this function in 3 parts. do_kimage_alloc_init() to do actual
    allocation and basic initialization of kimage structure.
    copy_user_segment_list() to copy segment list from user space and
    sanity_check_segment_list() to verify the sanity of segment list as passed
    by user space.

    In later patches, I need to only allocate kimage and not copy segment list
    from user space. So breaking down in smaller functions enables re-use of
    code at other places.

    Signed-off-by: Vivek Goyal
    Cc: Borislav Petkov
    Cc: Michael Kerrisk
    Cc: Yinghai Lu
    Cc: Eric Biederman
    Cc: H. Peter Anvin
    Cc: Matthew Garrett
    Cc: Greg Kroah-Hartman
    Cc: Dave Young
    Cc: WANG Chao
    Cc: Baoquan He
    Cc: Andy Lutomirski
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Vivek Goyal
     
  • Let's use the more common "unusable".

    This patch was originally written and posted by Boris. I am including it
    in this patch series.

    Signed-off-by: Borislav Petkov
    Signed-off-by: Vivek Goyal
    Cc: Borislav Petkov
    Cc: Michael Kerrisk
    Cc: Yinghai Lu
    Cc: Eric Biederman
    Cc: H. Peter Anvin
    Cc: Matthew Garrett
    Cc: Greg Kroah-Hartman
    Cc: Dave Young
    Cc: WANG Chao
    Cc: Baoquan He
    Cc: Andy Lutomirski
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Vivek Goyal
     

31 Jul, 2014

2 commits

  • free_huge_page() is undefined without CONFIG_HUGETLBFS and there's no
    need to filter PageHuge() pages in such a configuration either, so avoid
    exporting the symbol to fix a build error:

    In file included from kernel/kexec.c:14:0:
    kernel/kexec.c: In function 'crash_save_vmcoreinfo_init':
    kernel/kexec.c:1623:20: error: 'free_huge_page' undeclared (first use in this function)
    VMCOREINFO_SYMBOL(free_huge_page);
    ^

    Introduced by commit 8f1d26d0e59b ("kexec: export free_huge_page to
    VMCOREINFO")

    Reported-by: kbuild test robot
    Acked-by: Olof Johansson
    Cc: Atsushi Kumagai
    Cc: Baoquan He
    Cc: Vivek Goyal
    Cc: Andrew Morton
    Signed-off-by: David Rientjes
    Signed-off-by: Linus Torvalds

    David Rientjes
     
  • PG_head_mask was added to VMCOREINFO to filter huge pages in b3acc56bfe1
    ("kexec: save PG_head_mask in VMCOREINFO"), but makedumpfile still needs
    another symbol to filter *hugetlbfs* pages.

    If a user wants to filter user pages, makedumpfile tries to exclude them
    by checking whether the page is anonymous, but hugetlbfs pages aren't
    anonymous even though they are also user pages.

    We know it's possible to detect them in the same way as PageHuge(),
    so we need the start address of free_huge_page():

    int PageHuge(struct page *page)
    {
            if (!PageCompound(page))
                    return 0;

            page = compound_head(page);
            return get_compound_page_dtor(page) == free_huge_page;
    }

    For that reason, this patch changes free_huge_page() into public
    to export it to VMCOREINFO.

    Signed-off-by: Atsushi Kumagai
    Acked-by: Baoquan He
    Cc: Vivek Goyal
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Atsushi Kumagai
     

24 Jun, 2014

1 commit

  • To allow filtering of huge pages, makedumpfile must be able to identify
    them in the dump. This can be done by checking the appropriate page
    flag, so communicate its value to makedumpfile through the VMCOREINFO
    interface.

    There's only one small catch. Depending on how many page flags are
    available on a given architecture, this bit can be called PG_head or
    PG_compound.

    I sent a similar patch back in 2012, but Eric Biederman did not like
    using an #ifdef. So, this time I'm adding a common symbol
    (PG_head_mask) instead.
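
    The idea of a common symbol can be sketched as follows. The bit
    numbers and the HAVE_PG_HEAD switch are made up for this example; the
    real values depend on the kernel configuration.

```c
/* Hypothetical switch: 1 if the flag is called PG_head, 0 if PG_compound. */
#define HAVE_PG_HEAD 1

#if HAVE_PG_HEAD
enum { PG_head = 14 };			/* made-up bit number */
#define PG_head_mask (1UL << PG_head)
#else
enum { PG_compound = 14 };		/* made-up bit number */
#define PG_head_mask (1UL << PG_compound)
#endif

/* A dump filter needs only the exported mask, not the flag's name. */
static int page_is_head(unsigned long flags)
{
	return (flags & PG_head_mask) != 0;
}
```

    The #if is confined to one definition, so a consumer of the mask does
    not need to know which name the architecture uses.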

    See https://lkml.org/lkml/2012/11/28/91 for the previous version.

    Signed-off-by: Petr Tesarik
    Acked-by: Vivek Goyal
    Cc: Eric Biederman
    Cc: Paul Mackerras
    Cc: Fengguang Wu
    Cc: Benjamin Herrenschmidt
    Cc: Shaohua Li
    Cc: Alexey Kardashevskiy
    Cc: Sasha Levin
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Petr Tesarik
     

07 Jun, 2014

1 commit


28 May, 2014

1 commit

  • If we try to perform a kexec when the machine is in ST (Single-Threaded) mode
    (ppc64_cpu --smt=off), the kexec operation doesn't succeed properly, and we
    get the following messages during boot:

    [ 0.089866] POWER8 performance monitor hardware support registered
    [ 0.089985] power8-pmu: PMAO restore workaround active.
    [ 5.095419] Processor 1 is stuck.
    [ 10.097933] Processor 2 is stuck.
    [ 15.100480] Processor 3 is stuck.
    [ 20.102982] Processor 4 is stuck.
    [ 25.105489] Processor 5 is stuck.
    [ 30.108005] Processor 6 is stuck.
    [ 35.110518] Processor 7 is stuck.
    [ 40.113369] Processor 9 is stuck.
    [ 45.115879] Processor 10 is stuck.
    [ 50.118389] Processor 11 is stuck.
    [ 55.120904] Processor 12 is stuck.
    [ 60.123425] Processor 13 is stuck.
    [ 65.125970] Processor 14 is stuck.
    [ 70.128495] Processor 15 is stuck.
    [ 75.131316] Processor 17 is stuck.

    Note that only the sibling threads are stuck, while the primary threads (0, 8,
    16 etc) boot just fine. Looking closer at the previous step of kexec, we observe
    that kexec tries to wakeup (bring online) the sibling threads of all the cores,
    before performing kexec:

    [ 9464.131231] Starting new kernel
    [ 9464.148507] kexec: Waking offline cpu 1.
    [ 9464.148552] kexec: Waking offline cpu 2.
    [ 9464.148600] kexec: Waking offline cpu 3.
    [ 9464.148636] kexec: Waking offline cpu 4.
    [ 9464.148671] kexec: Waking offline cpu 5.
    [ 9464.148708] kexec: Waking offline cpu 6.
    [ 9464.148743] kexec: Waking offline cpu 7.
    [ 9464.148779] kexec: Waking offline cpu 9.
    [ 9464.148815] kexec: Waking offline cpu 10.
    [ 9464.148851] kexec: Waking offline cpu 11.
    [ 9464.148887] kexec: Waking offline cpu 12.
    [ 9464.148922] kexec: Waking offline cpu 13.
    [ 9464.148958] kexec: Waking offline cpu 14.
    [ 9464.148994] kexec: Waking offline cpu 15.
    [ 9464.149030] kexec: Waking offline cpu 17.

    Instrumenting this piece of code revealed that the cpu_up() operation actually
    fails with -EBUSY. Thus, only the primary threads of all the cores are online
    during kexec, and hence this is a sure-shot recipe for disaster, as explained
    in commit e8e5c2155b (powerpc/kexec: Fix orphaned offline CPUs across kexec),
    as well as in the comment above wake_offline_cpus().

    It turns out that cpu_up() was returning -EBUSY because the variable
    'cpu_hotplug_disabled' was set to 1; and this disabling of CPU hotplug was done
    by migrate_to_reboot_cpu() inside kernel_kexec().

    Now, migrate_to_reboot_cpu() was originally written with the assumption that
    any further code will not need to perform CPU hotplug, since we are anyway in
    the reboot path. However, kexec is clearly not such a case, since we depend on
    onlining CPUs, at least on powerpc.

    So re-enable cpu-hotplug after returning from migrate_to_reboot_cpu() in the
    kexec path, to fix this regression in kexec on powerpc.

    Also, wrap the cpu_up() in powerpc kexec code within a WARN_ON(), so that we
    can catch such issues more easily in the future.
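
    The failure and the fix can be modelled in a few lines of user-space C.
    This is a simulation only: the functions below are stubs that share
    names with the ones discussed above, cpu_hotplug_enable() is an assumed
    name for the re-enable step, and fix_applied is a knob for the demo.

```c
#include <errno.h>

static int cpu_hotplug_disabled;	/* mirrors the variable named above */

/* Stub: the real function also moves execution to the reboot CPU. */
static void migrate_to_reboot_cpu(void)
{
	cpu_hotplug_disabled = 1;
}

static void cpu_hotplug_enable(void)
{
	cpu_hotplug_disabled = 0;
}

/* Stub: onlining is refused while hotplug is disabled. */
static int cpu_up(int cpu)
{
	(void)cpu;
	return cpu_hotplug_disabled ? -EBUSY : 0;
}

/* Kexec path: fix_applied selects whether hotplug is re-enabled. */
static int kexec_wake_cpu(int cpu, int fix_applied)
{
	migrate_to_reboot_cpu();
	if (fix_applied)
		cpu_hotplug_enable();
	return cpu_up(cpu);
}
```

    Without the re-enable step the wake-up attempt reports -EBUSY, which
    matches the behaviour found by instrumenting the code.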

    Fixes: c97102ba963 (kexec: migrate to reboot cpu)
    Cc: stable@vger.kernel.org
    Signed-off-by: Srivatsa S. Bhat
    Signed-off-by: Benjamin Herrenschmidt

    Srivatsa S. Bhat
     

08 Apr, 2014

1 commit


04 Apr, 2014

1 commit

  • Code that is obj-y (always built-in) or dependent on a bool Kconfig
    (built-in or absent) can never be modular. So using module_init as an
    alias for __initcall can be somewhat misleading.

    Fix these up now, so that we can relocate module_init from init.h into
    module.h in the future. If we don't do this, we'd have to add module.h
    to obviously non-modular code, and that would be a worse thing.

    The audit targets the following module_init users for change:
    kernel/user.c             obj-y
    kernel/kexec.c            bool KEXEC (one instance per arch)
    kernel/profile.c          bool PROFILING
    kernel/hung_task.c        bool DETECT_HUNG_TASK
    kernel/sched/stats.c      bool SCHEDSTATS
    kernel/user_namespace.c   bool USER_NS

    Note that direct use of __initcall is discouraged, vs. one of the
    priority categorized subgroups. As __initcall gets mapped onto
    device_initcall, our use of subsys_initcall (which makes sense for these
    files) will thus change this registration from level 6-device to level
    4-subsys (i.e. slightly earlier). However no observable impact of that
    difference has been observed during testing.

    Also, two instances of missing ";" at EOL are fixed in kexec.

    Signed-off-by: Paul Gortmaker
    Cc: Ingo Molnar
    Cc: Peter Zijlstra
    Cc: Eric Biederman
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Paul Gortmaker
     

06 Mar, 2014

1 commit


28 Jan, 2014

1 commit


24 Jan, 2014

1 commit

  • For general-purpose (i.e. distro) kernel builds it makes sense to build
    with CONFIG_KEXEC to allow end users to choose what kind of things they
    want to do with kexec. However, in the face of trying to lock down a
    system with such a kernel, there needs to be a way to disable kexec_load
    (much like module loading can be disabled). Without this, it is too easy
    for the root user to modify kernel memory even when CONFIG_STRICT_DEVMEM
    and modules_disabled are set. With this change, it is still possible to
    load an image for use later, then disable kexec_load so the image (or lack
    of image) can't be altered.

    The intention is for using this in environments where "perfect"
    enforcement is hard. Without a verified boot, along with verified
    modules, and along with verified kexec, this is trying to give a system a
    better chance to defend itself (or at least grow the window of
    discoverability) against attack in the face of a privilege escalation.

    In my mind, I consider several boot scenarios:

    1) Verified boot of read-only verified root fs loading fd-based
    verification of kexec images.
    2) Secure boot of writable root fs loading signed kexec images.
    3) Regular boot loading kexec (e.g. kcrash) image early and locking it.
    4) Regular boot with no control of kexec image at all.

    1 and 2 don't exist yet, but will soon once the verified kexec series has
    landed. 4 is the state of things now. The gap between 2 and 4 is too
    large, so this change creates scenario 3, a middle-ground above 4 when 2
    and 1 are not possible for a system.
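
    Scenario 3 relies on a switch that can be turned on but not off again.
    A minimal model of that behaviour is sketched below; the variable and
    function names are assumptions for this example and the real knob is
    set through the kernel's own interfaces.

```c
#include <errno.h>

static int kexec_load_disabled;	/* 0 = loading allowed, 1 = locked */

/* One-way switch: it can be set, but never cleared again. */
static int set_kexec_load_disabled(int value)
{
	if (value != 1)
		return -EINVAL;
	kexec_load_disabled = 1;
	return 0;
}

/* Stand-in for a load attempt: refused once the switch is set. */
static int try_kexec_load(void)
{
	return kexec_load_disabled ? -EPERM : 0;
}
```

    An image loaded before the switch is set stays in place, and any later
    attempt to replace or remove it is refused.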

    Signed-off-by: Kees Cook
    Acked-by: Rik van Riel
    Cc: Vivek Goyal
    Cc: Eric Biederman
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Kees Cook
     

19 Dec, 2013

1 commit

  • Commit 1b3a5d02ee07 ("reboot: move arch/x86 reboot= handling to generic
    kernel") moved reboot= handling to generic code. In the process it also
    removed the code in native_machine_shutdown() which moved the reboot
    process to reboot_cpu/cpu0.

    I guess the thought must have been that all reboot paths call
    migrate_to_reboot_cpu(), so we don't need this special handling. But
    the kexec reboot path (kernel_kexec()) does not call
    migrate_to_reboot_cpu(), so the above change broke kexec. Now reboot can
    happen on a non-boot cpu, and when INIT is sent in the second kernel to
    bring up the BP, it brings down the machine.

    So start calling migrate_to_reboot_cpu() in kexec reboot path to avoid
    this problem.

    Bisected by WANG Chao.

    Reported-by: Matthew Whitehead
    Reported-by: Dave Young
    Signed-off-by: Vivek Goyal
    Tested-by: Baoquan He
    Tested-by: WANG Chao
    Acked-by: H. Peter Anvin
    Cc:
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Vivek Goyal