17 Oct, 2020

1 commit

  • IORESOURCE_MEM_DRIVER_MANAGED currently uses an unused PnP bit, which is
    always set to 0 by hardware. This is confusing and far from beautiful,
    and the bit only applies to SYSRAM. So let's move it out of the
    bus-specific (PnP) defined bits.

    We'll add another SYSRAM specific bit soon. If we ever need more bits for
    other purposes, we can steal some from "desc", or reshuffle/regroup what
    we have.

    Signed-off-by: David Hildenbrand
    Signed-off-by: Andrew Morton
    Cc: Michal Hocko
    Cc: Dan Williams
    Cc: Jason Gunthorpe
    Cc: Kees Cook
    Cc: Ard Biesheuvel
    Cc: Pankaj Gupta
    Cc: Baoquan He
    Cc: Wei Yang
    Cc: Eric Biederman
    Cc: Thomas Gleixner
    Cc: Greg Kroah-Hartman
    Cc: Anton Blanchard
    Cc: Benjamin Herrenschmidt
    Cc: Boris Ostrovsky
    Cc: Christian Borntraeger
    Cc: Dave Jiang
    Cc: Haiyang Zhang
    Cc: Heiko Carstens
    Cc: Jason Wang
    Cc: Juergen Gross
    Cc: Julien Grall
    Cc: "K. Y. Srinivasan"
    Cc: Len Brown
    Cc: Leonardo Bras
    Cc: Libor Pechacek
    Cc: Michael Ellerman
    Cc: "Michael S. Tsirkin"
    Cc: Nathan Lynch
    Cc: "Oliver O'Halloran"
    Cc: Paul Mackerras
    Cc: Pingfan Liu
    Cc: "Rafael J. Wysocki"
    Cc: Roger Pau Monné
    Cc: Stefano Stabellini
    Cc: Stephen Hemminger
    Cc: Vasily Gorbik
    Cc: Vishal Verma
    Cc: Wei Liu
    Link: https://lkml.kernel.org/r/20200911103459.10306-3-david@redhat.com
    Signed-off-by: Linus Torvalds

    David Hildenbrand
     

05 Oct, 2020

4 commits

  • To perform partial reads, callers of kernel_read_file*() must have a
    non-NULL file_size argument and a preallocated buffer. The new "offset"
    argument can then be used to seek to specific locations in the file to
    fill the buffer to, at most, "buf_size" per call.

    Where possible, the LSM hooks can report whether a full file has been
    read or not so that the contents can be reasoned about.

    Signed-off-by: Kees Cook
    Link: https://lore.kernel.org/r/20201002173828.2099543-14-keescook@chromium.org
    Signed-off-by: Greg Kroah-Hartman

    Kees Cook
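
The partial-read contract described above can be modeled in userspace. The following is a minimal Python sketch, not kernel code: the function names read_file_partial() and read_in_chunks() are hypothetical, and only the roles of "offset", "buf_size" and the reported file size are taken from the commit message.

```python
import os


def read_file_partial(path, buf_size, offset=0):
    """Read at most buf_size bytes starting at offset.

    Returns (data, file_size) so the caller can track its progress,
    loosely mirroring the "offset", "buf_size" and file_size arguments
    named in the commit message. Hypothetical helper, not a kernel API.
    """
    file_size = os.path.getsize(path)
    with open(path, "rb") as f:
        f.seek(offset)
        data = f.read(buf_size)
    return data, file_size


def read_in_chunks(path, buf_size):
    """Reassemble a whole file from repeated partial reads."""
    chunks = []
    offset = 0
    while True:
        data, file_size = read_file_partial(path, buf_size, offset)
        chunks.append(data)
        offset += len(data)
        # Stop once the reported size is reached or nothing more arrives.
        if offset >= file_size or not data:
            break
    return b"".join(chunks)
```

Returning the file size with every call is what lets the caller decide when the whole file has been seen without a separate stat step.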
     
  • In preparation for adding partial read support, add an optional output
    argument to kernel_read_file*() that reports the file size so callers
    can reason more easily about their reading progress.

    Signed-off-by: Kees Cook
    Reviewed-by: Mimi Zohar
    Reviewed-by: Luis Chamberlain
    Reviewed-by: James Morris
    Acked-by: Scott Branden
    Link: https://lore.kernel.org/r/20201002173828.2099543-8-keescook@chromium.org
    Signed-off-by: Greg Kroah-Hartman

    Kees Cook
     
  • In preparation for refactoring kernel_read_file*(), remove the redundant
    "size" argument which is not needed: it can be included in the return
    code, with callers adjusted. (VFS reads already cannot be larger than
    INT_MAX.)

    Signed-off-by: Kees Cook
    Reviewed-by: Mimi Zohar
    Reviewed-by: Luis Chamberlain
    Reviewed-by: James Morris
    Acked-by: Scott Branden
    Link: https://lore.kernel.org/r/20201002173828.2099543-6-keescook@chromium.org
    Signed-off-by: Greg Kroah-Hartman

    Kees Cook
     
  • Move kernel_read_file* out of linux/fs.h to its own linux/kernel_read_file.h
    include file. linux/fs.h gets pulled in just about everywhere
    and doesn't really need functions not related to the general fs interface.

    Suggested-by: Christoph Hellwig
    Signed-off-by: Scott Branden
    Signed-off-by: Kees Cook
    Reviewed-by: Christoph Hellwig
    Reviewed-by: Mimi Zohar
    Reviewed-by: Luis Chamberlain
    Acked-by: Greg Kroah-Hartman
    Acked-by: James Morris
    Link: https://lore.kernel.org/r/20200706232309.12010-2-scott.branden@broadcom.com
    Link: https://lore.kernel.org/r/20201002173828.2099543-4-keescook@chromium.org
    Signed-off-by: Greg Kroah-Hartman

    Scott Branden
     

16 Aug, 2020

1 commit

  • Pull x86 fixes from Ingo Molnar:
    "Misc fixes and small updates all around the place:

    - Fix mitigation state sysfs output

    - Fix an FPU xstate/xsave code assumption bug triggered by
    Architectural LBR support

    - Fix Lightning Mountain SoC TSC frequency enumeration bug

    - Fix kexec debug output

    - Fix kexec memory range assumption bug

    - Fix a boundary condition in the crash kernel code

    - Optimize purgatory.ro generation a bit

    - Enable ACRN guests to use X2APIC mode

    - Reduce a __text_poke() IRQs-off critical section for the benefit of
    PREEMPT_RT"

    * tag 'x86-urgent-2020-08-15' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
    x86/alternatives: Acquire pte lock with interrupts enabled
    x86/bugs/multihit: Fix mitigation reporting when VMX is not in use
    x86/fpu/xstate: Fix an xstate size check warning with architectural LBRs
    x86/purgatory: Don't generate debug info for purgatory.ro
    x86/tsr: Fix tsc frequency enumeration bug on Lightning Mountain SoC
    kexec_file: Correctly output debugging information for the PT_LOAD ELF header
    kexec: Improve & fix crash_exclude_mem_range() to handle overlapping ranges
    x86/crash: Correct the address boundary of function parameters
    x86/acrn: Remove redundant chars from ACRN signature
    x86/acrn: Allow ACRN guest to use X2APIC mode

    Linus Torvalds
     

08 Aug, 2020

1 commit

  • Pull powerpc updates from Michael Ellerman:

    - Add support for (optionally) using queued spinlocks & rwlocks.

    - Support for a new faster system call ABI using the scv instruction on
    Power9 or later.

    - Drop support for the PROT_SAO mmap/mprotect flag as it will be
    unsupported on Power10 and future processors, leaving us with no way
    to implement the functionality it requests. This risks breaking
    userspace, though we believe it is unused in practice.

    - A bug fix for, and then the removal of, our custom stack expansion
    checking. We now allow stack expansion up to the rlimit, like other
    architectures.

    - Remove the remnants of our (previously disabled) topology update
    code, which tried to react to NUMA layout changes on virtualised
    systems, but was prone to crashes and other problems.

    - Add PMU support for Power10 CPUs.

    - A change to our signal trampoline so that we don't unbalance the link
    stack (branch return predictor) in the signal delivery path.

    - Lots of other cleanups, refactorings, smaller features and so on as
    usual.

    Thanks to: Abhishek Goel, Alastair D'Silva, Alexander A. Klimov, Alexey
    Kardashevskiy, Alistair Popple, Andrew Donnellan, Aneesh Kumar K.V, Anju
    T Sudhakar, Anton Blanchard, Arnd Bergmann, Athira Rajeev, Balamuruhan
    S, Bharata B Rao, Bill Wendling, Bin Meng, Cédric Le Goater, Chris
    Packham, Christophe Leroy, Christoph Hellwig, Daniel Axtens, Dan
    Williams, David Lamparter, Desnes A. Nunes do Rosario, Erhard F., Finn
    Thain, Frederic Barrat, Ganesh Goudar, Gautham R. Shenoy, Geoff Levand,
    Greg Kurz, Gustavo A. R. Silva, Hari Bathini, Harish, Imre Kaloz, Joel
    Stanley, Joe Perches, John Crispin, Jordan Niethe, Kajol Jain, Kamalesh
    Babulal, Kees Cook, Laurent Dufour, Leonardo Bras, Li RongQing, Madhavan
    Srinivasan, Mahesh Salgaonkar, Mark Cave-Ayland, Michal Suchanek, Milton
    Miller, Mimi Zohar, Murilo Opsfelder Araujo, Nathan Chancellor, Nathan
    Lynch, Naveen N. Rao, Nayna Jain, Nicholas Piggin, Oliver O'Halloran,
    Palmer Dabbelt, Pedro Miraglia Franco de Carvalho, Philippe Bergheaud,
    Pingfan Liu, Pratik Rajesh Sampat, Qian Cai, Qinglang Miao, Randy
    Dunlap, Ravi Bangoria, Sachin Sant, Sam Bobroff, Sandipan Das, Santosh
    Sivaraj, Satheesh Rajendran, Shirisha Ganta, Sourabh Jain, Srikar
    Dronamraju, Stan Johnson, Stephen Rothwell, Thadeu Lima de Souza
    Cascardo, Thiago Jung Bauermann, Tom Lane, Vaibhav Jain, Vladis Dronov,
    Wei Yongjun, Wen Xiong, YueHaibing.

    * tag 'powerpc-5.9-1' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux: (337 commits)
    selftests/powerpc: Fix pkey syscall redefinitions
    powerpc: Fix circular dependency between percpu.h and mmu.h
    powerpc/powernv/sriov: Fix use of uninitialised variable
    selftests/powerpc: Skip vmx/vsx/tar/etc tests on older CPUs
    powerpc/40x: Fix assembler warning about r0
    powerpc/papr_scm: Add support for fetching nvdimm 'fuel-gauge' metric
    powerpc/papr_scm: Fetch nvdimm performance stats from PHYP
    cpuidle: pseries: Fixup exit latency for CEDE(0)
    cpuidle: pseries: Add function to parse extended CEDE records
    cpuidle: pseries: Set the latency-hint before entering CEDE
    selftests/powerpc: Fix online CPU selection
    powerpc/perf: Consolidate perf_callchain_user_[64|32]()
    powerpc/pseries/hotplug-cpu: Remove double free in error path
    powerpc/pseries/mobility: Add pr_debug() for device tree changes
    powerpc/pseries/mobility: Set pr_fmt()
    powerpc/cacheinfo: Warn if cache object chain becomes unordered
    powerpc/cacheinfo: Improve diagnostics about malformed cache lists
    powerpc/cacheinfo: Use name@unit instead of full DT path in debug messages
    powerpc/cacheinfo: Set pr_fmt()
    powerpc: fix function annotations to avoid section mismatch warnings with gcc-10
    ...

    Linus Torvalds
     

07 Aug, 2020

3 commits

  • Currently, when we enable the debugging switch to debug kexec_file,
    we always get the following incorrect results:

    kexec_file: Crash PT_LOAD elf header. phdr=00000000c988639b vaddr=0x0, paddr=0x0, sz=0x0 e_phnum=51 p_offset=0x0
    kexec_file: Crash PT_LOAD elf header. phdr=000000003cca69a0 vaddr=0x0, paddr=0x0, sz=0x0 e_phnum=52 p_offset=0x0
    kexec_file: Crash PT_LOAD elf header. phdr=00000000c584cb9f vaddr=0x0, paddr=0x0, sz=0x0 e_phnum=53 p_offset=0x0
    kexec_file: Crash PT_LOAD elf header. phdr=00000000cf85d57f vaddr=0x0, paddr=0x0, sz=0x0 e_phnum=54 p_offset=0x0
    kexec_file: Crash PT_LOAD elf header. phdr=00000000a4a8f847 vaddr=0x0, paddr=0x0, sz=0x0 e_phnum=55 p_offset=0x0
    kexec_file: Crash PT_LOAD elf header. phdr=00000000272ec49f vaddr=0x0, paddr=0x0, sz=0x0 e_phnum=56 p_offset=0x0
    kexec_file: Crash PT_LOAD elf header. phdr=00000000ea0b65de vaddr=0x0, paddr=0x0, sz=0x0 e_phnum=57 p_offset=0x0
    kexec_file: Crash PT_LOAD elf header. phdr=000000001f5e490c vaddr=0x0, paddr=0x0, sz=0x0 e_phnum=58 p_offset=0x0
    kexec_file: Crash PT_LOAD elf header. phdr=00000000dfe4109e vaddr=0x0, paddr=0x0, sz=0x0 e_phnum=59 p_offset=0x0
    kexec_file: Crash PT_LOAD elf header. phdr=00000000480ed2b6 vaddr=0x0, paddr=0x0, sz=0x0 e_phnum=60 p_offset=0x0
    kexec_file: Crash PT_LOAD elf header. phdr=0000000080b65151 vaddr=0x0, paddr=0x0, sz=0x0 e_phnum=61 p_offset=0x0
    kexec_file: Crash PT_LOAD elf header. phdr=0000000024e31c5e vaddr=0x0, paddr=0x0, sz=0x0 e_phnum=62 p_offset=0x0
    kexec_file: Crash PT_LOAD elf header. phdr=00000000332e0385 vaddr=0x0, paddr=0x0, sz=0x0 e_phnum=63 p_offset=0x0
    kexec_file: Crash PT_LOAD elf header. phdr=000000002754d5da vaddr=0x0, paddr=0x0, sz=0x0 e_phnum=64 p_offset=0x0
    kexec_file: Crash PT_LOAD elf header. phdr=00000000783320dd vaddr=0x0, paddr=0x0, sz=0x0 e_phnum=65 p_offset=0x0
    kexec_file: Crash PT_LOAD elf header. phdr=0000000076fe5b64 vaddr=0x0, paddr=0x0, sz=0x0 e_phnum=66 p_offset=0x0

    The reason is that the kernel always prints the values of the next PT_LOAD
    instead of the current PT_LOAD. Change it to ensure that we can get the
    correct debugging information.

    [ mingo: Amended changelog, capitalized "ELF". ]

    Signed-off-by: Lianbo Jiang
    Signed-off-by: Ingo Molnar
    Acked-by: Dave Young
    Link: https://lore.kernel.org/r/20200804044933.1973-4-lijiang@redhat.com

    Lianbo Jiang
     
  • The crash_exclude_mem_range() function can only handle one memory region at a time.

    It fails when the passed-in area covers several memory regions: it
    excludes only the first region and then returns, leaving the later
    regions unhandled.

    E.g. on a NEC system with two usable RAM regions inside the low 1M:

    ...
    BIOS-e820: [mem 0x0000000000000000-0x000000000003efff] usable
    BIOS-e820: [mem 0x000000000003f000-0x000000000003ffff] reserved
    BIOS-e820: [mem 0x0000000000040000-0x000000000009ffff] usable

    It will only exclude the memory region [0, 0x3efff]; the memory region
    [0x40000, 0x9ffff] will still be added into /proc/vmcore, which may cause
    the following failure when dumping vmcore:

    ioremap on RAM at 0x0000000000040000 - 0x0000000000040fff
    WARNING: CPU: 0 PID: 665 at arch/x86/mm/ioremap.c:186 __ioremap_caller+0x2c7/0x2e0
    ...
    RIP: 0010:__ioremap_caller+0x2c7/0x2e0
    ...
    cp: error reading '/proc/vmcore': Cannot allocate memory
    kdump: saving vmcore failed

    In order to fix this bug, let's extend the crash_exclude_mem_range()
    to handle the overlapping ranges.

    [ mingo: Amended the changelog. ]

    Signed-off-by: Lianbo Jiang
    Signed-off-by: Ingo Molnar
    Acked-by: Dave Young
    Link: https://lore.kernel.org/r/20200804044933.1973-3-lijiang@redhat.com

    Lianbo Jiang
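
The behavior the fix aims for can be illustrated with a small userspace model. This is a Python sketch under stated assumptions, not the kernel implementation: exclude_range() is a hypothetical name, ranges are inclusive (start, end) tuples, and the excluded span is assumed to be the low 1M (0 to 0xfffff).

```python
def exclude_range(ranges, mstart, mend):
    """Return ranges with the inclusive span [mstart, mend] removed.

    ranges is a list of inclusive (start, end) tuples that are sorted
    and non-overlapping. Every range that intersects the span is
    trimmed, split, or dropped, not just the first one.
    """
    result = []
    for start, end in ranges:
        if mend < start or mstart > end:
            # No overlap: keep the range untouched.
            result.append((start, end))
            continue
        if start < mstart:
            # Keep the part below the excluded span.
            result.append((start, mstart - 1))
        if end > mend:
            # Keep the part above the excluded span.
            result.append((mend + 1, end))
    return result


# The two usable regions from the e820 map in the commit message.
usable = [(0x0, 0x3EFFF), (0x40000, 0x9FFFF)]
remaining = exclude_range(usable, 0x0, 0xFFFFF)
```

With both regions covered by the excluded span, neither is left behind, so nothing from the low 1M would end up in the dump.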
     
  • Pull integrity updates from Mimi Zohar:
    "The nicest change is the IMA policy rule checking. The other changes
    include allowing the kexec boot cmdline line measure policy rules to
    be defined in terms of the inode associated with the kexec kernel
    image, making the IMA_APPRAISE_BOOTPARAM, which governs the IMA
    appraise mode (log, fix, enforce), a runtime decision based on the
    secure boot mode of the system, and including errno in the audit log"

    * tag 'integrity-v5.9' of git://git.kernel.org/pub/scm/linux/kernel/git/zohar/linux-integrity:
    integrity: remove redundant initialization of variable ret
    ima: move APPRAISE_BOOTPARAM dependency on ARCH_POLICY to runtime
    ima: AppArmor satisfies the audit rule requirements
    ima: Rename internal filter rule functions
    ima: Support additional conditionals in the KEXEC_CMDLINE hook function
    ima: Use the common function to detect LSM conditionals in a rule
    ima: Move comprehensive rule validation checks out of the token parser
    ima: Use correct type for the args_p member of ima_rule_entry.lsm elements
    ima: Shallow copy the args_p member of ima_rule_entry.lsm elements
    ima: Fail rule parsing when appraise_flag=blacklist is unsupportable
    ima: Fail rule parsing when the KEY_CHECK hook is combined with an invalid cond
    ima: Fail rule parsing when the KEXEC_CMDLINE hook is combined with an invalid cond
    ima: Fail rule parsing when buffer hook functions have an invalid action
    ima: Free the entire rule if it fails to parse
    ima: Free the entire rule when deleting a list of rules
    ima: Have the LSM free its audit rule
    IMA: Add audit log for failure conditions
    integrity: Add errno field in audit message

    Linus Torvalds
     

29 Jul, 2020

1 commit

  • Some architectures may have special memory regions, within the given
    memory range, which can't be used for the buffer in a kexec segment.
    Implement weak arch_kexec_locate_mem_hole() definition which arch code
    may override, to take care of special regions, while trying to locate
    a memory hole.

    Also, add the missing declarations for arch overridable functions
    and drop the __weak descriptors in the declarations to prevent non-weak
    definitions from becoming weak.

    Signed-off-by: Hari Bathini
    Tested-by: Pingfan Liu
    Reviewed-by: Thiago Jung Bauermann
    Acked-by: Dave Young
    Signed-off-by: Michael Ellerman
    Link: https://lore.kernel.org/r/159602273603.575379.17665852963340380839.stgit@hbathini

    Hari Bathini
     

21 Jul, 2020

1 commit

  • Take the properties of the kexec kernel's inode and the current task
    ownership into consideration when matching a KEXEC_CMDLINE operation to
    the rules in the IMA policy. This allows for some uniformity when
    writing IMA policy rules for KEXEC_KERNEL_CHECK, KEXEC_INITRAMFS_CHECK,
    and KEXEC_CMDLINE operations.

    Prior to this patch, it was not possible to write a set of rules like
    this:

    dont_measure func=KEXEC_KERNEL_CHECK obj_type=foo_t
    dont_measure func=KEXEC_INITRAMFS_CHECK obj_type=foo_t
    dont_measure func=KEXEC_CMDLINE obj_type=foo_t
    measure func=KEXEC_KERNEL_CHECK
    measure func=KEXEC_INITRAMFS_CHECK
    measure func=KEXEC_CMDLINE

    The inode information associated with the kernel being loaded by a
    kexec_file_load(2) syscall can now be included in the decision to
    measure or not.

    Additionally, the uid, euid, and subj_* conditionals can also now be
    used in KEXEC_CMDLINE rules. There was no technical reason as to why
    those conditionals weren't being considered previously other than
    ima_match_rules() didn't have a valid inode to use so it immediately
    bailed out for KEXEC_CMDLINE operations rather than going through the
    full list of conditional comparisons.

    Signed-off-by: Tyler Hicks
    Cc: Eric Biederman
    Cc: kexec@lists.infradead.org
    Reviewed-by: Lakshmi Ramasubramanian
    Signed-off-by: Mimi Zohar

    Tyler Hicks
     

26 Jun, 2020

1 commit

  • Signature verification is an important security feature that protects
    the system from being attacked with a kernel of unknown origin. A kexec
    reboot replaces the running kernel, hence it needs to be secured
    carefully.

    The current code handling signature verification of the kexec kernel
    is very twisted. It mixes signature verification, IMA signature
    appraisal and kexec lockdown.

    If KEXEC_SIG_FORCE is not set and the kexec kernel image lacks any one
    of a signature, supported crypto, or a key, we don't treat this as an
    error unless kexec lockdown is in effect. IMA is considered another
    kind of signature appraisal method.

    If the kexec kernel image has a signature, crypto and key, it has to go
    through signature verification and pass. Otherwise it's seen as a
    verification failure, and the image won't be loaded.

    So a kexec kernel image with an unqualified signature is treated as
    worse than one with no signature at all, which is unreasonable. E.g.
    if people load an unsigned kernel, or a kernel signed with an expired
    key, which one is more dangerous?

    So, here, let's simplify the logic to improve code readability. If
    KEXEC_SIG_FORCE or kexec lockdown is enabled, signature verification
    is mandated. Otherwise, we lift the bar for any kernel image.

    Link: http://lkml.kernel.org/r/20200602045952.27487-1-lijiang@redhat.com
    Signed-off-by: Lianbo Jiang
    Reviewed-by: Jiri Bohac
    Acked-by: Dave Young
    Acked-by: Baoquan He
    Cc: James Morris
    Cc: Matthew Garrett
    Cc: "Eric W. Biederman"
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Lianbo Jiang
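
The simplified rule in the last paragraph of the commit message can be written as a small decision function. This is a Python sketch of the rule as stated in the text only; may_load_image() and its parameters are hypothetical names, and the IMA appraisal path is deliberately left out.

```python
def may_load_image(sig_valid, sig_force, locked_down):
    """Decide whether a kexec image may be loaded.

    sig_valid:   True if the image carries a signature that verified.
    sig_force:   stands in for KEXEC_SIG_FORCE being enabled.
    locked_down: stands in for kexec lockdown being in effect.
    """
    if sig_force or locked_down:
        # Verification is mandatory: only a verified image passes.
        return sig_valid
    # Otherwise any image is accepted, signed or not.
    return True
```

Under this rule an image with a bad signature and an image with no signature are treated the same way, which is the inconsistency the commit sets out to remove.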
     

05 Jun, 2020

1 commit

  • Memory flagged with IORESOURCE_MEM_DRIVER_MANAGED is special - it won't be
    part of the initial memmap of the kexec kernel and not all memory might be
    accessible. Don't place any kexec images onto it.

    Signed-off-by: David Hildenbrand
    Signed-off-by: Andrew Morton
    Cc: Michal Hocko
    Cc: Pankaj Gupta
    Cc: Wei Yang
    Cc: Baoquan He
    Cc: Dave Hansen
    Cc: Eric Biederman
    Cc: Pavel Tatashin
    Cc: Dan Williams
    Link: http://lkml.kernel.org/r/20200508084217.9160-4-david@redhat.com
    Signed-off-by: Linus Torvalds

    David Hildenbrand
     

09 Jan, 2020

1 commit

  • It is the same as machine_kexec_prepare(), but is called after segments are
    loaded. This way, it can do processing work with already loaded relocation
    segments. One such example is arm64: it has to have segments loaded in
    order to create a page table, but it cannot do it during kexec time,
    because at that time allocations won't be possible anymore.

    Signed-off-by: Pavel Tatashin
    Acked-by: Dave Young
    Signed-off-by: Will Deacon

    Pavel Tatashin
     

02 Nov, 2019

1 commit

  • Fix two pointer-to-int-cast warnings when compiling for the 32-bit parisc
    platform:

    kernel/kexec_file.c: In function ‘crash_prepare_elf64_headers’:
    kernel/kexec_file.c:1307:19: warning: cast from pointer to integer of different size [-Wpointer-to-int-cast]
    phdr->p_vaddr = (Elf64_Addr)_text;
    ^
    kernel/kexec_file.c:1324:19: warning: cast from pointer to integer of different size [-Wpointer-to-int-cast]
    phdr->p_vaddr = (unsigned long long) __va(mstart);
    ^

    Signed-off-by: Helge Deller

    Helge Deller
     

28 Sep, 2019

1 commit

  • Pull kernel lockdown mode from James Morris:
    "This is the latest iteration of the kernel lockdown patchset, from
    Matthew Garrett, David Howells and others.

    From the original description:

    This patchset introduces an optional kernel lockdown feature,
    intended to strengthen the boundary between UID 0 and the kernel.
    When enabled, various pieces of kernel functionality are restricted.
    Applications that rely on low-level access to either hardware or the
    kernel may cease working as a result - therefore this should not be
    enabled without appropriate evaluation beforehand.

    The majority of mainstream distributions have been carrying variants
    of this patchset for many years now, so there's value in providing a
    doesn't meet every distribution requirement, but gets us much closer
    to not requiring external patches.

    There are two major changes since this was last proposed for mainline:

    - Separating lockdown from EFI secure boot. Background discussion is
    covered here: https://lwn.net/Articles/751061/

    - Implementation as an LSM, with a default stackable lockdown LSM
    module. This allows the lockdown feature to be policy-driven,
    rather than encoding an implicit policy within the mechanism.

    The new locked_down LSM hook is provided to allow LSMs to make a
    policy decision around whether kernel functionality that would allow
    tampering with or examining the runtime state of the kernel should be
    permitted.

    The included lockdown LSM provides an implementation with a simple
    policy intended for general purpose use. This policy provides a coarse
    level of granularity, controllable via the kernel command line:

    lockdown={integrity|confidentiality}

    Enable the kernel lockdown feature. If set to integrity, kernel features
    that allow userland to modify the running kernel are disabled. If set to
    confidentiality, kernel features that allow userland to extract
    confidential information from the kernel are also disabled.

    This may also be controlled via /sys/kernel/security/lockdown and
    overridden by kernel configuration.

    New or existing LSMs may implement finer-grained controls of the
    lockdown features. Refer to the lockdown_reason documentation in
    include/linux/security.h for details.

    The lockdown feature has had significant design feedback and review
    across many subsystems. This code has been in linux-next for some
    weeks, with a few fixes applied along the way.

    Stephen Rothwell noted that commit 9d1f8be5cf42 ("bpf: Restrict bpf
    when kernel lockdown is in confidentiality mode") is missing a
    Signed-off-by from its author. Matthew responded that he is providing
    this under category (c) of the DCO"

    * 'next-lockdown' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/linux-security: (31 commits)
    kexec: Fix file verification on S390
    security: constify some arrays in lockdown LSM
    lockdown: Print current->comm in restriction messages
    efi: Restrict efivar_ssdt_load when the kernel is locked down
    tracefs: Restrict tracefs when the kernel is locked down
    debugfs: Restrict debugfs when the kernel is locked down
    kexec: Allow kexec_file() with appropriate IMA policy when locked down
    lockdown: Lock down perf when in confidentiality mode
    bpf: Restrict bpf when kernel lockdown is in confidentiality mode
    lockdown: Lock down tracing and perf kprobes when in confidentiality mode
    lockdown: Lock down /proc/kcore
    x86/mmiotrace: Lock down the testmmiotrace module
    lockdown: Lock down module params that specify hardware parameters (eg. ioport)
    lockdown: Lock down TIOCSSERIAL
    lockdown: Prohibit PCMCIA CIS storage when the kernel is locked down
    acpi: Disable ACPI table override if the kernel is locked down
    acpi: Ignore acpi_rsdp kernel param when the kernel has been locked down
    ACPI: Limit access to custom_method when the kernel is locked down
    x86/msr: Restrict MSR access when the kernel is locked down
    x86: Lock down IO port access when the kernel is locked down
    ...

    Linus Torvalds
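
The two lockdown= modes described in the pull message form an ordered pair: confidentiality disables everything integrity does, plus more. The following Python sketch models that ordering only. The names Lockdown, parse_lockdown() and is_blocked() are hypothetical, and the sketch does not reproduce the lockdown LSM itself.

```python
from enum import IntEnum


class Lockdown(IntEnum):
    NONE = 0
    INTEGRITY = 1        # features that modify the running kernel
    CONFIDENTIALITY = 2  # additionally, features that extract secrets


def parse_lockdown(value):
    """Map a lockdown= command line value to a level (NONE if unset)."""
    levels = {
        "integrity": Lockdown.INTEGRITY,
        "confidentiality": Lockdown.CONFIDENTIALITY,
    }
    return levels.get(value, Lockdown.NONE)


def is_blocked(current, feature_level):
    """A feature restricted at feature_level is blocked once the current
    lockdown level reaches it. feature_level is expected to be
    INTEGRITY or CONFIDENTIALITY."""
    return current >= feature_level
```

For example, a feature tagged CONFIDENTIALITY remains available in integrity mode, while a feature tagged INTEGRITY is blocked in both modes.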
     

20 Aug, 2019

3 commits

  • Systems in lockdown mode should block the kexec of untrusted kernels.
    For x86 and ARM we can ensure that a kernel is trustworthy by validating
    a PE signature, but this isn't possible on other architectures. On those
    platforms we can use IMA digital signatures instead. Add a function to
    determine whether IMA has or will verify signatures for a given event type,
    and if so permit kexec_file() even if the kernel is otherwise locked down.
    This is restricted to cases where CONFIG_INTEGRITY_TRUSTED_KEYRING is set
    in order to prevent an attacker from loading additional keys at runtime.

    Signed-off-by: Matthew Garrett
    Acked-by: Mimi Zohar
    Cc: Dmitry Kasatkin
    Cc: linux-integrity@vger.kernel.org
    Signed-off-by: James Morris

    Matthew Garrett
     
  • When KEXEC_SIG is not enabled, the kernel should not load images through
    the kexec_file system call if the kernel is locked down.

    [Modified by David Howells to fit with modifications to the previous patch
    and to return -EPERM if the kernel is locked down for consistency with
    other lockdowns. Modified by Matthew Garrett to remove the IMA
    integration, which will be replaced by integrating with the IMA
    architecture policy patches.]

    Signed-off-by: Jiri Bohac
    Signed-off-by: David Howells
    Signed-off-by: Matthew Garrett
    cc: kexec@lists.infradead.org
    Signed-off-by: James Morris

    Jiri Bohac
     
  • This is a preparatory patch for kexec_file_load() lockdown. A locked down
    kernel needs to prevent unsigned kernel images from being loaded with
    kexec_file_load(). Currently, the only way to force the signature
    verification is compiling with KEXEC_VERIFY_SIG. This prevents loading
    unsigned images even when the kernel is not locked down at runtime.

    This patch splits KEXEC_VERIFY_SIG into KEXEC_SIG and KEXEC_SIG_FORCE.
    Analogous to the MODULE_SIG and MODULE_SIG_FORCE for modules, KEXEC_SIG
    turns on the signature verification but allows unsigned images to be
    loaded. KEXEC_SIG_FORCE disallows images without a valid signature.

    Signed-off-by: Jiri Bohac
    Signed-off-by: David Howells
    Signed-off-by: Matthew Garrett
    cc: kexec@lists.infradead.org
    Signed-off-by: James Morris

    Jiri Bohac
     

09 Jul, 2019

1 commit

  • Pull integrity updates from Mimi Zohar:
    "Bug fixes, code clean up, and new features:

    - IMA policy rules can be defined in terms of LSM labels, making the
    IMA policy dependent on LSM policy label changes, in particular LSM
    label deletions. The new environment, in which IMA-appraisal is
    being used, frequently updates the LSM policy and permits LSM label
    deletions.

    - Prevent an mmap'ed shared file opened for write from also being
    mmap'ed execute. In the long term, making this and other similar
    changes at the VFS layer would be preferable.

    - The IMA per policy rule template format support is needed for a
    couple of new/proposed features (eg. kexec boot command line
    measurement, appended signatures, and VFS provided file hashes).

    - Other than the "boot-aggregate" record in the IMA measurement
    list, all other measurements are of file data. Measuring and
    storing the kexec boot command line in the IMA measurement list is
    the first buffer based measurement included in the measurement
    list"

    * 'next-integrity' of git://git.kernel.org/pub/scm/linux/kernel/git/zohar/linux-integrity:
    integrity: Introduce struct evm_xattr
    ima: Update MAX_TEMPLATE_NAME_LEN to fit largest reasonable definition
    KEXEC: Call ima_kexec_cmdline to measure the boot command line args
    IMA: Define a new template field buf
    IMA: Define a new hook to measure the kexec boot command line arguments
    IMA: support for per policy rule template formats
    integrity: Fix __integrity_init_keyring() section mismatch
    ima: Use designated initializers for struct ima_event_data
    ima: use the lsm policy update notifier
    LSM: switch to blocking policy update notifiers
    x86/ima: fix the Kconfig dependency for IMA_ARCH_POLICY
    ima: Make arch_policy_entry static
    ima: prevent a file already mmap'ed write to be mmap'ed execute
    x86/ima: check EFI SetupMode too

    Linus Torvalds
     

01 Jul, 2019

1 commit

  • During soft reboot (kexec_file_load), boot command line
    arguments are not measured.

    Call the IMA hook ima_kexec_cmdline to measure the boot command line
    arguments into the IMA measurement list.

    - call ima_kexec_cmdline from kexec_file_load.
    - move the call ima_add_kexec_buffer after the cmdline
    args have been measured.

    Signed-off-by: Prakhar Srivastava
    Reviewed-by: James Morris
    Acked-by: Dave Young
    Signed-off-by: Mimi Zohar

    Prakhar Srivastava
     

19 Jun, 2019

1 commit

  • Based on 2 normalized pattern(s):

    this source code is licensed under the gnu general public license
    version 2 see the file copying for more details

    this source code is licensed under general public license version 2
    see

    extracted by the scancode license scanner the SPDX license identifier

    GPL-2.0-only

    has been chosen to replace the boilerplate/reference in 52 file(s).

    Signed-off-by: Thomas Gleixner
    Reviewed-by: Enrico Weigelt
    Reviewed-by: Allison Randal
    Reviewed-by: Alexios Zavras
    Cc: linux-spdx@vger.kernel.org
    Link: https://lkml.kernel.org/r/20190602204653.449021192@linutronix.de
    Signed-off-by: Greg Kroah-Hartman

    Thomas Gleixner
     

15 May, 2019

1 commit

  • Most architectures do not need the memblock memory after the page
    allocator is initialized, but only a few enable ARCH_DISCARD_MEMBLOCK in
    the arch Kconfig.

    Replacing ARCH_DISCARD_MEMBLOCK with ARCH_KEEP_MEMBLOCK and inverting the
    logic makes it clear which architectures actually use memblock after
    system initialization and skips the necessity to add ARCH_DISCARD_MEMBLOCK
    to the architectures that are still missing that option.

    Link: http://lkml.kernel.org/r/1556102150-32517-1-git-send-email-rppt@linux.ibm.com
    Signed-off-by: Mike Rapoport
    Acked-by: Michael Ellerman (powerpc)
    Cc: Russell King
    Cc: Catalin Marinas
    Cc: Will Deacon
    Cc: Richard Kuo
    Cc: Tony Luck
    Cc: Fenghua Yu
    Cc: Geert Uytterhoeven
    Cc: Ralf Baechle
    Cc: Paul Burton
    Cc: James Hogan
    Cc: Ley Foon Tan
    Cc: Benjamin Herrenschmidt
    Cc: Paul Mackerras
    Cc: Martin Schwidefsky
    Cc: Heiko Carstens
    Cc: Yoshinori Sato
    Cc: Rich Felker
    Cc: Thomas Gleixner
    Cc: Ingo Molnar
    Cc: Borislav Petkov
    Cc: "H. Peter Anvin"
    Cc: Eric Biederman
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Mike Rapoport
     

25 Apr, 2019

1 commit

  • The flags field in 'struct shash_desc' never actually does anything.
    The only ostensibly supported flag is CRYPTO_TFM_REQ_MAY_SLEEP.
    However, no shash algorithm ever sleeps, making this flag a no-op.

    With this being the case, inevitably some users who can't sleep wrongly
    pass MAY_SLEEP. These would all need to be fixed if any shash algorithm
    actually started sleeping. For example, the shash_ahash_*() functions,
    which wrap a shash algorithm with the ahash API, pass through MAY_SLEEP
    from the ahash API to the shash API. However, the shash functions are
    called under kmap_atomic(), so actually they're assumed to never sleep.

    Even if it turns out that some users do need preemption points while
    hashing large buffers, we could easily provide a helper function
    crypto_shash_update_large() which divides the data into smaller chunks
    and calls crypto_shash_update() and cond_resched() for each chunk. It's
    not necessary to have a flag in 'struct shash_desc', nor is it necessary
    to make individual shash algorithms aware of this at all.
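
    The chunked helper mentioned above can be sketched in user space. This is
    only a model under stated assumptions: crypto_shash_update_large() is the
    hypothetical helper named in this message, and every toy_* name below is
    invented for illustration. A toy FNV-1a update stands in for
    crypto_shash_update(), and a counter stands in for cond_resched().

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Toy stand-in for a hash state; NOT the kernel's struct shash_desc. */
struct toy_desc {
    uint32_t state;   /* running FNV-1a value */
    unsigned yields;  /* how often the preemption-point stub ran */
};

static void toy_init(struct toy_desc *d)
{
    d->state = 2166136261u;  /* FNV-1a 32-bit offset basis */
    d->yields = 0;
}

/* Stand-in for crypto_shash_update(): never sleeps, takes no flags. */
static void toy_update(struct toy_desc *d, const uint8_t *data, size_t len)
{
    for (size_t i = 0; i < len; i++) {
        d->state ^= data[i];
        d->state *= 16777619u;  /* FNV-1a 32-bit prime */
    }
}

/* Stand-in for cond_resched(): it only counts how often it was reached. */
static void toy_resched(struct toy_desc *d)
{
    d->yields++;
}

/*
 * Hypothetical analogue of crypto_shash_update_large(): feed the data in
 * chunks of at most `chunk` bytes and offer a preemption point after each.
 */
static void toy_update_large(struct toy_desc *d, const uint8_t *data,
                             size_t len, size_t chunk)
{
    assert(chunk > 0);
    while (len > 0) {
        size_t n = len < chunk ? len : chunk;

        toy_update(d, data, n);
        toy_resched(d);
        data += n;
        len -= n;
    }
}
```

    Because the update is incremental, hashing in chunks yields the same
    state as a single call, so neither the descriptor nor the algorithm
    needs a flag to support this.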

    Therefore, remove shash_desc::flags, and document that the
    crypto_shash_*() functions can be called from any context.

    Signed-off-by: Eric Biggers
    Signed-off-by: Herbert Xu

    Eric Biggers
     

06 Dec, 2018

4 commits

  • In the kdump case, there is only one dedicated memblock region available
    as usable memory (crashk_res). With this patch, kexec_walk_memblock() runs
    a given callback function on this region.
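
    The behaviour described above can be modelled in user space roughly as
    follows. This is a sketch, not the kernel code: struct range,
    walk_memblock_sketch(), struct hole_req and first_fit() are all
    illustrative names, and the list of free ranges is passed in as a plain
    array instead of being taken from memblock.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct range { uint64_t start, end; };  /* inclusive bounds, illustrative */

typedef int (*walk_fn)(const struct range *r, void *arg);

/*
 * For kdump, only the single crash region is offered to the callback.
 * Otherwise each free range is offered in turn, and the walk stops as
 * soon as the callback returns a nonzero value.
 */
static int walk_memblock_sketch(int is_kdump, const struct range *crash,
                                const struct range *free_ranges, size_t n,
                                walk_fn fn, void *arg)
{
    int ret = 0;

    if (is_kdump)
        return fn(crash, arg);

    for (size_t i = 0; i < n; i++) {
        ret = fn(&free_ranges[i], arg);
        if (ret)
            break;
    }
    return ret;
}

/* Example callback: accept the first range that can hold `need` bytes. */
struct hole_req { uint64_t need; uint64_t found; };

static int first_fit(const struct range *r, void *arg)
{
    struct hole_req *req = arg;

    if (r->end - r->start + 1 < req->need)
        return 0;           /* too small, keep walking */
    req->found = r->start;
    return 1;               /* nonzero stops the walk */
}
```

    With free ranges of 0x100 and 0x4000 bytes and a request for 0x1000
    bytes, the normal walk settles on the second range, while the kdump walk
    only ever considers the crash region.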

    Cosmetic change: 0 to MEMBLOCK_NONE at for_each_free_mem_range*()

    Signed-off-by: AKASHI Takahiro
    Acked-by: Dave Young
    Cc: Vivek Goyal
    Cc: Baoquan He
    Signed-off-by: Will Deacon

    AKASHI Takahiro
     
  • The memblock list is another source of the usable system memory layout.
    So move powerpc's arch_kexec_walk_mem() to common code so that other
    memblock-based architectures, particularly arm64, can also utilise it.
    The moved function is now renamed to kexec_walk_memblock() and integrated
    into kexec_locate_mem_hole(), which will now be usable for all
    architectures with no need for overriding arch_kexec_walk_mem().

    With this change, arch_kexec_walk_mem() no longer needs to be a weak
    function, and is now renamed to kexec_walk_resources().

    Since powerpc doesn't support kdump in its kexec_file_load(), the current
    kexec_walk_memblock() won't work for kdump either in this form; this will
    be fixed in the next patch.

    Signed-off-by: AKASHI Takahiro
    Cc: "Eric W. Biederman"
    Acked-by: Dave Young
    Cc: Vivek Goyal
    Cc: Baoquan He
    Acked-by: James Morse
    Signed-off-by: Will Deacon

    AKASHI Takahiro
     
  • Since s390 already knows where to locate buffers, calling
    arch_kexec_walk_mem() makes no sense. So we can just drop it, since a
    non-zero kbuf->mem indicates this, while all other architectures set it
    to 0 initially.
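
    A minimal user-space sketch of that check, assuming 0 is the "no address
    chosen yet" value as stated above. The names BUF_MEM_UNKNOWN, struct
    kbuf_sketch, search_for_hole() and locate_mem_hole_sketch() are
    illustrative, and the search itself is only a placeholder.

```c
#include <assert.h>
#include <stdint.h>

#define BUF_MEM_UNKNOWN 0  /* illustrative: 0 means "no address chosen yet" */

struct kbuf_sketch {
    uint64_t mem;    /* load address, or BUF_MEM_UNKNOWN */
    uint64_t memsz;  /* size of the buffer to place */
    unsigned walks;  /* counts how often the search ran */
};

/* Placeholder search: pretends a hole was found at a fixed address. */
static int search_for_hole(struct kbuf_sketch *kbuf)
{
    kbuf->walks++;
    kbuf->mem = 0x100000;
    return 0;
}

static int locate_mem_hole_sketch(struct kbuf_sketch *kbuf)
{
    /* The architecture already chose an address: nothing to search for. */
    if (kbuf->mem != BUF_MEM_UNKNOWN)
        return 0;

    return search_for_hole(kbuf);
}
```

    A buffer with a preset address passes through untouched, while a buffer
    left at 0 still goes through the search.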

    This change is preparatory work for the next patch, where all the
    variant memory walks, either on system resources or memblock, will be
    put in one common place so that it will satisfy all the architectures'
    needs.

    Signed-off-by: AKASHI Takahiro
    Reviewed-by: Philipp Rudo
    Cc: Martin Schwidefsky
    Cc: Heiko Carstens
    Cc: Dave Young
    Cc: Vivek Goyal
    Cc: Baoquan He
    Signed-off-by: Will Deacon

    AKASHI Takahiro
     
  • Change kexec_image_post_load_cleanup_default() from static to global so
    that arm64 can later use it to implement its own
    arch_kimage_file_post_load_cleanup().

    Signed-off-by: AKASHI Takahiro
    Acked-by: Dave Young
    Cc: Vivek Goyal
    Cc: Baoquan He
    Signed-off-by: Will Deacon

    AKASHI Takahiro
     

04 Nov, 2018

1 commit

  • We include kexec.h and slab.h twice in kexec_file.c. This is unnecessary,
    so just remove the duplicates.

    Link: http://lkml.kernel.org/r/1537498098-19171-1-git-send-email-zhongjiang@huawei.com
    Signed-off-by: zhong jiang
    Reviewed-by: Bhupesh Sharma
    Reviewed-by: Andrew Morton
    Acked-by: Baoquan He
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    zhong jiang
     

13 Jun, 2018

1 commit

  • The vzalloc() function has no 2-factor argument form, so multiplication
    factors need to be wrapped in array_size(). This patch replaces cases of:

    vzalloc(a * b)

    with:

    vzalloc(array_size(a, b))

    as well as handling cases of:

    vzalloc(a * b * c)

    with:

    vzalloc(array3_size(a, b, c))

    This does, however, attempt to ignore constant size factors like:

    vzalloc(4 * 1024)

    though any constants defined via macros get caught up in the conversion.

    Any factors with a sizeof() of "unsigned char", "char", and "u8" were
    dropped, since they're redundant.
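
    The reason for wrapping the factors is that a plain product can wrap
    around silently and lead to an undersized allocation. A user-space sketch
    of the saturating behaviour follows. The *_sketch names are illustrative,
    and __builtin_mul_overflow() is a GCC/Clang builtin, so this assumes one
    of those compilers.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Multiply two size factors. On overflow, return SIZE_MAX instead of a
 * wrapped-around value, so that an allocator asked for that many bytes
 * fails rather than returning a buffer that is too small.
 */
static size_t array_size_sketch(size_t a, size_t b)
{
    size_t bytes;

    if (__builtin_mul_overflow(a, b, &bytes))
        return SIZE_MAX;
    return bytes;
}

/* Three-factor variant: saturate as soon as either multiplication overflows. */
static size_t array3_size_sketch(size_t a, size_t b, size_t c)
{
    size_t bytes;

    if (__builtin_mul_overflow(a, b, &bytes))
        return SIZE_MAX;
    if (__builtin_mul_overflow(bytes, c, &bytes))
        return SIZE_MAX;
    return bytes;
}
```

    Note that the three-factor variant checks each step separately: nesting
    two two-factor calls would turn a saturated SIZE_MAX back into 0 if the
    third factor were 0.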

    The Coccinelle script used for this was:

    // Fix redundant parens around sizeof().
    @@
    type TYPE;
    expression THING, E;
    @@

    (
    vzalloc(
    - (sizeof(TYPE)) * E
    + sizeof(TYPE) * E
    , ...)
    |
    vzalloc(
    - (sizeof(THING)) * E
    + sizeof(THING) * E
    , ...)
    )

    // Drop single-byte sizes and redundant parens.
    @@
    expression COUNT;
    typedef u8;
    typedef __u8;
    @@

    (
    vzalloc(
    - sizeof(u8) * (COUNT)
    + COUNT
    , ...)
    |
    vzalloc(
    - sizeof(__u8) * (COUNT)
    + COUNT
    , ...)
    |
    vzalloc(
    - sizeof(char) * (COUNT)
    + COUNT
    , ...)
    |
    vzalloc(
    - sizeof(unsigned char) * (COUNT)
    + COUNT
    , ...)
    |
    vzalloc(
    - sizeof(u8) * COUNT
    + COUNT
    , ...)
    |
    vzalloc(
    - sizeof(__u8) * COUNT
    + COUNT
    , ...)
    |
    vzalloc(
    - sizeof(char) * COUNT
    + COUNT
    , ...)
    |
    vzalloc(
    - sizeof(unsigned char) * COUNT
    + COUNT
    , ...)
    )

    // 2-factor product with sizeof(type/expression) and identifier or constant.
    @@
    type TYPE;
    expression THING;
    identifier COUNT_ID;
    constant COUNT_CONST;
    @@

    (
    vzalloc(
    - sizeof(TYPE) * (COUNT_ID)
    + array_size(COUNT_ID, sizeof(TYPE))
    , ...)
    |
    vzalloc(
    - sizeof(TYPE) * COUNT_ID
    + array_size(COUNT_ID, sizeof(TYPE))
    , ...)
    |
    vzalloc(
    - sizeof(TYPE) * (COUNT_CONST)
    + array_size(COUNT_CONST, sizeof(TYPE))
    , ...)
    |
    vzalloc(
    - sizeof(TYPE) * COUNT_CONST
    + array_size(COUNT_CONST, sizeof(TYPE))
    , ...)
    |
    vzalloc(
    - sizeof(THING) * (COUNT_ID)
    + array_size(COUNT_ID, sizeof(THING))
    , ...)
    |
    vzalloc(
    - sizeof(THING) * COUNT_ID
    + array_size(COUNT_ID, sizeof(THING))
    , ...)
    |
    vzalloc(
    - sizeof(THING) * (COUNT_CONST)
    + array_size(COUNT_CONST, sizeof(THING))
    , ...)
    |
    vzalloc(
    - sizeof(THING) * COUNT_CONST
    + array_size(COUNT_CONST, sizeof(THING))
    , ...)
    )

    // 2-factor product, only identifiers.
    @@
    identifier SIZE, COUNT;
    @@

    vzalloc(
    - SIZE * COUNT
    + array_size(COUNT, SIZE)
    , ...)

    // 3-factor product with 1 sizeof(type) or sizeof(expression), with
    // redundant parens removed.
    @@
    expression THING;
    identifier STRIDE, COUNT;
    type TYPE;
    @@

    (
    vzalloc(
    - sizeof(TYPE) * (COUNT) * (STRIDE)
    + array3_size(COUNT, STRIDE, sizeof(TYPE))
    , ...)
    |
    vzalloc(
    - sizeof(TYPE) * (COUNT) * STRIDE
    + array3_size(COUNT, STRIDE, sizeof(TYPE))
    , ...)
    |
    vzalloc(
    - sizeof(TYPE) * COUNT * (STRIDE)
    + array3_size(COUNT, STRIDE, sizeof(TYPE))
    , ...)
    |
    vzalloc(
    - sizeof(TYPE) * COUNT * STRIDE
    + array3_size(COUNT, STRIDE, sizeof(TYPE))
    , ...)
    |
    vzalloc(
    - sizeof(THING) * (COUNT) * (STRIDE)
    + array3_size(COUNT, STRIDE, sizeof(THING))
    , ...)
    |
    vzalloc(
    - sizeof(THING) * (COUNT) * STRIDE
    + array3_size(COUNT, STRIDE, sizeof(THING))
    , ...)
    |
    vzalloc(
    - sizeof(THING) * COUNT * (STRIDE)
    + array3_size(COUNT, STRIDE, sizeof(THING))
    , ...)
    |
    vzalloc(
    - sizeof(THING) * COUNT * STRIDE
    + array3_size(COUNT, STRIDE, sizeof(THING))
    , ...)
    )

    // 3-factor product with 2 sizeof(variable), with redundant parens removed.
    @@
    expression THING1, THING2;
    identifier COUNT;
    type TYPE1, TYPE2;
    @@

    (
    vzalloc(
    - sizeof(TYPE1) * sizeof(TYPE2) * COUNT
    + array3_size(COUNT, sizeof(TYPE1), sizeof(TYPE2))
    , ...)
    |
    vzalloc(
    - sizeof(TYPE1) * sizeof(TYPE2) * (COUNT)
    + array3_size(COUNT, sizeof(TYPE1), sizeof(TYPE2))
    , ...)
    |
    vzalloc(
    - sizeof(THING1) * sizeof(THING2) * COUNT
    + array3_size(COUNT, sizeof(THING1), sizeof(THING2))
    , ...)
    |
    vzalloc(
    - sizeof(THING1) * sizeof(THING2) * (COUNT)
    + array3_size(COUNT, sizeof(THING1), sizeof(THING2))
    , ...)
    |
    vzalloc(
    - sizeof(TYPE1) * sizeof(THING2) * COUNT
    + array3_size(COUNT, sizeof(TYPE1), sizeof(THING2))
    , ...)
    |
    vzalloc(
    - sizeof(TYPE1) * sizeof(THING2) * (COUNT)
    + array3_size(COUNT, sizeof(TYPE1), sizeof(THING2))
    , ...)
    )

    // 3-factor product, only identifiers, with redundant parens removed.
    @@
    identifier STRIDE, SIZE, COUNT;
    @@

    (
    vzalloc(
    - (COUNT) * STRIDE * SIZE
    + array3_size(COUNT, STRIDE, SIZE)
    , ...)
    |
    vzalloc(
    - COUNT * (STRIDE) * SIZE
    + array3_size(COUNT, STRIDE, SIZE)
    , ...)
    |
    vzalloc(
    - COUNT * STRIDE * (SIZE)
    + array3_size(COUNT, STRIDE, SIZE)
    , ...)
    |
    vzalloc(
    - (COUNT) * (STRIDE) * SIZE
    + array3_size(COUNT, STRIDE, SIZE)
    , ...)
    |
    vzalloc(
    - COUNT * (STRIDE) * (SIZE)
    + array3_size(COUNT, STRIDE, SIZE)
    , ...)
    |
    vzalloc(
    - (COUNT) * STRIDE * (SIZE)
    + array3_size(COUNT, STRIDE, SIZE)
    , ...)
    |
    vzalloc(
    - (COUNT) * (STRIDE) * (SIZE)
    + array3_size(COUNT, STRIDE, SIZE)
    , ...)
    |
    vzalloc(
    - COUNT * STRIDE * SIZE
    + array3_size(COUNT, STRIDE, SIZE)
    , ...)
    )

    // Any remaining multi-factor products, first at least 3-factor products
    // when they're not all constants...
    @@
    expression E1, E2, E3;
    constant C1, C2, C3;
    @@

    (
    vzalloc(C1 * C2 * C3, ...)
    |
    vzalloc(
    - E1 * E2 * E3
    + array3_size(E1, E2, E3)
    , ...)
    )

    // And then all remaining 2 factors products when they're not all constants.
    @@
    expression E1, E2;
    constant C1, C2;
    @@

    (
    vzalloc(C1 * C2, ...)
    |
    vzalloc(
    - E1 * E2
    + array_size(E1, E2)
    , ...)
    )

    Signed-off-by: Kees Cook

    Kees Cook
     

14 Apr, 2018

9 commits

  • For s390, new kernels are loaded to fixed addresses in memory before they
    are booted. With the current code this is a problem, as it assumes the
    kernel will be loaded to an 'arbitrary' address. In particular,
    kexec_locate_mem_hole searches for a large enough memory region and sets
    the load address (kexec_buffer->mem) to it.

    Luckily there is a simple workaround for this problem. By returning 1
    in arch_kexec_walk_mem, kexec_locate_mem_hole is turned off. This
    allows the architecture to set kbuf->mem by hand. While the trick works
    fine for the kernel, it does not work for the purgatory, as here the
    architectures don't have access to its kexec_buffer.

    Give architectures access to the purgatory's kexec_buffer by changing
    kexec_load_purgatory to take a pointer to it. With this change
    architectures have access to the buffer and can edit it as they need.

    A nice side effect of this change is that we can get rid of the
    purgatory_info->purgatory_load_address field, as the information
    stored there can now be accessed directly from kbuf->mem.

    Link: http://lkml.kernel.org/r/20180321112751.22196-11-prudo@linux.vnet.ibm.com
    Signed-off-by: Philipp Rudo
    Reviewed-by: Martin Schwidefsky
    Acked-by: Dave Young
    Cc: AKASHI Takahiro
    Cc: Eric Biederman
    Cc: Heiko Carstens
    Cc: Ingo Molnar
    Cc: Michael Ellerman
    Cc: Thiago Jung Bauermann
    Cc: Vivek Goyal
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Philipp Rudo
     
  • The current code uses the sh_offset field in purgatory_info->sechdrs to
    store a pointer to the current load address of the section. Depending
    on whether the section will be loaded or not, this is either a pointer
    into purgatory_info->purgatory_buf or kexec_purgatory. This is not only a
    violation of the ELF standard but also makes the code very hard to
    understand as you cannot tell if the memory you are using is read-only
    or not.

    Remove this misuse and store the offset of the section in
    purgatory_info->purgatory_buf in sh_offset.
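
    With sh_offset holding a real offset, a section's address in the writable
    copy is simply the buffer base plus that offset. A user-space sketch,
    assuming simplified stand-in types: struct sechdr_sketch, struct
    purgatory_sketch and section_in_buf() are illustrative and not the
    kernel's definitions, and the bounds check exists only to keep the demo
    self-contained.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct sechdr_sketch {
    uint64_t sh_offset;  /* byte offset of the section inside the buffer */
    uint64_t sh_size;    /* section size in bytes */
};

struct purgatory_sketch {
    uint8_t *buf;                         /* writable copy of the sections */
    size_t bufsz;                         /* size of that copy */
    const struct sechdr_sketch *sechdrs;  /* section header table */
};

/* Resolve a section to its address in the writable buffer, or NULL. */
static uint8_t *section_in_buf(const struct purgatory_sketch *pi, size_t idx)
{
    const struct sechdr_sketch *s = &pi->sechdrs[idx];

    /* Demo-only sanity check: the section must lie inside the buffer. */
    if (s->sh_offset > pi->bufsz || s->sh_size > pi->bufsz - s->sh_offset)
        return NULL;

    return pi->buf + s->sh_offset;
}
```

    Since the header only stores an offset, the caller always knows that the
    resulting pointer refers to the writable buffer and never to the
    read-only image.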

    Link: http://lkml.kernel.org/r/20180321112751.22196-10-prudo@linux.vnet.ibm.com
    Signed-off-by: Philipp Rudo
    Acked-by: Dave Young
    Cc: AKASHI Takahiro
    Cc: Eric Biederman
    Cc: Heiko Carstens
    Cc: Ingo Molnar
    Cc: Martin Schwidefsky
    Cc: Michael Ellerman
    Cc: Thiago Jung Bauermann
    Cc: Vivek Goyal
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Philipp Rudo
     
  • The main loop currently uses quite a lot of variables to update the
    section headers. Some of them are unnecessary. So clean them up a
    little.

    Link: http://lkml.kernel.org/r/20180321112751.22196-9-prudo@linux.vnet.ibm.com
    Signed-off-by: Philipp Rudo
    Acked-by: Dave Young
    Cc: AKASHI Takahiro
    Cc: Eric Biederman
    Cc: Heiko Carstens
    Cc: Ingo Molnar
    Cc: Martin Schwidefsky
    Cc: Michael Ellerman
    Cc: Thiago Jung Bauermann
    Cc: Vivek Goyal
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Philipp Rudo
     
  • To update the entry point there is an extra loop over all section
    headers although this can be done in the main loop. So move it there
    and eliminate the extra loop and variable to store the 'entry section
    index'.

    Also, in the main loop, move the usual case, i.e. non-bss section, out
    of the extra if-block.

    Link: http://lkml.kernel.org/r/20180321112751.22196-8-prudo@linux.vnet.ibm.com
    Signed-off-by: Philipp Rudo
    Reviewed-by: Martin Schwidefsky
    Acked-by: Dave Young
    Cc: AKASHI Takahiro
    Cc: Eric Biederman
    Cc: Heiko Carstens
    Cc: Ingo Molnar
    Cc: Michael Ellerman
    Cc: Thiago Jung Bauermann
    Cc: Vivek Goyal
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Philipp Rudo
     
  • When inspecting __kexec_load_purgatory you find that it has two tasks:

    1) setting up the kexec_buffer for the new kernel and,
    2) setting up pi->sechdrs for the final load address.

    The two tasks are independent of each other. To improve readability
    split up __kexec_load_purgatory into two functions, one for each task,
    and call them directly from kexec_load_purgatory.

    Link: http://lkml.kernel.org/r/20180321112751.22196-7-prudo@linux.vnet.ibm.com
    Signed-off-by: Philipp Rudo
    Acked-by: Dave Young
    Cc: AKASHI Takahiro
    Cc: Eric Biederman
    Cc: Heiko Carstens
    Cc: Ingo Molnar
    Cc: Martin Schwidefsky
    Cc: Michael Ellerman
    Cc: Thiago Jung Bauermann
    Cc: Vivek Goyal
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Philipp Rudo
     
  • When the relocations are applied to the purgatory only the section the
    relocations are applied to is writable. The other sections, i.e. the
    symtab and .rel/.rela, are in read-only kexec_purgatory. Highlight this
    by marking the corresponding variables as 'const'.

    While at it also change the signatures of arch_kexec_apply_relocations* to
    take section pointers instead of just the index of the relocation section.
    This removes the second lookup and sanity check of the sections in arch
    code.

    Link: http://lkml.kernel.org/r/20180321112751.22196-6-prudo@linux.vnet.ibm.com
    Signed-off-by: Philipp Rudo
    Acked-by: Dave Young
    Cc: AKASHI Takahiro
    Cc: Eric Biederman
    Cc: Heiko Carstens
    Cc: Ingo Molnar
    Cc: Martin Schwidefsky
    Cc: Michael Ellerman
    Cc: Thiago Jung Bauermann
    Cc: Vivek Goyal
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Philipp Rudo
     
  • The stripped purgatory does not contain a symtab. So symbol lookups are
    done in the read-only kexec_purgatory. Highlight this by marking the
    corresponding variables as 'const'.

    Link: http://lkml.kernel.org/r/20180321112751.22196-5-prudo@linux.vnet.ibm.com
    Signed-off-by: Philipp Rudo
    Acked-by: Dave Young
    Cc: AKASHI Takahiro
    Cc: Eric Biederman
    Cc: Heiko Carstens
    Cc: Ingo Molnar
    Cc: Martin Schwidefsky
    Cc: Michael Ellerman
    Cc: Thiago Jung Bauermann
    Cc: Vivek Goyal
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Philipp Rudo
     
  • The kexec_purgatory buffer is read-only. Thus all pointers into
    kexec_purgatory are read-only, too. Point this out by explicitly
    marking purgatory_info->ehdr as 'const' and update the comments in
    purgatory_info.

    Link: http://lkml.kernel.org/r/20180321112751.22196-4-prudo@linux.vnet.ibm.com
    Signed-off-by: Philipp Rudo
    Acked-by: Dave Young
    Cc: AKASHI Takahiro
    Cc: Eric Biederman
    Cc: Heiko Carstens
    Cc: Ingo Molnar
    Cc: Martin Schwidefsky
    Cc: Michael Ellerman
    Cc: Thiago Jung Bauermann
    Cc: Vivek Goyal
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Philipp Rudo
     
  • Before the purgatory is loaded, several checks are done to verify that
    the ELF file in kexec_purgatory is valid. These checks are incomplete.
    For example, they don't check the total size of the sections defined
    in the section header table or whether the entry point actually points
    into the purgatory.

    On the other hand the purgatory, although an ELF file on its own, is
    part of the kernel. Thus not trusting the purgatory means not trusting
    the kernel build itself.

    So remove all validity checks on the purgatory and just trust the kernel
    build.

    Link: http://lkml.kernel.org/r/20180321112751.22196-3-prudo@linux.vnet.ibm.com
    Signed-off-by: Philipp Rudo
    Acked-by: Dave Young
    Cc: AKASHI Takahiro
    Cc: Eric Biederman
    Cc: Heiko Carstens
    Cc: Ingo Molnar
    Cc: Martin Schwidefsky
    Cc: Michael Ellerman
    Cc: Thiago Jung Bauermann
    Cc: Vivek Goyal
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Philipp Rudo