23 Jan, 2020

1 commit

  • commit 8e57f8acbbd121ecfb0c9dc13b8b030f86c6bd3b upstream.

    Commit 96a2b03f281d ("mm, debug_pagelloc: use static keys to enable
    debugging") has introduced a static key to reduce overhead when
    debug_pagealloc is compiled in but not enabled. It relied on the
    assumption that jump_label_init() is called before parse_early_param()
    as in start_kernel(), so when the "debug_pagealloc=on" option is parsed,
    it is safe to enable the static key.

    However, it turns out multiple architectures call parse_early_param()
    earlier from their setup_arch(). x86 also calls jump_label_init() even
    earlier, so no issue was found while testing the commit, but the same is
    true for e.g. ppc64 and s390 where the kernel would not boot with
    debug_pagealloc=on as found by our QA.

    To fix this without tricky changes to init code of multiple
    architectures, this patch partially reverts the static key conversion
    from 96a2b03f281d. Init-time and non-fastpath calls (such as in arch
    code) of debug_pagealloc_enabled() will again test a simple bool
    variable. Fastpath mm code is converted to a new
    debug_pagealloc_enabled_static() variant that relies on the static key,
    which is enabled at a well-defined point in mm_init(), where it's
    guaranteed that jump_label_init() has been called, regardless of
    architecture.
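
    A minimal sketch of the resulting split, close to the helpers this adds
    to include/linux/mm.h:

    extern bool _debug_pagealloc_enabled_early;
    DECLARE_STATIC_KEY_FALSE(_debug_pagealloc_enabled);

    /* init-time and arch code: a plain bool, safe before jump_label_init() */
    static inline bool debug_pagealloc_enabled(void)
    {
            return IS_ENABLED(CONFIG_DEBUG_PAGEALLOC) &&
                   _debug_pagealloc_enabled_early;
    }

    /* mm fastpaths: static key, valid once mm_init() has enabled it */
    static inline bool debug_pagealloc_enabled_static(void)
    {
            if (!IS_ENABLED(CONFIG_DEBUG_PAGEALLOC))
                    return false;
            return static_branch_unlikely(&_debug_pagealloc_enabled);
    }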

    [sfr@canb.auug.org.au: export _debug_pagealloc_enabled_early]
    Link: http://lkml.kernel.org/r/20200106164944.063ac07b@canb.auug.org.au
    Link: http://lkml.kernel.org/r/20191219130612.23171-1-vbabka@suse.cz
    Fixes: 96a2b03f281d ("mm, debug_pagelloc: use static keys to enable debugging")
    Signed-off-by: Vlastimil Babka
    Signed-off-by: Stephen Rothwell
    Cc: Joonsoo Kim
    Cc: "Kirill A. Shutemov"
    Cc: Michal Hocko
    Cc: Vlastimil Babka
    Cc: Matthew Wilcox
    Cc: Mel Gorman
    Cc: Peter Zijlstra
    Cc: Borislav Petkov
    Cc: Qian Cai
    Cc:
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds
    Signed-off-by: Greg Kroah-Hartman

    Vlastimil Babka
     

28 Sep, 2019

1 commit

  • Pull kernel lockdown mode from James Morris:
    "This is the latest iteration of the kernel lockdown patchset, from
    Matthew Garrett, David Howells and others.

    From the original description:

    This patchset introduces an optional kernel lockdown feature,
    intended to strengthen the boundary between UID 0 and the kernel.
    When enabled, various pieces of kernel functionality are restricted.
    Applications that rely on low-level access to either hardware or the
    kernel may cease working as a result - therefore this should not be
    enabled without appropriate evaluation beforehand.

    The majority of mainstream distributions have been carrying variants
    of this patchset for many years now, so there's value in providing a
    mainline implementation. It doesn't meet every distribution requirement,
    but it gets us much closer to not requiring external patches.

    There are two major changes since this was last proposed for mainline:

    - Separating lockdown from EFI secure boot. Background discussion is
    covered here: https://lwn.net/Articles/751061/

    - Implementation as an LSM, with a default stackable lockdown LSM
    module. This allows the lockdown feature to be policy-driven,
    rather than encoding an implicit policy within the mechanism.

    The new locked_down LSM hook is provided to allow LSMs to make a
    policy decision around whether kernel functionality that would allow
    tampering with or examining the runtime state of the kernel should be
    permitted.

    The included lockdown LSM provides an implementation with a simple
    policy intended for general purpose use. This policy provides a coarse
    level of granularity, controllable via the kernel command line:

    lockdown={integrity|confidentiality}

    Enable the kernel lockdown feature. If set to integrity, kernel features
    that allow userland to modify the running kernel are disabled. If set to
    confidentiality, kernel features that allow userland to extract
    confidential information from the kernel are also disabled.

    This may also be controlled via /sys/kernel/security/lockdown and
    overridden by kernel configuration.

    New or existing LSMs may implement finer-grained controls of the
    lockdown features. Refer to the lockdown_reason documentation in
    include/linux/security.h for details.
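
    As an illustration, a call site asks the hook for a decision with one of
    those reasons before exposing kernel state (the function here is
    hypothetical; the reason constant is from include/linux/security.h):

    /* hypothetical call site illustrating the locked_down hook */
    static int example_open_kcore(void)
    {
            int ret = security_locked_down(LOCKDOWN_KCORE);

            if (ret)        /* -EPERM once confidentiality mode is active */
                    return ret;
            /* ... expose /proc/kcore ... */
            return 0;
    }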

    The lockdown feature has had significant design feedback and review
    across many subsystems. This code has been in linux-next for some
    weeks, with a few fixes applied along the way.

    Stephen Rothwell noted that commit 9d1f8be5cf42 ("bpf: Restrict bpf
    when kernel lockdown is in confidentiality mode") is missing a
    Signed-off-by from its author. Matthew responded that he is providing
    this under category (c) of the DCO"

    * 'next-lockdown' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/linux-security: (31 commits)
    kexec: Fix file verification on S390
    security: constify some arrays in lockdown LSM
    lockdown: Print current->comm in restriction messages
    efi: Restrict efivar_ssdt_load when the kernel is locked down
    tracefs: Restrict tracefs when the kernel is locked down
    debugfs: Restrict debugfs when the kernel is locked down
    kexec: Allow kexec_file() with appropriate IMA policy when locked down
    lockdown: Lock down perf when in confidentiality mode
    bpf: Restrict bpf when kernel lockdown is in confidentiality mode
    lockdown: Lock down tracing and perf kprobes when in confidentiality mode
    lockdown: Lock down /proc/kcore
    x86/mmiotrace: Lock down the testmmiotrace module
    lockdown: Lock down module params that specify hardware parameters (eg. ioport)
    lockdown: Lock down TIOCSSERIAL
    lockdown: Prohibit PCMCIA CIS storage when the kernel is locked down
    acpi: Disable ACPI table override if the kernel is locked down
    acpi: Ignore acpi_rsdp kernel param when the kernel has been locked down
    ACPI: Limit access to custom_method when the kernel is locked down
    x86/msr: Restrict MSR access when the kernel is locked down
    x86: Lock down IO port access when the kernel is locked down
    ...

    Linus Torvalds
     

25 Sep, 2019

3 commits

  • Replace the open-coded bitmap array initialization of init_mm.cpu_bitmask
    with the neat CPU_BITS_NONE macro.

    And, since init_mm.cpu_bitmask is statically set to zero, there is no need
    to clear it again in start_kernel().
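
    A sketch of the resulting initializer (the field is spelled cpu_bitmap
    in struct mm_struct; cf. mm/init-mm.c):

    struct mm_struct init_mm = {
            /* ... other fields ... */
            .cpu_bitmap = CPU_BITS_NONE,
    };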

    Link: http://lkml.kernel.org/r/1565703815-8584-1-git-send-email-rppt@linux.ibm.com
    Signed-off-by: Mike Rapoport
    Reviewed-by: Andrew Morton
    Reviewed-by: David Hildenbrand
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Mike Rapoport
     
  • Both pgtable_cache_init() and pgd_cache_init() are used to initialize kmem
    cache for page table allocations on several architectures that do not use
    PAGE_SIZE tables for one or more levels of the page table hierarchy.

    Most architectures do not implement these functions and use __weak default
    NOP implementation of pgd_cache_init(). Since there is no such default
    for pgtable_cache_init(), its empty stub is duplicated among most
    architectures.

    Rename the definitions of pgd_cache_init() to pgtable_cache_init() and
    drop empty stubs of pgtable_cache_init().
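
    The generic default then reduces to a weak no-op, roughly:

    /* weak generic no-op; architectures that need dedicated kmem caches
     * for page-table levels override this with a real implementation */
    void __init __weak pgtable_cache_init(void) { }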

    Link: http://lkml.kernel.org/r/1566457046-22637-1-git-send-email-rppt@linux.ibm.com
    Signed-off-by: Mike Rapoport
    Acked-by: Will Deacon [arm64]
    Acked-by: Thomas Gleixner [x86]
    Cc: Catalin Marinas
    Cc: Ingo Molnar
    Cc: Borislav Petkov
    Cc: Matthew Wilcox
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Mike Rapoport
     
  • Currently kmemleak uses a static early_log buffer to trace all memory
    allocation/freeing before the slab allocator is initialised. This early
    log is replayed during kmemleak_init() to properly initialise the kmemleak
    metadata for objects allocated up to that point. With a memory pool that
    does not rely on the slab allocator, it is possible to skip this early log
    entirely.

    In order to remove the early logging, consider kmemleak_enabled == 1 by
    default while the kmem_cache availability is checked directly on the
    object_cache and scan_area_cache variables. The RCU callback is only
    invoked after object_cache has been initialised as we wouldn't have any
    concurrent list traversal before this.

    In order to reduce the number of callbacks before kmemleak is fully
    initialised, move the kmemleak_init() call to mm_init().
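
    A simplified sketch of the allocation path this enables (names as in
    mm/kmemleak.c; details elided):

    static struct kmemleak_object *mem_pool_alloc(gfp_t gfp)
    {
            struct kmemleak_object *object = NULL;

            /* use the slab allocator once object_cache has been created */
            if (object_cache) {
                    object = kmem_cache_alloc(object_cache,
                                              gfp_kmemleak_mask(gfp));
                    if (object)
                            return object;
            }

            /* otherwise fall back to the static memory pool */
            /* ... take a free entry from mem_pool[] under pool_lock ... */
            return object;
    }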

    [akpm@linux-foundation.org: coding-style fixes]
    [akpm@linux-foundation.org: remove WARN_ON(), per Catalin]
    Link: http://lkml.kernel.org/r/20190812160642.52134-4-catalin.marinas@arm.com
    Signed-off-by: Catalin Marinas
    Cc: Matthew Wilcox
    Cc: Michal Hocko
    Cc: Qian Cai
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Catalin Marinas
     

20 Aug, 2019

1 commit

  • The lockdown module is intended to allow for kernels to be locked down
    early in boot - sufficiently early that we don't have the ability to
    kmalloc() yet. Add support for early initialisation of some LSMs, and
    then add them to the list of names when we do full initialisation later.
    Early LSMs are initialised in link order and cannot be overridden via
    boot parameters, and cannot make use of kmalloc() (since the allocator
    isn't initialised yet).

    (Fixed by Stephen Rothwell to include a stub to fix builds when
    !CONFIG_SECURITY)
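
    A sketch of how an early LSM registers itself (cf. DEFINE_EARLY_LSM in
    include/linux/lsm_hooks.h; the init body is elided):

    static int __init lockdown_lsm_init(void)
    {
            /* register hooks; kmalloc() must not be used this early */
            return 0;
    }

    DEFINE_EARLY_LSM(lockdown) = {
            .name = "lockdown",
            .init = lockdown_lsm_init,
    };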

    Signed-off-by: Matthew Garrett
    Acked-by: Kees Cook
    Acked-by: Casey Schaufler
    Cc: Stephen Rothwell
    Signed-off-by: James Morris

    Matthew Garrett
     

01 Aug, 2019

1 commit

  • CONFIG_PREEMPTION is selected by CONFIG_PREEMPT and by
    CONFIG_PREEMPT_RT. Both PREEMPT and PREEMPT_RT require the same
    functionality which today depends on CONFIG_PREEMPT.

    Switch the preemption code, scheduler and init task over to use
    CONFIG_PREEMPTION.

    That's the first step towards RT in that area. The more complex changes are
    coming separately.
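
    The shape of the conversion, sketched:

    /* before: this code was guarded by #ifdef CONFIG_PREEMPT */
    #ifdef CONFIG_PREEMPTION        /* true for PREEMPT and PREEMPT_RT */
    /* ... preemption points, preempt_schedule() wiring, ... */
    #endif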

    Signed-off-by: Thomas Gleixner
    Acked-by: Peter Zijlstra (Intel)
    Cc: Linus Torvalds
    Cc: Masami Hiramatsu
    Cc: Paolo Bonzini
    Cc: Paul E. McKenney
    Cc: Peter Zijlstra
    Cc: Steven Rostedt
    Link: http://lkml.kernel.org/r/20190726212124.117528401@linutronix.de
    Signed-off-by: Ingo Molnar

    Thomas Gleixner
     

20 Jul, 2019

1 commit

  • Pull vfs mount updates from Al Viro:
    "The first part of mount updates.

    Convert filesystems to use the new mount API"

    * 'work.mount0' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (63 commits)
    mnt_init(): call shmem_init() unconditionally
    constify ksys_mount() string arguments
    don't bother with registering rootfs
    init_rootfs(): don't bother with init_ramfs_fs()
    vfs: Convert smackfs to use the new mount API
    vfs: Convert selinuxfs to use the new mount API
    vfs: Convert securityfs to use the new mount API
    vfs: Convert apparmorfs to use the new mount API
    vfs: Convert openpromfs to use the new mount API
    vfs: Convert xenfs to use the new mount API
    vfs: Convert gadgetfs to use the new mount API
    vfs: Convert oprofilefs to use the new mount API
    vfs: Convert ibmasmfs to use the new mount API
    vfs: Convert qib_fs/ipathfs to use the new mount API
    vfs: Convert efivarfs to use the new mount API
    vfs: Convert configfs to use the new mount API
    vfs: Convert binfmt_misc to use the new mount API
    convenience helper: get_tree_single()
    convenience helper get_tree_nodev()
    vfs: Kill sget_userns()
    ...

    Linus Torvalds
     

13 Jul, 2019

1 commit

  • Print the currently enabled stack and heap initialization modes.

    Stack initialization is enabled by a config flag, while heap
    initialization is configured at boot time with defaults being set in the
    config. It's more convenient for the user to have all information about
    these hardening measures in one place at boot, so the user can reason
    about the expected behavior of the running system.

    The possible options for stack are:
    - "all" for CONFIG_INIT_STACK_ALL;
    - "byref_all" for CONFIG_GCC_PLUGIN_STRUCTLEAK_BYREF_ALL;
    - "byref" for CONFIG_GCC_PLUGIN_STRUCTLEAK_BYREF;
    - "__user" for CONFIG_GCC_PLUGIN_STRUCTLEAK_USER;
    - "off" otherwise.

    Depending on the values of init_on_alloc and init_on_free boottime options
    we also report "heap alloc" and "heap free" as "on"/"off".

    In the init_on_free mode initializing pages at boot time may take a while,
    so print a notice about that as well. This depends on how much memory is
    installed, the memory bandwidth, etc. On a relatively modern x86 system,
    it takes about 0.75s/GB to wipe all memory:

    [ 0.418722] mem auto-init: stack:byref_all, heap alloc:off, heap free:on
    [ 0.419765] mem auto-init: clearing system memory may take some time...
    [ 12.376605] Memory: 16408564K/16776672K available (14339K kernel code, 1397K rwdata, 3756K rodata, 1636K init, 11460K bss, 368108K reserved, 0K cma-reserved)
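
    A sketch of the reporting logic, close to report_meminit() in
    init/main.c:

    static void __init report_meminit(void)
    {
            const char *stack;

            if (IS_ENABLED(CONFIG_INIT_STACK_ALL))
                    stack = "all";
            else if (IS_ENABLED(CONFIG_GCC_PLUGIN_STRUCTLEAK_BYREF_ALL))
                    stack = "byref_all";
            else if (IS_ENABLED(CONFIG_GCC_PLUGIN_STRUCTLEAK_BYREF))
                    stack = "byref";
            else if (IS_ENABLED(CONFIG_GCC_PLUGIN_STRUCTLEAK_USER))
                    stack = "__user";
            else
                    stack = "off";

            pr_info("mem auto-init: stack:%s, heap alloc:%s, heap free:%s\n",
                    stack, want_init_on_alloc(GFP_KERNEL) ? "on" : "off",
                    want_init_on_free() ? "on" : "off");
            if (want_init_on_free())
                    pr_info("mem auto-init: clearing system memory may take some time...\n");
    }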

    Link: http://lkml.kernel.org/r/20190617151050.92663-3-glider@google.com
    Signed-off-by: Alexander Potapenko
    Suggested-by: Kees Cook
    Acked-by: Kees Cook
    Cc: Christoph Lameter
    Cc: Dmitry Vyukov
    Cc: James Morris
    Cc: Jann Horn
    Cc: Kostya Serebryany
    Cc: Laura Abbott
    Cc: Mark Rutland
    Cc: Masahiro Yamada
    Cc: Matthew Wilcox
    Cc: Nick Desaulniers
    Cc: Randy Dunlap
    Cc: Sandeep Patil
    Cc: "Serge E. Hallyn"
    Cc: Souptick Joarder
    Cc: Marco Elver
    Cc: Kaiwan N Billimoria
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Alexander Potapenko
     

05 Jul, 2019

1 commit

  • No point having two call sites (earlier in init_rootfs() from
    mnt_init() in case we are going to use shmem-style rootfs,
    later from do_basic_setup() unconditionally), along with the
    logics in shmem_init() itself to make the second call a no-op...

    Signed-off-by: Al Viro

    Al Viro
     

21 May, 2019

1 commit

  • Add SPDX license identifiers to all files which:

    - Have no license information of any form

    - Have EXPORT_.*_SYMBOL_GPL inside which was used in the
    initial scan/conversion to ignore the file

    These files fall under the project license, GPL v2 only. The resulting SPDX
    license identifier is:

    GPL-2.0-only
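
    The identifier is added as a comment at the top of each file, e.g.:

    // SPDX-License-Identifier: GPL-2.0-only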

    Signed-off-by: Thomas Gleixner
    Signed-off-by: Greg Kroah-Hartman

    Thomas Gleixner
     

15 May, 2019

2 commits

  • Various architectures including x86 poison the freed init memory. Do the
    same in the generic free_initmem implementation and switch sparc32
    architecture that is identical to the generic code over to it now.

    Link: http://lkml.kernel.org/r/1550515285-17446-4-git-send-email-rppt@linux.ibm.com
    Signed-off-by: Mike Rapoport
    Reviewed-by: Andrew Morton
    Cc: Christoph Hellwig
    Cc: Palmer Dabbelt
    Cc: Richard Kuo
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Mike Rapoport
     
  • Patch series "provide a generic free_initmem implementation", v2.

    Many architectures implement free_initmem() in exactly the same or very
    similar way: they wrap the call to free_initmem_default() with sometimes
    different 'poison' parameter.

    These patches switch those architectures to use a generic implementation
    that does free_initmem_default(POISON_FREE_INITMEM).

    This was inspired by Christoph's patches for free_initrd_mem [1] and I
    shamelessly copied changelog entries from his patches :)

    [1] https://lore.kernel.org/lkml/20190213174621.29297-1-hch@lst.de/

    This patch (of 2):

    For most architectures, free_initmem() is just a wrapper around the same
    free_initmem_default(-1) call. Provide that as a generic implementation
    marked __weak.
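
    The resulting generic implementation, including the poisoning added by
    the companion patch above, is roughly:

    void __weak free_initmem(void)
    {
            free_initmem_default(POISON_FREE_INITMEM);
    }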

    Link: http://lkml.kernel.org/r/1550515285-17446-2-git-send-email-rppt@linux.ibm.com
    Signed-off-by: Mike Rapoport
    Reviewed-by: Andrew Morton
    Cc: Christoph Hellwig
    Cc: Palmer Dabbelt
    Cc: Richard Kuo
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Mike Rapoport
     

08 May, 2019

2 commits

  • Pull randomness updates from Ted Ts'o:

    - initialize the random driver earlier

    - fix CRNG initialization when we trust the CPU's RNG on NUMA systems

    - other miscellaneous cleanups and fixes.

    * tag 'random_for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/random:
    random: add a spinlock_t to struct batched_entropy
    random: document get_random_int() family
    random: fix CRNG initialization when random.trust_cpu=1
    random: move rand_initialize() earlier
    random: only read from /dev/random after its pool has received 128 bits
    drivers/char/random.c: make primary_crng static
    drivers/char/random.c: remove unused stuct poolinfo::poolbits
    drivers/char/random.c: constify poolinfo_table

    Linus Torvalds
     
  • Pull printk updates from Petr Mladek:

    - Allow state reset of printk_once() calls.

    - Prevent crashes when dereferencing invalid pointers in vsprintf().
    Only the first byte is checked for simplicity.

    - Make vsprintf warnings consistent and inlined.

    - Treewide conversion of obsolete %pf, %pF to %ps, %pS printf
    modifiers.

    - Some clean up of vsprintf and test_printf code.

    * tag 'printk-for-5.2' of git://git.kernel.org/pub/scm/linux/kernel/git/pmladek/printk:
    lib/vsprintf: Make function pointer_string static
    vsprintf: Limit the length of inlined error messages
    vsprintf: Avoid confusion between invalid address and value
    vsprintf: Prevent crash when dereferencing invalid pointers
    vsprintf: Consolidate handling of unknown pointer specifiers
    vsprintf: Factor out %pO handler as kobject_string()
    vsprintf: Factor out %pV handler as va_format()
    vsprintf: Factor out %p[iI] handler as ip_addr_string()
    vsprintf: Do not check address of well-known strings
    vsprintf: Consistent %pK handling for kptr_restrict == 0
    vsprintf: Shuffle restricted_pointer()
    printk: Tie printk_once / printk_deferred_once into .data.once for reset
    treewide: Switch printk users from %pf and %pF to %ps and %pS, respectively
    lib/test_printf: Switch to bitmap_zalloc()

    Linus Torvalds
     

06 May, 2019

1 commit

  • Poking-mm initialization might require duplicating the PGD at an early
    stage. Initialize the PGD cache earlier to prevent boot failures.

    Reported-by: kernel test robot
    Signed-off-by: Nadav Amit
    Cc: Andy Lutomirski
    Cc: Borislav Petkov
    Cc: Dave Hansen
    Cc: H. Peter Anvin
    Cc: Linus Torvalds
    Cc: Peter Zijlstra
    Cc: Rick Edgecombe
    Cc: Rik van Riel
    Cc: Stephen Rothwell
    Cc: Thomas Gleixner
    Fixes: 4fc19708b165 ("x86/alternatives: Initialize temporary mm for patching")
    Link: http://lkml.kernel.org/r/20190505011124.39692-1-namit@vmware.com
    Signed-off-by: Ingo Molnar

    Nadav Amit
     

30 Apr, 2019

1 commit

  • To prevent improper use of the PTEs that are used for text patching, the
    next patches will use a temporary mm struct. Initialize it by copying
    the init mm.

    The address that will be used for patching is taken from the lower area
    that is usually used for the task memory. Doing so prevents the need to
    frequently synchronize the temporary-mm (e.g., when BPF programs are
    installed), since different PGDs are used for the task memory.

    Finally, randomize the address of the PTEs to harden against exploits
    that use these PTEs.
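
    A simplified sketch of that setup (condensed from poking_init() on
    x86):

    poking_mm = copy_init_mm();             /* duplicate of init_mm */
    BUG_ON(!poking_mm);

    /*
     * Place the poking address in the lower (task) range so task PGDs
     * never need to be synchronized with the temporary mm, and randomize
     * it to harden against exploits that target these PTEs.
     */
    poking_addr = TASK_UNMAPPED_BASE;
    if (IS_ENABLED(CONFIG_RANDOMIZE_BASE))
            poking_addr += (kaslr_get_random_long("Poking") & PAGE_MASK) %
                           (TASK_SIZE - TASK_UNMAPPED_BASE - 3 * PAGE_SIZE);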

    Suggested-by: Andy Lutomirski
    Tested-by: Masami Hiramatsu
    Signed-off-by: Nadav Amit
    Signed-off-by: Rick Edgecombe
    Signed-off-by: Peter Zijlstra (Intel)
    Reviewed-by: Masami Hiramatsu
    Cc: Borislav Petkov
    Cc: Dave Hansen
    Cc: H. Peter Anvin
    Cc: Kees Cook
    Cc: Linus Torvalds
    Cc: Rik van Riel
    Cc: Thomas Gleixner
    Cc: akpm@linux-foundation.org
    Cc: ard.biesheuvel@linaro.org
    Cc: deneen.t.dock@intel.com
    Cc: kernel-hardening@lists.openwall.com
    Cc: kristen@linux.intel.com
    Cc: linux_dti@icloud.com
    Cc: will.deacon@arm.com
    Link: https://lkml.kernel.org/r/20190426232303.28381-8-nadav.amit@gmail.com
    Signed-off-by: Ingo Molnar

    Nadav Amit
     

20 Apr, 2019

2 commits

  • Right now rand_initialize() is run as an early_initcall(), but it only
    depends on timekeeping_init() (for mixing ktime_get_real() into the
    pools). However, the call to boot_init_stack_canary() for stack canary
    initialization runs earlier, which triggers a warning at boot:

    random: get_random_bytes called from start_kernel+0x357/0x548 with crng_init=0

    Instead, this moves rand_initialize() to after timekeeping_init(), and moves
    canary initialization here as well.

    Note that this warning may still remain for machines that do not have
    UEFI RNG support (which initializes the RNG pools during setup_arch()),
    or for x86 machines without RDRAND (or booting without "random.trust_cpu=on"
    or CONFIG_RANDOM_TRUST_CPU=y).
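
    The resulting ordering in start_kernel(), roughly as merged:

    time_init();
    /*
     * For best initial stack canary entropy, prepare it after:
     * - setup_arch() for any UEFI RNG entropy and boot cmdline access
     * - timekeeping_init() for ktime entropy used in rand_initialize()
     * - rand_initialize() to get any arch-specific entropy like RDRAND
     * - add_latent_entropy() to get any latent entropy
     * - adding command line for hardware entropy
     */
    rand_initialize();
    add_latent_entropy();
    add_device_randomness(command_line, strlen(command_line));
    boot_init_stack_canary();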

    Signed-off-by: Kees Cook
    Signed-off-by: Theodore Ts'o

    Kees Cook
     
  • When a module option, or core kernel argument, toggles a static-key it
    requires jump labels to be initialized early. While x86, PowerPC, and
    ARM64 arrange for jump_label_init() to be called before parse_args(),
    ARM does not.

    Kernel command line: rdinit=/sbin/init page_alloc.shuffle=1 panic=-1 console=ttyAMA0,115200 page_alloc.shuffle=1
    ------------[ cut here ]------------
    WARNING: CPU: 0 PID: 0 at ./include/linux/jump_label.h:303
    page_alloc_shuffle+0x12c/0x1ac
    static_key_enable(): static key 'page_alloc_shuffle_key+0x0/0x4' used
    before call to jump_label_init()
    Modules linked in:
    CPU: 0 PID: 0 Comm: swapper Not tainted
    5.1.0-rc4-next-20190410-00003-g3367c36ce744 #1
    Hardware name: ARM Integrator/CP (Device Tree)
    [] (unwind_backtrace) from [] (show_stack+0x10/0x18)
    [] (show_stack) from [] (dump_stack+0x18/0x24)
    [] (dump_stack) from [] (__warn+0xe0/0x108)
    [] (__warn) from [] (warn_slowpath_fmt+0x44/0x6c)
    [] (warn_slowpath_fmt) from []
    (page_alloc_shuffle+0x12c/0x1ac)
    [] (page_alloc_shuffle) from [] (shuffle_store+0x28/0x48)
    [] (shuffle_store) from [] (parse_args+0x1f4/0x350)
    [] (parse_args) from [] (start_kernel+0x1c0/0x488)

    Move the fallback call to jump_label_init() to occur before
    parse_args().

    The redundant calls to jump_label_init() in other archs are left intact
    in case they have static key toggling use cases that are even earlier
    than option parsing.
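
    The resulting ordering in start_kernel(), sketched:

    pr_notice("Kernel command line: %s\n", saved_command_line);
    /* parameters may set static keys */
    jump_label_init();
    parse_early_param();
    after_dashes = parse_args("Booting kernel",
                              static_command_line, __start___param,
                              __stop___param - __start___param,
                              -1, -1, NULL, &unknown_bootoption);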

    Link: http://lkml.kernel.org/r/155544804466.1032396.13418949511615676665.stgit@dwillia2-desk3.amr.corp.intel.com
    Signed-off-by: Dan Williams
    Reported-by: Guenter Roeck
    Reviewed-by: Kees Cook
    Cc: Mathieu Desnoyers
    Cc: Thomas Gleixner
    Cc: Mike Rapoport
    Cc: Russell King
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Dan Williams
     

09 Apr, 2019

1 commit

  • %pF and %pf are functionally equivalent to %pS and %ps conversion
    specifiers. The former are deprecated, therefore switch the current users
    to use the preferred variant.

    The changes have been produced by the following command:

    git grep -l '%p[fF]' | grep -v '^\(tools\|Documentation\)/' | \
    while read i; do perl -i -pe 's/%pf/%ps/g; s/%pF/%pS/g;' $i; done

    And verifying the result.
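
    For example, an initcall trace line in init/main.c changes from

    pr_debug("calling  %pF @ %i\n", fn, task_pid_nr(current));

    to

    pr_debug("calling  %pS @ %i\n", fn, task_pid_nr(current));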

    Link: http://lkml.kernel.org/r/20190325193229.23390-1-sakari.ailus@linux.intel.com
    Cc: Andy Shevchenko
    Cc: linux-arm-kernel@lists.infradead.org
    Cc: sparclinux@vger.kernel.org
    Cc: linux-um@lists.infradead.org
    Cc: xen-devel@lists.xenproject.org
    Cc: linux-acpi@vger.kernel.org
    Cc: linux-pm@vger.kernel.org
    Cc: drbd-dev@lists.linbit.com
    Cc: linux-block@vger.kernel.org
    Cc: linux-mmc@vger.kernel.org
    Cc: linux-nvdimm@lists.01.org
    Cc: linux-pci@vger.kernel.org
    Cc: linux-scsi@vger.kernel.org
    Cc: linux-btrfs@vger.kernel.org
    Cc: linux-f2fs-devel@lists.sourceforge.net
    Cc: linux-mm@kvack.org
    Cc: ceph-devel@vger.kernel.org
    Cc: netdev@vger.kernel.org
    Signed-off-by: Sakari Ailus
    Acked-by: David Sterba (for btrfs)
    Acked-by: Mike Rapoport (for mm/memblock.c)
    Acked-by: Bjorn Helgaas (for drivers/pci)
    Acked-by: Rafael J. Wysocki
    Signed-off-by: Petr Mladek

    Sakari Ailus
     

13 Mar, 2019

1 commit

  • Add panic() calls if memblock_alloc() returns NULL.

    The panic() format duplicates the one used by memblock itself and, in
    order to avoid an explosion of long parameter lists, replace open-coded
    allocation size calculations with a local variable.
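
    The pattern looks like this (structure and variable names here are
    hypothetical):

    struct entry *table;
    size_t size = nr_entries * sizeof(*table);      /* computed once */

    table = memblock_alloc(size, SMP_CACHE_BYTES);
    if (!table)
            panic("%s: Failed to allocate %zu bytes\n", __func__, size);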

    Link: http://lkml.kernel.org/r/1548057848-15136-18-git-send-email-rppt@linux.ibm.com
    Signed-off-by: Mike Rapoport
    Cc: Catalin Marinas
    Cc: Christophe Leroy
    Cc: Christoph Hellwig
    Cc: "David S. Miller"
    Cc: Dennis Zhou
    Cc: Geert Uytterhoeven
    Cc: Greentime Hu
    Cc: Greg Kroah-Hartman
    Cc: Guan Xuetao
    Cc: Guo Ren
    Cc: Guo Ren [c-sky]
    Cc: Heiko Carstens
    Cc: Juergen Gross [Xen]
    Cc: Mark Salter
    Cc: Matt Turner
    Cc: Max Filippov
    Cc: Michael Ellerman
    Cc: Michal Simek
    Cc: Paul Burton
    Cc: Petr Mladek
    Cc: Richard Weinberger
    Cc: Rich Felker
    Cc: Rob Herring
    Cc: Rob Herring
    Cc: Russell King
    Cc: Stafford Horne
    Cc: Tony Luck
    Cc: Vineet Gupta
    Cc: Yoshinori Sato
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Mike Rapoport
     

13 Feb, 2019

1 commit

  • This reverts commit fe53ca54270a ("mm: use early_pfn_to_nid in
    page_ext_init").

    When booting a system with "page_owner=on",

    start_kernel
    page_ext_init
    invoke_init_callbacks
    init_section_page_ext
    init_page_owner
    init_early_allocated_pages
    init_zones_in_node
    init_pages_in_zone
    lookup_page_ext
    page_to_nid

    The issue here is that page_to_nid() will not work since some page flags
    have no node information until later in page_alloc_init_late() due to
    DEFERRED_STRUCT_PAGE_INIT. Hence, it could trigger an out-of-bounds
    access with an invalid nid.

    UBSAN: Undefined behaviour in ./include/linux/mm.h:1104:50
    index 7 is out of range for type 'zone [5]'

    Also, kernel will panic since flags were poisoned earlier with,

    CONFIG_DEBUG_VM_PGFLAGS=y
    CONFIG_NODE_NOT_IN_PAGE_FLAGS=n

    start_kernel
    setup_arch
    pagetable_init
    paging_init
    sparse_init
    sparse_init_nid
    memblock_alloc_try_nid_raw

    It did not handle it well in init_pages_in_zone() which ends up calling
    page_to_nid().

    page:ffffea0004200000 is uninitialized and poisoned
    raw: ffffffffffffffff ffffffffffffffff ffffffffffffffff ffffffffffffffff
    raw: ffffffffffffffff ffffffffffffffff ffffffffffffffff ffffffffffffffff
    page dumped because: VM_BUG_ON_PAGE(PagePoisoned(p))
    page_owner info is not active (free page?)
    kernel BUG at include/linux/mm.h:990!
    RIP: 0010:init_page_owner+0x486/0x520

    This means that assumptions behind commit fe53ca54270a ("mm: use
    early_pfn_to_nid in page_ext_init") are incomplete. Therefore, revert
    the commit for now. A proper way to move the page_owner initialization
    to sooner is to hook into memmap initialization.

    Link: http://lkml.kernel.org/r/20190115202812.75820-1-cai@lca.pw
    Signed-off-by: Qian Cai
    Acked-by: Michal Hocko
    Cc: Pasha Tatashin
    Cc: Mel Gorman
    Cc: Yang Shi
    Cc: Joonsoo Kim
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Qian Cai
     

05 Jan, 2019

2 commits

  • We get a warning when building kernel with W=1:

    kernel/fork.c:167:13: warning: no previous prototype for `arch_release_thread_stack' [-Wmissing-prototypes]
    kernel/fork.c:779:13: warning: no previous prototype for `fork_init' [-Wmissing-prototypes]

    Add the missing declarations to a header file to fix this.

    Also, remove arch_release_thread_stack() completely because no arch
    seems to implement it since bb9d81264 (arch: remove tile port).
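
    The fix itself is just a declaration in a shared header (location
    assumed here) so the definition in kernel/fork.c has a previous
    prototype:

    /* include/linux/sched/task.h (assumed) */
    extern void fork_init(void);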

    Link: http://lkml.kernel.org/r/1542170087-23645-1-git-send-email-wang.yi59@zte.com.cn
    Signed-off-by: Yi Wang
    Acked-by: Michal Hocko
    Acked-by: Mike Rapoport
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Yi Wang
     
  • Initcall names should not be changed.

    Link: http://lkml.kernel.org/r/20181124091829.GD10969@avx2
    Signed-off-by: Alexey Dobriyan
    Reviewed-by: Andrew Morton
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Alexey Dobriyan
     

29 Dec, 2018

3 commits

  • Merge misc updates from Andrew Morton:

    - large KASAN update to use arm's "software tag-based mode"

    - a few misc things

    - sh updates

    - ocfs2 updates

    - just about all of MM

    * emailed patches from Andrew Morton: (167 commits)
    kernel/fork.c: mark 'stack_vm_area' with __maybe_unused
    memcg, oom: notify on oom killer invocation from the charge path
    mm, swap: fix swapoff with KSM pages
    include/linux/gfp.h: fix typo
    mm/hmm: fix memremap.h, move dev_page_fault_t callback to hmm
    hugetlbfs: Use i_mmap_rwsem to fix page fault/truncate race
    hugetlbfs: use i_mmap_rwsem for more pmd sharing synchronization
    memory_hotplug: add missing newlines to debugging output
    mm: remove __hugepage_set_anon_rmap()
    include/linux/vmstat.h: remove unused page state adjustment macro
    mm/page_alloc.c: allow error injection
    mm: migrate: drop unused argument of migrate_page_move_mapping()
    blkdev: avoid migration stalls for blkdev pages
    mm: migrate: provide buffer_migrate_page_norefs()
    mm: migrate: move migrate_page_lock_buffers()
    mm: migrate: lock buffers before migrate_page_move_mapping()
    mm: migration: factor out code to compute expected number of page references
    mm, page_alloc: enable pcpu_drain with zone capability
    kmemleak: add config to select auto scan
    mm/page_alloc.c: don't call kasan_free_pages() at deferred mem init
    ...

    Linus Torvalds
     
  • Pull block updates from Jens Axboe:
    "This is the main pull request for block/storage for 4.21.

    Larger than usual, it was a busy round with lots of goodies queued up.
    Most notable is the removal of the old IO stack, which has been a long
    time coming. No new features for a while, everything coming in this
    week has all been fixes for things that were previously merged.

    This contains:

    - Use atomic counters instead of semaphores for mtip32xx (Arnd)

    - Cleanup of the mtip32xx request setup (Christoph)

    - Fix for circular locking dependency in loop (Jan, Tetsuo)

    - bcache (Coly, Guoju, Shenghui)
    * Optimizations for writeback caching
    * Various fixes and improvements

    - nvme (Chaitanya, Christoph, Sagi, Jay, me, Keith)
    * host and target support for NVMe over TCP
    * Error log page support
    * Support for separate read/write/poll queues
    * Much improved polling
    * discard OOM fallback
    * Tracepoint improvements

    - lightnvm (Hans, Hua, Igor, Matias, Javier)
    * Igor added packed metadata to pblk. Now drives without metadata
    per LBA can be used as well.
    * Fix from Geert on uninitialized value on chunk metadata reads.
    * Fixes from Hans and Javier to pblk recovery and write path.
    * Fix from Hua Su to fix a race condition in the pblk recovery
    code.
    * Scan optimization added to pblk recovery from Zhoujie.
    * Small geometry cleanup from me.

    - Conversion of the last few drivers that used the legacy path to
    blk-mq (me)

    - Removal of legacy IO path in SCSI (me, Christoph)

    - Removal of legacy IO stack and schedulers (me)

    - Support for much better polling, now without interrupts at all.
    blk-mq adds support for multiple queue maps, which enables us to
    have a map per type. This in turn enables nvme to have separate
    completion queues for polling, which can then be interrupt-less.
    Also means we're ready for async polled IO, which is hopefully
    coming in the next release.

    - Killing of (now) unused block exports (Christoph)

    - Unification of the blk-rq-qos and blk-wbt wait handling (Josef)

    - Support for zoned testing with null_blk (Masato)

    - sx8 conversion to per-host tag sets (Christoph)

    - IO priority improvements (Damien)

    - mq-deadline zoned fix (Damien)

    - Ref count blkcg series (Dennis)

    - Lots of blk-mq improvements and speedups (me)

    - sbitmap scalability improvements (me)

    - Make core inflight IO accounting per-cpu (Mikulas)

    - Export timeout setting in sysfs (Weiping)

    - Cleanup the direct issue path (Jianchao)

    - Export blk-wbt internals in block debugfs for easier debugging
    (Ming)

    - Lots of other fixes and improvements"

    * tag 'for-4.21/block-20181221' of git://git.kernel.dk/linux-block: (364 commits)
    kyber: use sbitmap add_wait_queue/list_del wait helpers
    sbitmap: add helpers for add/del wait queue handling
    block: save irq state in blkg_lookup_create()
    dm: don't reuse bio for flushes
    nvme-pci: trace SQ status on completions
    nvme-rdma: implement polling queue map
    nvme-fabrics: allow user to pass in nr_poll_queues
    nvme-fabrics: allow nvmf_connect_io_queue to poll
    nvme-core: optionally poll sync commands
    block: make request_to_qc_t public
    nvme-tcp: fix spelling mistake "attepmpt" -> "attempt"
    nvme-tcp: fix endianess annotations
    nvmet-tcp: fix endianess annotations
    nvme-pci: refactor nvme_poll_irqdisable to make sparse happy
    nvme-pci: only set nr_maps to 2 if poll queues are supported
    nvmet: use a macro for default error location
    nvmet: fix comparison of a u16 with -1
    blk-mq: enable IO poll if .nr_queues of type poll > 0
    blk-mq: change blk_mq_queue_busy() to blk_mq_queue_inflight()
    blk-mq: skip zero-queue maps in blk_mq_map_swqueue
    ...

    Linus Torvalds
     
  • The current value of the early boot static pool size, 1024, is not big
    enough for systems with a large number of CPUs when timer and/or
    workqueue objects are selected. As a result, systems with 60+ CPUs and
    both timer and workqueue objects enabled could trigger "ODEBUG: Out of
    memory. ODEBUG disabled".

    Some debug objects are allocated during early boot. Enabling options
    like timer or workqueue objects may increase the size required
    significantly with a large number of CPUs. For example,

    CONFIG_DEBUG_OBJECTS_TIMERS:
    No. CPUs x 2 (worker pool) objects:
    start_kernel
    workqueue_init_early
    init_worker_pool
    init_timer_key
    debug_object_init

    plus No. CPUs objects (CONFIG_HIGH_RES_TIMERS):
    sched_init
    hrtick_rq_init
    hrtimer_init

    CONFIG_DEBUG_OBJECTS_WORK:
    No. CPUs objects:
    vmalloc_init
    __init_work

    plus No. CPUs x 6 (workqueue) objects:
    workqueue_init_early
    alloc_workqueue
    __alloc_workqueue_key
    alloc_and_link_pwqs
    init_pwq

    Also, plus No. CPUs objects:
    perf_event_init
    __init_srcu_struct
    init_srcu_struct_fields
    init_srcu_struct_nodes
    __init_work

    However, none of the things are actually used or required before
    debug_objects_mem_init() is invoked, so just move the call right before
    vmalloc_init().

    According to tglx, "the reason why the call is at this place in
    start_kernel() is historical. It's because back in the days when
    debugobjects were added the memory allocator was enabled way later than
    today."
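
    Sketch of the new placement (simplified from mm_init() in
    init/main.c):

    static void __init mm_init(void)
    {
            /* ... */
            mem_init();
            kmem_cache_init();
            /* ... */
            debug_objects_mem_init();   /* needs slab; nothing uses it earlier */
            vmalloc_init();
            /* ... */
    }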

    Link: http://lkml.kernel.org/r/20181126102407.1836-1-cai@gmx.us
    Signed-off-by: Qian Cai
    Suggested-by: Thomas Gleixner
    Cc: Waiman Long
    Cc: Yang Shi
    Cc: Arnd Bergmann
    Cc: Catalin Marinas
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Qian Cai
     

27 Dec, 2018

1 commit

  • Pull EFI updates from Ingo Molnar:
    "The main changes in this cycle were:

    - Allocate the E820 buffer before doing the
    GetMemoryMap/ExitBootServices dance so we don't run out of space

    - Clear EFI boot services mappings when freeing the memory

    - Harden efivars against callers that invoke it on non-EFI boots

    - Reduce the number of memblock reservations resulting from extensive
    use of the new efi_mem_reserve_persistent() API

    - Other assorted fixes and cleanups"

    * 'efi-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
    x86/efi: Don't unmap EFI boot services code/data regions for EFI_OLD_MEMMAP and EFI_MIXED_MODE
    efi: Reduce the amount of memblock reservations for persistent allocations
    efi: Permit multiple entries in persistent memreserve data structure
    efi/libstub: Disable some warnings for x86{,_64}
    x86/efi: Move efi_<reserve/free>_boot_services() to arch/x86
    x86/efi: Unmap EFI boot services code/data regions from efi_pgd
    x86/mm/pageattr: Introduce helper function to unmap EFI boot services
    efi/fdt: Simplify the get_fdt() flow
    efi/fdt: Indentation fix
    firmware/efi: Add NULL pointer checks in efivars API functions

    Linus Torvalds
     

30 Nov, 2018

1 commit

  • efi_<reserve/free>_boot_services() are x86 specific quirks and as such
    should be in asm/efi.h, so move them from linux/efi.h. Also, call
    efi_free_boot_services() from __efi_enter_virtual_mode() as it is an x86
    specific call and ideally shouldn't be part of init/main.c.

    Signed-off-by: Sai Praneeth Prakhya
    Signed-off-by: Ard Biesheuvel
    Acked-by: Thomas Gleixner
    Cc: Andy Lutomirski
    Cc: Arend van Spriel
    Cc: Bhupesh Sharma
    Cc: Borislav Petkov
    Cc: Dave Hansen
    Cc: Eric Snowberg
    Cc: Hans de Goede
    Cc: Joe Perches
    Cc: Jon Hunter
    Cc: Julien Thierry
    Cc: Linus Torvalds
    Cc: Marc Zyngier
    Cc: Matt Fleming
    Cc: Nathan Chancellor
    Cc: Peter Zijlstra
    Cc: Sedat Dilek
    Cc: YiFei Zhu
    Cc: linux-efi@vger.kernel.org
    Link: http://lkml.kernel.org/r/20181129171230.18699-7-ard.biesheuvel@linaro.org
    Signed-off-by: Ingo Molnar

    Sai Praneeth Prakhya
     

08 Nov, 2018

1 commit

  • This removes a bunch of core and elevator related code. On the core
    front, we remove anything related to queue running, draining,
    initialization, plugging, and congestions. We also kill anything
    related to request allocation, merging, retrieval, and completion.

    Remove any checking for single queue IO schedulers, as they no
    longer exist. This means we can also delete a bunch of code related
    to request issue, adding, completion, etc - and all the SQ related
    ops and helpers.

    Also kill the load_default_modules(), as all that did was provide
    for a way to load the default single queue elevator.

    Tested-by: Ming Lei
    Reviewed-by: Omar Sandoval
    Signed-off-by: Jens Axboe

    Jens Axboe
     

31 Oct, 2018

4 commits

  • When memblock allocation APIs are called with align = 0, the alignment
    is implicitly set to SMP_CACHE_BYTES.

    Implicit alignment is done deep in the memblock allocator and it can
    come as a surprise. Not that such an alignment would be wrong even
    when used incorrectly but it is better to be explicit for the sake of
    clarity and the principle of least surprise.

    Replace all such uses of memblock APIs with the 'align' parameter
    explicitly set to SMP_CACHE_BYTES and stop implicit alignment assignment
    in the memblock internal allocation functions.

    For the case when memblock APIs are used via helper functions, e.g. like
    iommu_arena_new_node() in Alpha, the helper functions were detected with
    Coccinelle's help and then manually examined and updated where
    appropriate.

    The direct memblock APIs users were updated using the semantic patch below:

    @@
    expression size, min_addr, max_addr, nid;
    @@
    (
    |
    - memblock_alloc_try_nid_raw(size, 0, min_addr, max_addr, nid)
    + memblock_alloc_try_nid_raw(size, SMP_CACHE_BYTES, min_addr, max_addr,
    nid)
    |
    - memblock_alloc_try_nid_nopanic(size, 0, min_addr, max_addr, nid)
    + memblock_alloc_try_nid_nopanic(size, SMP_CACHE_BYTES, min_addr, max_addr,
    nid)
    |
    - memblock_alloc_try_nid(size, 0, min_addr, max_addr, nid)
    + memblock_alloc_try_nid(size, SMP_CACHE_BYTES, min_addr, max_addr, nid)
    |
    - memblock_alloc(size, 0)
    + memblock_alloc(size, SMP_CACHE_BYTES)
    |
    - memblock_alloc_raw(size, 0)
    + memblock_alloc_raw(size, SMP_CACHE_BYTES)
    |
    - memblock_alloc_from(size, 0, min_addr)
    + memblock_alloc_from(size, SMP_CACHE_BYTES, min_addr)
    |
    - memblock_alloc_nopanic(size, 0)
    + memblock_alloc_nopanic(size, SMP_CACHE_BYTES)
    |
    - memblock_alloc_low(size, 0)
    + memblock_alloc_low(size, SMP_CACHE_BYTES)
    |
    - memblock_alloc_low_nopanic(size, 0)
    + memblock_alloc_low_nopanic(size, SMP_CACHE_BYTES)
    |
    - memblock_alloc_from_nopanic(size, 0, min_addr)
    + memblock_alloc_from_nopanic(size, SMP_CACHE_BYTES, min_addr)
    |
    - memblock_alloc_node(size, 0, nid)
    + memblock_alloc_node(size, SMP_CACHE_BYTES, nid)
    )

    [mhocko@suse.com: changelog update]
    [akpm@linux-foundation.org: coding-style fixes]
    [rppt@linux.ibm.com: fix missed uses of implicit alignment]
    Link: http://lkml.kernel.org/r/20181016133656.GA10925@rapoport-lnx
    Link: http://lkml.kernel.org/r/1538687224-17535-1-git-send-email-rppt@linux.vnet.ibm.com
    Signed-off-by: Mike Rapoport
    Suggested-by: Michal Hocko
    Acked-by: Paul Burton [MIPS]
    Acked-by: Michael Ellerman [powerpc]
    Acked-by: Michal Hocko
    Cc: Catalin Marinas
    Cc: Chris Zankel
    Cc: Geert Uytterhoeven
    Cc: Guan Xuetao
    Cc: Ingo Molnar
    Cc: Matt Turner
    Cc: Michal Simek
    Cc: Richard Weinberger
    Cc: Russell King
    Cc: Thomas Gleixner
    Cc: Tony Luck
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Mike Rapoport
     
  • Move remaining definitions and declarations from include/linux/bootmem.h
    into include/linux/memblock.h and remove the redundant header.

    The includes were replaced with the semantic patch below and then
    semi-automated removal of duplicated '#include <linux/memblock.h>'.

    @@
    @@
    - #include <linux/bootmem.h>
    + #include <linux/memblock.h>

    [sfr@canb.auug.org.au: dma-direct: fix up for the removal of linux/bootmem.h]
    Link: http://lkml.kernel.org/r/20181002185342.133d1680@canb.auug.org.au
    [sfr@canb.auug.org.au: powerpc: fix up for removal of linux/bootmem.h]
    Link: http://lkml.kernel.org/r/20181005161406.73ef8727@canb.auug.org.au
    [sfr@canb.auug.org.au: x86/kaslr, ACPI/NUMA: fix for linux/bootmem.h removal]
    Link: http://lkml.kernel.org/r/20181008190341.5e396491@canb.auug.org.au
    Link: http://lkml.kernel.org/r/1536927045-23536-30-git-send-email-rppt@linux.vnet.ibm.com
    Signed-off-by: Mike Rapoport
    Signed-off-by: Stephen Rothwell
    Acked-by: Michal Hocko
    Cc: Catalin Marinas
    Cc: Chris Zankel
    Cc: "David S. Miller"
    Cc: Geert Uytterhoeven
    Cc: Greentime Hu
    Cc: Greg Kroah-Hartman
    Cc: Guan Xuetao
    Cc: Ingo Molnar
    Cc: "James E.J. Bottomley"
    Cc: Jonas Bonn
    Cc: Jonathan Corbet
    Cc: Ley Foon Tan
    Cc: Mark Salter
    Cc: Martin Schwidefsky
    Cc: Matt Turner
    Cc: Michael Ellerman
    Cc: Michal Simek
    Cc: Palmer Dabbelt
    Cc: Paul Burton
    Cc: Richard Kuo
    Cc: Richard Weinberger
    Cc: Rich Felker
    Cc: Russell King
    Cc: Serge Semin
    Cc: Thomas Gleixner
    Cc: Tony Luck
    Cc: Vineet Gupta
    Cc: Yoshinori Sato
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Mike Rapoport
     
  • The alloc_bootmem(size) is a shortcut for allocation of SMP_CACHE_BYTES
    aligned memory. When the align parameter of memblock_alloc() is 0, the
    alignment is implicitly set to SMP_CACHE_BYTES and thus alloc_bootmem(size)
    and memblock_alloc(size, 0) are equivalent.

    The conversion is done using the following semantic patch:

    @@
    expression size;
    @@
    - alloc_bootmem(size)
    + memblock_alloc(size, 0)

    Link: http://lkml.kernel.org/r/1536927045-23536-22-git-send-email-rppt@linux.vnet.ibm.com
    Signed-off-by: Mike Rapoport
    Cc: Catalin Marinas
    Cc: Chris Zankel
    Cc: "David S. Miller"
    Cc: Geert Uytterhoeven
    Cc: Greentime Hu
    Cc: Greg Kroah-Hartman
    Cc: Guan Xuetao
    Cc: Ingo Molnar
    Cc: "James E.J. Bottomley"
    Cc: Jonas Bonn
    Cc: Jonathan Corbet
    Cc: Ley Foon Tan
    Cc: Mark Salter
    Cc: Martin Schwidefsky
    Cc: Matt Turner
    Cc: Michael Ellerman
    Cc: Michal Hocko
    Cc: Michal Simek
    Cc: Palmer Dabbelt
    Cc: Paul Burton
    Cc: Richard Kuo
    Cc: Richard Weinberger
    Cc: Rich Felker
    Cc: Russell King
    Cc: Serge Semin
    Cc: Thomas Gleixner
    Cc: Tony Luck
    Cc: Vineet Gupta
    Cc: Yoshinori Sato
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Mike Rapoport
     
  • The conversion is done using

    sed -i 's@memblock_virt_alloc@memblock_alloc@g' \
    $(git grep -l memblock_virt_alloc)

    Link: http://lkml.kernel.org/r/1536927045-23536-8-git-send-email-rppt@linux.vnet.ibm.com
    Signed-off-by: Mike Rapoport
    Cc: Catalin Marinas
    Cc: Chris Zankel
    Cc: "David S. Miller"
    Cc: Geert Uytterhoeven
    Cc: Greentime Hu
    Cc: Greg Kroah-Hartman
    Cc: Guan Xuetao
    Cc: Ingo Molnar
    Cc: "James E.J. Bottomley"
    Cc: Jonas Bonn
    Cc: Jonathan Corbet
    Cc: Ley Foon Tan
    Cc: Mark Salter
    Cc: Martin Schwidefsky
    Cc: Matt Turner
    Cc: Michael Ellerman
    Cc: Michal Hocko
    Cc: Michal Simek
    Cc: Palmer Dabbelt
    Cc: Paul Burton
    Cc: Richard Kuo
    Cc: Richard Weinberger
    Cc: Rich Felker
    Cc: Russell King
    Cc: Serge Semin
    Cc: Thomas Gleixner
    Cc: Tony Luck
    Cc: Vineet Gupta
    Cc: Yoshinori Sato
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Mike Rapoport
     

23 Oct, 2018

1 commit

  • Pull locking and misc x86 updates from Ingo Molnar:
    "Lots of changes in this cycle - in part because locking/core attracted
    a number of related x86 low level work which was easier to handle in a
    single tree:

    - Linux Kernel Memory Consistency Model updates (Alan Stern, Paul E.
    McKenney, Andrea Parri)

    - lockdep scalability improvements and micro-optimizations (Waiman
    Long)

    - rwsem improvements (Waiman Long)

    - spinlock micro-optimization (Matthew Wilcox)

    - qspinlocks: Provide a liveness guarantee (more fairness) on x86.
    (Peter Zijlstra)

    - Add support for relative references in jump tables on arm64, x86
    and s390 to optimize jump labels (Ard Biesheuvel, Heiko Carstens)

    - Be a lot less permissive on weird (kernel address) uaccess faults
    on x86: BUG() when uaccess helpers fault on kernel addresses (Jann
    Horn)

    - macrofy x86 asm statements to un-confuse the GCC inliner. (Nadav
    Amit)

    - ... and a handful of other smaller changes as well"

    * 'locking-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (57 commits)
    locking/lockdep: Make global debug_locks* variables read-mostly
    locking/lockdep: Fix debug_locks off performance problem
    locking/pvqspinlock: Extend node size when pvqspinlock is configured
    locking/qspinlock_stat: Count instances of nested lock slowpaths
    locking/qspinlock, x86: Provide liveness guarantee
    x86/asm: 'Simplify' GEN_*_RMWcc() macros
    locking/qspinlock: Rework some comments
    locking/qspinlock: Re-order code
    locking/lockdep: Remove duplicated 'lock_class_ops' percpu array
    x86/defconfig: Enable CONFIG_USB_XHCI_HCD=y
    futex: Replace spin_is_locked() with lockdep
    locking/lockdep: Make class->ops a percpu counter and move it under CONFIG_DEBUG_LOCKDEP=y
    x86/jump-labels: Macrofy inline assembly code to work around GCC inlining bugs
    x86/cpufeature: Macrofy inline assembly code to work around GCC inlining bugs
    x86/extable: Macrofy inline assembly code to work around GCC inlining bugs
    x86/paravirt: Work around GCC inlining bugs when compiling paravirt ops
    x86/bug: Macrofy the BUG table section handling, to work around GCC inlining bugs
    x86/alternatives: Macrofy lock prefixes to work around GCC inlining bugs
    x86/refcount: Work around GCC inlining bug
    x86/objtool: Use asm macros to work around GCC inlining bugs
    ...

    Linus Torvalds
     

09 Oct, 2018

1 commit

  • With CONFIG_VMAP_STACK=y the kernel stack of all tasks should be
    allocated in the vmalloc space. The initial stack used for all
    the early init code is in the init_thread_union. To be able to
    switch from this early stack to a properly allocated stack
    from vmalloc the architecture needs a switch-over point.

    Introduce the arch_call_rest_init() function with a weak definition
    in init/main.c with the only purpose to call rest_init() from the
    end of start_kernel(). The architecture override can then do the
    necessary magic to switch to the new vmalloc'ed stack.
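
    The weak default does nothing beyond calling rest_init():

    void __init __weak arch_call_rest_init(void)
    {
            rest_init();
    }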

    Signed-off-by: Martin Schwidefsky

    Martin Schwidefsky
     

27 Sep, 2018

1 commit

  • Jump table entries are mostly read-only, with the exception of the
    init and module loader code that defuses entries that point into init
    code when the code being referred to is freed.

    For robustness, it would be better to move these entries into the
    ro_after_init section, but clearing the 'code' member of each jump
    table entry referring to init code at module load time races with the
    module_enable_ro() call that remaps the ro_after_init section read
    only, so we'd like to do it earlier.

    So given that whether such an entry refers to init code can be decided
    much earlier, we can pull this check forward. Since we may still need
    the code entry at this point, let's switch to setting a low bit in the
    'key' member just like we do to annotate the default state of a jump
    table entry.
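
    Sketch of the accessors this introduces (bit 0 of 'key' already encodes
    the branch default, leaving bit 1 free to mark init entries):

    static inline bool jump_entry_is_init(const struct jump_entry *entry)
    {
            return (unsigned long)entry->key & 2UL;
    }

    static inline void jump_entry_set_init(struct jump_entry *entry)
    {
            entry->key |= 2;
    }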

    Signed-off-by: Ard Biesheuvel
    Signed-off-by: Thomas Gleixner
    Reviewed-by: Kees Cook
    Acked-by: Peter Zijlstra (Intel)
    Cc: linux-arm-kernel@lists.infradead.org
    Cc: linux-s390@vger.kernel.org
    Cc: Arnd Bergmann
    Cc: Heiko Carstens
    Cc: Will Deacon
    Cc: Catalin Marinas
    Cc: Steven Rostedt
    Cc: Martin Schwidefsky
    Cc: Jessica Yu
    Link: https://lkml.kernel.org/r/20180919065144.25010-8-ard.biesheuvel@linaro.org

    Ard Biesheuvel
     

23 Aug, 2018

2 commits

  • Add a log message to `run_init_process()`.

    This log message serves two purposes.

    1. If the init process is not specified on the Linux kernel command
    line, the user sees what file was chosen.

    2. The time stamp shows exactly when the Linux kernel handed over
    control to the init process.
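
    The change itself amounts to one pr_info() in run_init_process(),
    roughly:

    static int run_init_process(const char *init_filename)
    {
            argv_init[0] = init_filename;
            pr_info("Run %s as init process\n", init_filename);
            return do_execve(getname_kernel(init_filename),
                             (const char __user *const __user *)argv_init,
                             (const char __user *const __user *)envp_init);
    }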

    Link: http://lkml.kernel.org/r/b1fc97fa-4aa9-1904-ddb5-859e78995c41@molgen.mpg.de
    Signed-off-by: Paul Menzel
    Reviewed-by: Andrew Morton
    Cc: Ingo Molnar
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Paul Menzel
     
  • Allow the initcall tables to be emitted using relative references that
    are only half the size on 64-bit architectures and don't require fixups
    at runtime on relocatable kernels.
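
    With CONFIG_HAVE_ARCH_PREL32_RELOCATIONS, each table entry becomes a
    32-bit offset that is turned back into a function pointer at call time
    (cf. include/linux/init.h):

    #ifdef CONFIG_HAVE_ARCH_PREL32_RELOCATIONS
    typedef int initcall_entry_t;

    static inline initcall_t initcall_from_entry(initcall_entry_t *entry)
    {
            return offset_to_ptr(entry);
    }
    #else
    typedef initcall_t initcall_entry_t;

    static inline initcall_t initcall_from_entry(initcall_entry_t *entry)
    {
            return *entry;
    }
    #endif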

    Link: http://lkml.kernel.org/r/20180704083651.24360-5-ard.biesheuvel@linaro.org
    Acked-by: James Morris
    Acked-by: Sergey Senozhatsky
    Acked-by: Petr Mladek
    Acked-by: Michael Ellerman
    Acked-by: Ingo Molnar
    Signed-off-by: Ard Biesheuvel
    Cc: Arnd Bergmann
    Cc: Benjamin Herrenschmidt
    Cc: Bjorn Helgaas
    Cc: Catalin Marinas
    Cc: James Morris
    Cc: Jessica Yu
    Cc: Josh Poimboeuf
    Cc: Kees Cook
    Cc: Nicolas Pitre
    Cc: Paul Mackerras
    Cc: Russell King
    Cc: "Serge E. Hallyn"
    Cc: Steven Rostedt
    Cc: Thomas Garnier
    Cc: Thomas Gleixner
    Cc: Will Deacon
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Ard Biesheuvel