17 May, 2019

1 commit


15 May, 2019

1 commit

  • No need to handle the freeing disable in arch code when we already have a
    core hook (and a different name for the option) for it.

    Link: http://lkml.kernel.org/r/20190213174621.29297-7-hch@lst.de
    Signed-off-by: Christoph Hellwig
    Acked-by: Catalin Marinas [arm64]
    Acked-by: Mike Rapoport
    Cc: Geert Uytterhoeven [m68k]
    Cc: Steven Price
    Cc: Alexander Viro
    Cc: Guan Xuetao
    Cc: Russell King
    Cc: Will Deacon
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Christoph Hellwig
     

07 May, 2019

2 commits

  • Pull x86 mm updates from Ingo Molnar:
    "The changes in here are:

    - text_poke() fixes and an extensive set of executability lockdowns,
    to (hopefully) eliminate the last residual circumstances under
    which we are using W|X mappings even temporarily on x86 kernels.
    This required a broad range of surgery in text patching facilities,
    module loading, trampoline handling and other bits.

    - tweak page fault messages to be more informative and more
    structured.

    - remove DISCONTIGMEM support on x86-32 and make SPARSEMEM the
    default.

    - reduce KASLR granularity on 5-level paging kernels from 512 GB to
    1 GB.

    - misc other changes and updates"

    * 'x86-mm-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (36 commits)
    x86/mm: Initialize PGD cache during mm initialization
    x86/alternatives: Add comment about module removal races
    x86/kprobes: Use vmalloc special flag
    x86/ftrace: Use vmalloc special flag
    bpf: Use vmalloc special flag
    modules: Use vmalloc special flag
    mm/vmalloc: Add flag for freeing of special permissions
    mm/hibernation: Make hibernation handle unmapped pages
    x86/mm/cpa: Add set_direct_map_*() functions
    x86/alternatives: Remove the return value of text_poke_*()
    x86/jump-label: Remove support for custom text poker
    x86/modules: Avoid breaking W^X while loading modules
    x86/kprobes: Set instruction page as executable
    x86/ftrace: Set trampoline pages as executable
    x86/kgdb: Avoid redundant comparison of patched code
    x86/alternatives: Use temporary mm for text poking
    x86/alternatives: Initialize temporary mm for patching
    fork: Provide a function for copying init_mm
    uprobes: Initialize uprobes earlier
    x86/mm: Save debug registers when loading a temporary mm
    ...

    Linus Torvalds
     
  • Pull locking updates from Ingo Molnar:
    "Here are the locking changes in this cycle:

    - rwsem unification and simpler micro-optimizations to prepare for
    more intrusive (and more lucrative) scalability improvements in
    v5.3 (Waiman Long)

    - Lockdep irq state tracking flag usage cleanups (Frederic
    Weisbecker)

    - static key improvements (Jakub Kicinski, Peter Zijlstra)

    - misc updates, cleanups and smaller fixes"

    * 'locking-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (26 commits)
    locking/lockdep: Remove unnecessary unlikely()
    locking/static_key: Don't take sleeping locks in __static_key_slow_dec_deferred()
    locking/static_key: Factor out the fast path of static_key_slow_dec()
    locking/static_key: Add support for deferred static branches
    locking/lockdep: Test all incompatible scenarios at once in check_irq_usage()
    locking/lockdep: Avoid bogus Clang warning
    locking/lockdep: Generate LOCKF_ bit composites
    locking/lockdep: Use expanded masks on find_usage_*() functions
    locking/lockdep: Map remaining magic numbers to lock usage mask names
    locking/lockdep: Move valid_state() inside CONFIG_TRACE_IRQFLAGS && CONFIG_PROVE_LOCKING
    locking/rwsem: Prevent unneeded warning during locking selftest
    locking/rwsem: Optimize rwsem structure for uncontended lock acquisition
    locking/rwsem: Enable lock event counting
    locking/lock_events: Don't show pvqspinlock events on bare metal
    locking/lock_events: Make lock_events available for all archs & other locks
    locking/qspinlock_stat: Introduce generic lockevent_*() counting APIs
    locking/rwsem: Enhance DEBUG_RWSEMS_WARN_ON() macro
    locking/rwsem: Add debug check for __down_read*()
    locking/rwsem: Micro-optimize rwsem_try_read_lock_unqueued()
    locking/rwsem: Move rwsem internal function declarations to rwsem-xadd.h
    ...

    Linus Torvalds
     

30 Apr, 2019

2 commits

  • Add two new functions set_direct_map_default_noflush() and
    set_direct_map_invalid_noflush() for setting the direct map alias for the
    page to its default valid permissions and to an invalid state that cannot
    be cached in a TLB, respectively. These functions do not flush the TLB.

    Note, __kernel_map_pages() does something similar but flushes the TLB and
    doesn't reset the permission bits to default on all architectures.

    Also add an ARCH config ARCH_HAS_SET_DIRECT_MAP for specifying whether
    these have an actual implementation or a default empty one.

    Signed-off-by: Rick Edgecombe
    Signed-off-by: Peter Zijlstra (Intel)
    Cc: Andy Lutomirski
    Cc: Borislav Petkov
    Cc: Dave Hansen
    Cc: H. Peter Anvin
    Cc: Linus Torvalds
    Cc: Nadav Amit
    Cc: Rik van Riel
    Cc: Thomas Gleixner
    Link: https://lkml.kernel.org/r/20190426001143.4983-15-namit@vmware.com
    Signed-off-by: Ingo Molnar

    Rick Edgecombe
     
  • As Stepan Golosunov points out, there is a small mistake in the
    get_timespec64() function in the kernel. It was originally added under the
    assumption that CONFIG_64BIT_TIME would get enabled on all 32-bit and
    64-bit architectures, but when the conversion was done, it was only turned
    on for 32-bit ones.

    The effect is that the get_timespec64() function never clears the upper
    half of the tv_nsec field for 32-bit tasks in compat mode. Clearing this is
    required for POSIX compliant behavior of functions that pass a 'timespec'
    structure with a 64-bit tv_sec and a 32-bit tv_nsec, plus uninitialized
    padding.

    The easiest fix for linux-5.1 is to just make the Kconfig symbol
    unconditional, as it was originally intended. As a follow-up, the #ifdef
    CONFIG_64BIT_TIME can be removed completely.

    Note: for native 32-bit mode, no change is needed, this works as
    designed and user space should never need to clear the upper 32
    bits of the tv_nsec field, in or out of the kernel.
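The masking described above can be sketched in user-space C. This is a minimal illustration, not the kernel's code: `struct toy_timespec64`, `toy_get_timespec64()` and the `compat_task` flag are hypothetical names introduced here, and the struct is passed by value instead of being copied from user memory.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the 64-bit time layout: both fields are 64 bits wide. */
struct toy_timespec64 {
    int64_t tv_sec;
    int64_t tv_nsec;
};

/*
 * Take a timespec supplied by the caller. For a 32-bit (compat) caller only
 * the low 32 bits of tv_nsec carry data; the upper half is padding that may
 * hold garbage, so it is cleared before the value is used.
 */
static struct toy_timespec64 toy_get_timespec64(struct toy_timespec64 user,
                                                bool compat_task)
{
    struct toy_timespec64 ts = user;

    if (compat_task)
        ts.tv_nsec = (int64_t)((uint64_t)ts.tv_nsec & 0xFFFFFFFFu);
    return ts;
}
```

With the flag set, a tv_nsec whose upper half contains stale bits comes back with only its low 32 bits; with the flag clear, the value passes through unchanged.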

    Fixes: 00bf25d693e7 ("y2038: use time32 syscall names on 32-bit")
    Signed-off-by: Arnd Bergmann
    Signed-off-by: Thomas Gleixner
    Cc: Joseph Myers
    Cc: libc-alpha@sourceware.org
    Cc: linux-api@vger.kernel.org
    Cc: Deepa Dinamani
    Cc: Lukasz Majewski
    Cc: Stepan Golosunov
    Link: https://lore.kernel.org/lkml/20190422090710.bmxdhhankurhafxq@sghpc.golosunov.pp.ru/
    Link: https://lkml.kernel.org/r/20190429131951.471701-1-arnd@arndb.de

    Arnd Bergmann
     

10 Apr, 2019

2 commits

  • Add lock event counting calls so that we can track the number of lock
    events happening in the rwsem code.

    With CONFIG_LOCK_EVENT_COUNTS on and booting a 4-socket 112-thread x86-64
    system, the rwsem counts after system bootup were as follows:

    rwsem_opt_fail=261
    rwsem_opt_wlock=50636
    rwsem_rlock=445
    rwsem_rlock_fail=0
    rwsem_rlock_fast=22
    rwsem_rtrylock=810144
    rwsem_sleep_reader=441
    rwsem_sleep_writer=310
    rwsem_wake_reader=355
    rwsem_wake_writer=2335
    rwsem_wlock=261
    rwsem_wlock_fail=0
    rwsem_wtrylock=20583

    It can be seen that most of the lock acquisitions in the slowpath were
    write-locks in the optimistic spinning code path with no sleeping at
    all. For this system, over 97% of the locks are acquired via optimistic
    spinning. It illustrates the importance of optimistic spinning in
    improving the performance of rwsem.
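The "over 97%" figure can be approximated from the counters listed above. The sketch below assumes that the slowpath acquisitions are the sum of rwsem_opt_wlock, rwsem_rlock and rwsem_wlock; that grouping is an interpretation of the counter names, not something the commit message spells out, and `rwsem_opt_spin_share()` is a hypothetical helper.

```c
#include <assert.h>

/*
 * Share of slowpath lock acquisitions that were made by optimistic spinning.
 * Assumption: opt_wlock, rlock and wlock together cover all slowpath
 * acquisitions (an interpretation of the counter names).
 */
static double rwsem_opt_spin_share(long opt_wlock, long rlock, long wlock)
{
    long total = opt_wlock + rlock + wlock;

    assert(total > 0);
    return (double)opt_wlock / (double)total;
}
```

Calling `rwsem_opt_spin_share(50636, 445, 261)` with the boot-time values gives a share between 0.98 and 0.99 under this assumption, which is consistent with the statement in the message.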

    Signed-off-by: Waiman Long
    Acked-by: Peter Zijlstra
    Acked-by: Davidlohr Bueso
    Cc: Andrew Morton
    Cc: Arnd Bergmann
    Cc: Borislav Petkov
    Cc: Davidlohr Bueso
    Cc: Linus Torvalds
    Cc: Paul E. McKenney
    Cc: Peter Zijlstra
    Cc: Thomas Gleixner
    Cc: Tim Chen
    Cc: Will Deacon
    Link: http://lkml.kernel.org/r/20190404174320.22416-11-longman@redhat.com
    Signed-off-by: Ingo Molnar

    Waiman Long
     
  • The QUEUED_LOCK_STAT option to report queued spinlocks event counts
    was previously allowed only on x86 architecture. To make the locking
    event counting code more useful, it is now renamed to a more generic
    LOCK_EVENT_COUNTS config option. This new option will be available to
    all the architectures that use qspinlock at the moment.

    Other locking code can now start to use the generic locking event
    counting code by including lock_events.h and putting the new locking event
    names into the lock_events_list.h header file.

    My experience with lock event counting is that it gives valuable insight
    on how the locking code works and what can be done to make it better. I
    would like to extend this benefit to other locking code like mutex and
    rwsem in the near future.

    The PV qspinlock specific code will stay in qspinlock_stat.h. The
    locking event counters will now reside in the <debugfs>/lock_event_counts
    directory.

    Signed-off-by: Waiman Long
    Acked-by: Peter Zijlstra
    Acked-by: Davidlohr Bueso
    Cc: Andrew Morton
    Cc: Arnd Bergmann
    Cc: Borislav Petkov
    Cc: Davidlohr Bueso
    Cc: Linus Torvalds
    Cc: Paul E. McKenney
    Cc: Peter Zijlstra
    Cc: Thomas Gleixner
    Cc: Tim Chen
    Cc: Will Deacon
    Link: http://lkml.kernel.org/r/20190404174320.22416-9-longman@redhat.com
    Signed-off-by: Ingo Molnar

    Waiman Long
     

03 Apr, 2019

3 commits

  • Add the Kconfig option HAVE_MMU_GATHER_NO_GATHER to the generic
    mmu_gather code. If the option is set the mmu_gather will not
    track individual pages for delayed page free anymore. A platform
    that enables the option needs to provide its own implementation
    of the __tlb_remove_page_size() function to free pages.

    No change in behavior intended.

    Signed-off-by: Martin Schwidefsky
    Signed-off-by: Peter Zijlstra (Intel)
    Acked-by: Will Deacon
    Cc: Andrew Morton
    Cc: Andy Lutomirski
    Cc: Borislav Petkov
    Cc: Dave Hansen
    Cc: H. Peter Anvin
    Cc: Linus Torvalds
    Cc: Peter Zijlstra
    Cc: Rik van Riel
    Cc: Thomas Gleixner
    Cc: aneesh.kumar@linux.vnet.ibm.com
    Cc: heiko.carstens@de.ibm.com
    Cc: linux@armlinux.org.uk
    Cc: npiggin@gmail.com
    Link: http://lkml.kernel.org/r/20180918125151.31744-2-schwidefsky@de.ibm.com
    Signed-off-by: Ingo Molnar

    Martin Schwidefsky
     
  • Make issuing a TLB invalidate for page-table pages the normal case.

    The reason is twofold:

    - too many invalidates is safer than too few,
    - most architectures use the linux page-tables natively
    and would thus require this.

    Make it an opt-out, instead of an opt-in.

    No change in behavior intended.

    Signed-off-by: Peter Zijlstra (Intel)
    Acked-by: Will Deacon
    Cc: Andrew Morton
    Cc: Andy Lutomirski
    Cc: Borislav Petkov
    Cc: Dave Hansen
    Cc: H. Peter Anvin
    Cc: Linus Torvalds
    Cc: Peter Zijlstra
    Cc: Rik van Riel
    Cc: Thomas Gleixner
    Signed-off-by: Ingo Molnar

    Peter Zijlstra
     
  • Move the mmu_gather::page_size things into the generic code instead of
    PowerPC specific bits.

    No change in behavior intended.

    Signed-off-by: Peter Zijlstra (Intel)
    Acked-by: Will Deacon
    Cc: Andrew Morton
    Cc: Andy Lutomirski
    Cc: Aneesh Kumar K.V
    Cc: Borislav Petkov
    Cc: Dave Hansen
    Cc: H. Peter Anvin
    Cc: Linus Torvalds
    Cc: Nick Piggin
    Cc: Peter Zijlstra
    Cc: Rik van Riel
    Cc: Thomas Gleixner
    Signed-off-by: Ingo Molnar

    Peter Zijlstra
     

07 Mar, 2019

1 commit

  • Pull char/misc driver updates from Greg KH:
    "Here is the big char/misc driver patch pull request for 5.1-rc1.

    The largest thing by far is the new habanalabs driver for their AI
    accelerator chip. For now it is in the drivers/misc directory but will
    probably move to a new directory soon along with other drivers of this
    type.

    Other than that, just the usual set of individual driver updates and
    fixes. There's an "odd" merge in here from the DRM tree that they
    asked me to do as the MEI driver is starting to interact with the i915
    driver, and it needed some coordination. All of those patches have
    been properly acked by the relevant subsystem maintainers.

    All of these have been in linux-next with no reported issues, most for
    quite some time"

    * tag 'char-misc-5.1-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc: (219 commits)
    habanalabs: adjust Kconfig to fix build errors
    habanalabs: use %px instead of %p in error print
    habanalabs: use do_div for 64-bit divisions
    intel_th: gth: Fix an off-by-one in output unassigning
    habanalabs: fix little-endian<->cpu conversion warnings
    habanalabs: use NULL to initialize array of pointers
    habanalabs: fix little-endian<->cpu conversion warnings
    habanalabs: soft-reset device if context-switch fails
    habanalabs: print pointer using %p
    habanalabs: fix memory leak with CBs with unaligned size
    habanalabs: return correct error code on MMU mapping failure
    habanalabs: add comments in uapi/misc/habanalabs.h
    habanalabs: extend QMAN0 job timeout
    habanalabs: set DMA0 completion to SOB 1007
    habanalabs: fix validation of WREG32 to DMA completion
    habanalabs: fix mmu cache registers init
    habanalabs: disable CPU access on timeouts
    habanalabs: add MMU DRAM default page mapping
    habanalabs: Dissociate RAZWI info from event types
    misc/habanalabs: adjust Kconfig to fix build errors
    ...

    Linus Torvalds
     

06 Mar, 2019

1 commit

  • Pull EFI updates from Ingo Molnar:
    "The main EFI changes in this cycle were:

    - Use 32-bit alignment for efi_guid_t

    - Allow the SetVirtualAddressMap() call to be omitted

    - Implement earlycon=efifb based on existing earlyprintk code

    - Various minor fixes and code cleanups from Sai, Ard and me"

    * 'efi-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
    efi: Fix build error due to enum collision between efi.h and ima.h
    efi/x86: Convert x86 EFI earlyprintk into generic earlycon implementation
    x86: Make ARCH_USE_MEMREMAP_PROT a generic Kconfig symbol
    efi/arm/arm64: Allow SetVirtualAddressMap() to be omitted
    efi: Replace GPL license boilerplate with SPDX headers
    efi/fdt: Apply more cleanups
    efi: Use 32-bit alignment for efi_guid_t
    efi/memattr: Don't bail on zero VA if it equals the region's PA
    x86/efi: Mark can_free_region() as an __init function

    Linus Torvalds
     

19 Feb, 2019

1 commit

  • All new 32-bit architectures should have a 64-bit userspace off_t type, but
    existing architectures have 32-bit ones.

    To enforce the rule, a new config option, ARCH_32BIT_OFF_T, is added to
    arch/Kconfig. It is disabled by default, so new 32-bit architectures get a
    64-bit off_t; all existing 32-bit architectures enable it explicitly.

    The new option affects force_o_largefile() behaviour. Namely, if the userspace
    off_t is 64 bits long, there is no reason to prevent the user from opening big files.

    Note that even if architectures have only 64-bit off_t in the kernel
    (arc, c6x, h8300, hexagon, nios2, openrisc, and unicore32),
    a libc may use 32-bit off_t, and therefore want to limit the file size
    to 4GB unless specified differently in the open flags.
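The decision described above can be sketched as follows. Everything here is illustrative: `struct toy_config`, `toy_force_o_largefile()`, `toy_open_flags()` and the `TOY_O_LARGEFILE` value are hypothetical names, and the sketch assumes the option can only be enabled on 32-bit builds.

```c
#include <assert.h>
#include <stdbool.h>

#define TOY_O_LARGEFILE 0x8000u /* hypothetical flag value, for illustration only */

/* Hypothetical description of a build configuration. */
struct toy_config {
    bool is_64bit;         /* 64-bit kernel */
    bool arch_32bit_off_t; /* stands in for ARCH_32BIT_OFF_T */
};

/* Large files are allowed by default unless the legacy 32-bit off_t option is set. */
static bool toy_force_o_largefile(struct toy_config cfg)
{
    /* Assumption: the option is never enabled on a 64-bit build. */
    assert(!(cfg.is_64bit && cfg.arch_32bit_off_t));
    return !cfg.arch_32bit_off_t;
}

/* Flags an open() call would effectively use under the given configuration. */
static unsigned int toy_open_flags(unsigned int flags, struct toy_config cfg)
{
    if (toy_force_o_largefile(cfg))
        flags |= TOY_O_LARGEFILE;
    return flags;
}
```

In this sketch only a legacy 32-bit configuration leaves the large-file flag off; a new 32-bit architecture behaves like a 64-bit one.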

    Signed-off-by: Yury Norov
    Acked-by: Arnd Bergmann
    Signed-off-by: Yury Norov
    Signed-off-by: Arnd Bergmann

    Yury Norov
     

07 Feb, 2019

1 commit

  • This is the big flip, where all 32-bit architectures set COMPAT_32BIT_TIME
    and use the _time32 system calls from the former compat layer instead
    of the system calls that take __kernel_timespec and similar arguments.

    The temporary redirects for __kernel_timespec, __kernel_itimerspec
    and __kernel_timex can get removed with this.

    It would be easy to split this commit by architecture, but with the new
    generated system call tables, it's easy enough to do it all at once,
    which makes it a little easier to check that the changes are the same
    in each table.

    Acked-by: Geert Uytterhoeven
    Signed-off-by: Arnd Bergmann

    Arnd Bergmann
     

04 Feb, 2019

1 commit

  • Turn ARCH_USE_MEMREMAP_PROT into a generic Kconfig symbol, and fix the
    dependency expression to reflect that AMD_MEM_ENCRYPT depends on it,
    instead of the other way around. This will permit ARCH_USE_MEMREMAP_PROT
    to be selected by other architectures.

    Note that the encryption related early memremap routines in
    arch/x86/mm/ioremap.c cannot be built for 32-bit x86 without triggering
    the following warning:

    arch/x86//mm/ioremap.c: In function 'early_memremap_encrypted':
    >> arch/x86/include/asm/pgtable_types.h:193:27: warning: conversion from
    'long long unsigned int' to 'long unsigned int' changes
    value from '9223372036854776163' to '355' [-Woverflow]
    #define __PAGE_KERNEL_ENC (__PAGE_KERNEL | _PAGE_ENC)
    ^~~~~~~~~~~~~~~~~~~~~~~~~~~
    arch/x86//mm/ioremap.c:713:46: note: in expansion of macro '__PAGE_KERNEL_ENC'
    return early_memremap_prot(phys_addr, size, __PAGE_KERNEL_ENC);

    which essentially means they are 64-bit only anyway. However, we cannot
    make them dependent on CONFIG_ARCH_HAS_MEM_ENCRYPT, since that is always
    defined, even for i386 (and changing that results in a slew of build errors)

    So instead, build those routines only if CONFIG_AMD_MEM_ENCRYPT is
    defined.
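The two numbers in the warning quoted above are related by plain integer truncation: 9223372036854776163 is 2^63 + 355, and narrowing it to 32 bits keeps only 355. A small sketch of that arithmetic, with `WARNED_VALUE` and `narrow_to_u32()` as names made up for this illustration:

```c
#include <assert.h>
#include <stdint.h>

/* The 64-bit constant quoted in the warning: 2^63 + 355. */
#define WARNED_VALUE UINT64_C(9223372036854776163)

/* Narrow to 32 bits, the width of 'long unsigned int' on 32-bit x86. */
static uint32_t narrow_to_u32(uint64_t v)
{
    return (uint32_t)v; /* keeps only bits 0..31, so bit 63 is lost */
}
```

The high bit cannot survive the conversion, which is why the routines only make sense on 64-bit builds.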

    Signed-off-by: Ard Biesheuvel
    Cc: AKASHI Takahiro
    Cc: Alexander Graf
    Cc: Bjorn Andersson
    Cc: Borislav Petkov
    Cc: Heinrich Schuchardt
    Cc: Jeffrey Hugo
    Cc: Lee Jones
    Cc: Leif Lindholm
    Cc: Linus Torvalds
    Cc: Matt Fleming
    Cc: Peter Jones
    Cc: Peter Zijlstra
    Cc: Sai Praneeth Prakhya
    Cc: Thomas Gleixner
    Cc: linux-efi@vger.kernel.org
    Link: http://lkml.kernel.org/r/20190202094119.13230-9-ard.biesheuvel@linaro.org
    Signed-off-by: Ingo Molnar

    Ard Biesheuvel
     

22 Jan, 2019

1 commit


06 Jan, 2019

1 commit

  • Currently, CONFIG_JUMP_LABEL just means "I _want_ to use jump label".

    The jump label is controlled by HAVE_JUMP_LABEL, which is defined
    like this:

    #if defined(CC_HAVE_ASM_GOTO) && defined(CONFIG_JUMP_LABEL)
    # define HAVE_JUMP_LABEL
    #endif

    We can improve this by testing 'asm goto' support in Kconfig, then
    make JUMP_LABEL depend on CC_HAS_ASM_GOTO.

    Ugly #ifdef HAVE_JUMP_LABEL will go away, and CONFIG_JUMP_LABEL will
    match the real kernel capability.

    Signed-off-by: Masahiro Yamada
    Acked-by: Michael Ellerman (powerpc)
    Tested-by: Sedat Dilek

    Masahiro Yamada
     

05 Jan, 2019

1 commit

  • Android needs to mremap large regions of memory during memory management
    related operations. The mremap system call can be really slow if THP is
    not enabled. The bottleneck is move_page_tables, which copies one pte
    at a time and can be really slow across a large map. Turning on
    THP may not be a viable option, and is not for us. This patch speeds up
    mremap on non-THP systems by copying at the PMD level when
    possible.

    The speedup is an order of magnitude on x86 (~20x). On a 1GB mremap,
    the mremap completion time drops from 3.4-3.6 milliseconds to 144-160
    microseconds.

    Before:
    Total mremap time for 1GB data: 3521942 nanoseconds.
    Total mremap time for 1GB data: 3449229 nanoseconds.
    Total mremap time for 1GB data: 3488230 nanoseconds.

    After:
    Total mremap time for 1GB data: 150279 nanoseconds.
    Total mremap time for 1GB data: 144665 nanoseconds.
    Total mremap time for 1GB data: 158708 nanoseconds.

    If THP is enabled the optimization is mostly skipped except in certain
    situations.
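The size of the gain can be made concrete with a little arithmetic. The sketch below assumes 4 KiB base pages and 2 MiB covered by one PMD entry (typical for x86-64, but an assumption here), and reuses the timing samples quoted above; `moves_needed()` and `mean_ns()` are hypothetical helpers.

```c
#include <assert.h>
#include <stdint.h>

#define TOY_PAGE_SIZE (UINT64_C(4) << 10) /* assumed 4 KiB base page */
#define TOY_PMD_SIZE  (UINT64_C(2) << 20) /* assumed 2 MiB per PMD entry */

/* Timing samples copied from the commit message, in nanoseconds. */
static const uint64_t before_ns[3] = { 3521942, 3449229, 3488230 };
static const uint64_t after_ns[3]  = { 150279, 144665, 158708 };

/* Number of entries that must be moved to cover len bytes in steps of step bytes. */
static uint64_t moves_needed(uint64_t len, uint64_t step)
{
    assert(step != 0);
    return (len + step - 1) / step;
}

/* Arithmetic mean of n samples. */
static double mean_ns(const uint64_t *samples, int n)
{
    double sum = 0.0;

    assert(n > 0);
    for (int i = 0; i < n; i++)
        sum += (double)samples[i];
    return sum / n;
}
```

Under these assumptions a 1 GiB region needs 262144 pte moves but only 512 PMD moves, and the ratio of the two sample means is about 23, in line with the quoted ~20x.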

    [joel@joelfernandes.org: fix 'move_normal_pmd' unused function warning]
    Link: http://lkml.kernel.org/r/20181108224457.GB209347@google.com
    Link: http://lkml.kernel.org/r/20181108181201.88826-3-joelaf@google.com
    Signed-off-by: Joel Fernandes (Google)
    Acked-by: Kirill A. Shutemov
    Reviewed-by: William Kucharski
    Cc: Julia Lawall
    Cc: Michal Hocko
    Cc: Will Deacon
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Joel Fernandes (Google)
     

02 Nov, 2018

1 commit

  • Pull stackleak gcc plugin from Kees Cook:
    "Please pull this new GCC plugin, stackleak, for v4.20-rc1. This plugin
    was ported from grsecurity by Alexander Popov. It provides efficient
    stack content poisoning at syscall exit. This creates a defense
    against at least two classes of flaws:

    - Uninitialized stack usage. (We continue to work on improving the
    compiler to do this in other ways: e.g. unconditional zero init was
    proposed to GCC and Clang, and more plugin work has started too).

    - Stack content exposure. By greatly reducing the lifetime of valid
    stack contents, exposures via either direct read bugs or unknown
    cache side-channels become much more difficult to exploit. This
    complements the existing buddy and heap poisoning options, but
    provides the coverage for stacks.

    The x86 hooks are included in this series (which have been reviewed by
    Ingo, Dave Hansen, and Thomas Gleixner). The arm64 hooks have already
    been merged through the arm64 tree (written by Laura Abbott and
    reviewed by Mark Rutland and Will Deacon).

    With VLAs having been removed this release, there is no need for
    alloca() protection, so it has been removed from the plugin"

    * tag 'stackleak-v4.20-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux:
    arm64: Drop unneeded stackleak_check_alloca()
    stackleak: Allow runtime disabling of kernel stack erasing
    doc: self-protection: Add information about STACKLEAK feature
    fs/proc: Show STACKLEAK metrics in the /proc file system
    lkdtm: Add a test for STACKLEAK
    gcc-plugins: Add STACKLEAK plugin for tracking the kernel stack
    x86/entry: Add STACKLEAK erasing the kernel stack at the end of syscalls

    Linus Torvalds
     

31 Oct, 2018

1 commit

  • Pull tracing updates from Steven Rostedt:
    "The biggest change here is the updates to kprobes

    Back in January I posted patches to create function based events.
    These were the events that you suggested I make to allow developers to
    easily create events in code where no trace event exists. After
    posting those changes for review, it was suggested that we implement
    this instead with kprobes.

    The problem with kprobes is that the interface is too complex and
    needs to be simplified. Masami Hiramatsu posted patches in March and
    I've been playing with them a bit. There's been a bit of clean up in
    the kprobe code that was inspired by the function based event patches,
    and a couple of enhancements to the kprobe event interface.

    - If the arch supports it (we added support for x86), you can place a
    kprobe event at the start of a function and use $arg1, $arg2, etc
    to reference the arguments of a function. (Before you needed to
    know what register or where on the stack the argument was).

    - The second is a way to see array of events. For example, if you
    reference a mac address, you can add:

    echo 'p:mac ip_rcv perm_addr=+574($arg2):x8[6]' > kprobe_events

    And this will produce:

    mac: (ip_rcv+0x0/0x140) perm_addr={0x52,0x54,0x0,0xc0,0x76,0xec}

    Other changes include

    - Exporting trace_dump_stack to modules

    - Have the stack tracer trace the entire stack (stop trying to remove
    tracing itself, as we keep removing too much).

    - Added support for SDT in uprobes"

    [ SDT - "Statically Defined Tracing" are userspace markers for tracing.
    Let's not use random TLA's in explanations unless they are fairly
    well-established as generic (at least for kernel people) - Linus ]

    * tag 'trace-v4.20' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace: (24 commits)
    tracing: Have stack tracer trace full stack
    tracing: Export trace_dump_stack to modules
    tracing: probeevent: Fix uninitialized used of offset in parse args
    tracing/kprobes: Allow kprobe-events to record module symbol
    tracing/kprobes: Check the probe on unloaded module correctly
    tracing/uprobes: Fix to return -EFAULT if copy_from_user failed
    tracing: probeevent: Add $argN for accessing function args
    x86: ptrace: Add function argument access API
    tracing: probeevent: Add array type support
    tracing: probeevent: Add symbol type
    tracing: probeevent: Unify fetch_insn processing common part
    tracing: probeevent: Append traceprobe_ for exported function
    tracing: probeevent: Return consumed bytes of dynamic area
    tracing: probeevent: Unify fetch type tables
    tracing: probeevent: Introduce new argument fetching code
    tracing: probeevent: Remove NOKPROBE_SYMBOL from print functions
    tracing: probeevent: Cleanup argument field definition
    tracing: probeevent: Cleanup print argument functions
    trace_uprobe: support reference counter in fd-based uprobe
    perf probe: Support SDT markers having reference counter (semaphore)
    ...

    Linus Torvalds
     

11 Oct, 2018

1 commit

  • Add regs_get_argument(), which returns the Nth argument of the
    function call.
    Note that this chooses the most probable assignment; in some cases
    it can be incorrect (e.g. when passing a data structure or a
    floating-point value).

    This is expected to be called from kprobes or ftrace with regs
    where the top of stack is the return address.
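A rough sketch of how such a lookup can work, assuming the x86-64 System V convention in which the first six integer arguments travel in rdi, rsi, rdx, rcx, r8 and r9 and the rest sit on the stack above the return address. `struct toy_regs` and `toy_get_argument()` are hypothetical and much simpler than the kernel's register structure.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_NR_REG_ARGS 6

/* Hypothetical register snapshot taken at function entry. */
struct toy_regs {
    unsigned long di, si, dx, cx, r8, r9;
    const unsigned long *sp; /* sp[0] holds the return address */
};

/* Return the n-th (0-based) integer argument of the probed function. */
static unsigned long toy_get_argument(const struct toy_regs *regs, unsigned int n)
{
    const unsigned long reg_args[TOY_NR_REG_ARGS] = {
        regs->di, regs->si, regs->dx, regs->cx, regs->r8, regs->r9,
    };

    assert(regs != NULL);
    if (n < TOY_NR_REG_ARGS)
        return reg_args[n];
    /* Skip the return address: argument 6 lives in sp[1], argument 7 in sp[2]. */
    return regs->sp[n - TOY_NR_REG_ARGS + 1];
}
```

As the message warns, this kind of lookup guesses: structures passed by value or floating-point arguments do not follow the simple integer-register order.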

    Link: http://lkml.kernel.org/r/152465885737.26224.2822487520472783854.stgit@devbox

    Signed-off-by: Masami Hiramatsu
    Signed-off-by: Steven Rostedt (VMware)

    Masami Hiramatsu
     

27 Sep, 2018

1 commit

  • To reduce the size taken up by absolute references in jump label
    entries themselves and the associated relocation records in the
    .init segment, add support for emitting them as relative references
    instead.

    Note that this requires some extra care in the sorting routine, given
    that the offsets change when entries are moved around in the jump_entry
    table.
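The extra care in the sorting routine can be illustrated with a minimal sketch. It assumes each entry stores a 32-bit offset from its own address to its target; `struct rel_entry` and the `rel_*()` helpers are hypothetical names, not the kernel's jump_entry layout.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical table entry: the target is stored as an offset from the field itself. */
struct rel_entry {
    int32_t target_off;
};

/* Point the entry at target; the distance must fit in 32 bits. */
static void rel_set(struct rel_entry *e, const void *target)
{
    intptr_t diff = (intptr_t)target - (intptr_t)&e->target_off;

    assert(diff >= INT32_MIN && diff <= INT32_MAX);
    e->target_off = (int32_t)diff;
}

/* Recover the absolute address the entry refers to. */
static void *rel_resolve(const struct rel_entry *e)
{
    return (void *)((intptr_t)&e->target_off + e->target_off);
}

/*
 * Swap two entries. Copying the raw offsets is not enough: an offset is only
 * valid at the address where it was computed, so each one is corrected by
 * the distance between the two slots.
 */
static void rel_swap(struct rel_entry *a, struct rel_entry *b)
{
    intptr_t delta = (intptr_t)b - (intptr_t)a;
    int32_t tmp = a->target_off;

    a->target_off = (int32_t)(b->target_off + delta);
    b->target_off = (int32_t)(tmp - delta);
}
```

After `rel_swap()`, each slot resolves to the target the other slot pointed at before, which is the property a sort over the table relies on.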

    Signed-off-by: Ard Biesheuvel
    Signed-off-by: Thomas Gleixner
    Acked-by: Peter Zijlstra (Intel)
    Cc: linux-arm-kernel@lists.infradead.org
    Cc: linux-s390@vger.kernel.org
    Cc: Arnd Bergmann
    Cc: Heiko Carstens
    Cc: Kees Cook
    Cc: Will Deacon
    Cc: Catalin Marinas
    Cc: Steven Rostedt
    Cc: Martin Schwidefsky
    Cc: Jessica Yu
    Link: https://lkml.kernel.org/r/20180919065144.25010-3-ard.biesheuvel@linaro.org

    Ard Biesheuvel
     

05 Sep, 2018

1 commit

  • The STACKLEAK feature (initially developed by PaX Team) has the following
    benefits:

    1. Reduces the information that can be revealed through kernel stack leak
    bugs. The idea of erasing the thread stack at the end of syscalls is
    similar to CONFIG_PAGE_POISONING and memzero_explicit() in kernel
    crypto, which all comply with FDP_RIP.2 (Full Residual Information
    Protection) of the Common Criteria standard.

    2. Blocks some uninitialized stack variable attacks (e.g. CVE-2017-17712,
    CVE-2010-2963). Bugs of that kind should be eliminated by improving C
    compilers in the future, which might take a long time.

    This commit introduces the code filling the used part of the kernel
    stack with a poison value before returning to userspace. Full
    STACKLEAK feature also contains the gcc plugin which comes in a
    separate commit.
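The idea of filling the used part of the stack can be shown with a toy model. This is a user-space sketch under stated assumptions: the stack is an array that grows toward index 0, the deepest index reached is tracked explicitly, and `TOY_POISON`, `struct toy_stack`, `toy_track()` and `toy_erase()` are hypothetical names and values.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_STACK_WORDS 64
#define TOY_POISON 0xBEEFUL /* hypothetical poison value */

struct toy_stack {
    unsigned long words[TOY_STACK_WORDS]; /* index 0 is the lowest address */
    size_t lowest;                        /* deepest index reached since the last erase */
};

/* Record that the stack pointer reached index sp (the stack grows toward index 0). */
static void toy_track(struct toy_stack *s, size_t sp)
{
    assert(sp <= TOY_STACK_WORDS);
    if (sp < s->lowest)
        s->lowest = sp;
}

/*
 * Before returning to user space: poison everything between the deepest
 * point reached and the current stack pointer, then restart tracking.
 */
static void toy_erase(struct toy_stack *s, size_t current_sp)
{
    assert(current_sp <= TOY_STACK_WORDS);
    for (size_t i = s->lowest; i < current_sp; i++)
        s->words[i] = TOY_POISON;
    s->lowest = current_sp;
}
```

Only the region that was actually used gets rewritten, which keeps the cost proportional to the stack depth of the call.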

    The STACKLEAK feature is ported from grsecurity/PaX. More information at:
    https://grsecurity.net/
    https://pax.grsecurity.net/

    This code is modified from Brad Spengler/PaX Team's code in the last
    public patch of grsecurity/PaX based on our understanding of the code.
    Changes or omissions from the original code are ours and don't reflect
    the original grsecurity/PaX code.

    Performance impact:

    Hardware: Intel Core i7-4770, 16 GB RAM

    Test #1: building the Linux kernel on a single core
    0.91% slowdown

    Test #2: hackbench -s 4096 -l 2000 -g 15 -f 25 -P
    4.2% slowdown

    So the STACKLEAK description in Kconfig includes: "The tradeoff is the
    performance impact: on a single CPU system kernel compilation sees a 1%
    slowdown, other systems and workloads may vary and you are advised to
    test this feature on your expected workload before deploying it".

    Signed-off-by: Alexander Popov
    Acked-by: Thomas Gleixner
    Reviewed-by: Dave Hansen
    Acked-by: Ingo Molnar
    Signed-off-by: Kees Cook

    Alexander Popov
     

24 Aug, 2018

3 commits

  • Merge fixes for missing TLB shootdowns.

    This fixes a couple of cases that involved us possibly freeing page
    table structures before the required TLB shootdown had been done.

    There are a few cleanup patches to make the code easier to follow, and
    to avoid some of the more problematic cases entirely when not necessary.

    To make this easier for backports, it undoes the recent lazy TLB
    patches, because the cleanups and fixes are more important, and Rik is
    ok with re-doing them later when things have calmed down.

    The missing TLB flush was only delayed, and the wrong ordering only
    happened under memory pressure (and in theory under a couple of other
    fairly theoretical situations), so this may have been all very unlikely
    to have hit people in practice.

    But getting the TLB shootdown wrong is _so_ hard to debug and see that I
    consider this a critical fix.

    Many thanks to Jann Horn for having debugged this.

    * tlb-fixes:
    x86/mm: Only use tlb_remove_table() for paravirt
    mm: mmu_notifier fix for tlb_end_vma
    mm/tlb, x86/mm: Support invalidating TLB caches for RCU_TABLE_FREE
    mm/tlb: Remove tlb_remove_table() non-concurrent condition
    mm: move tlb_table_flush to tlb_flush_mmu_free
    x86/mm/tlb: Revert the recent lazy TLB patches

    Linus Torvalds
     
  • Pull MIPS fixes from Paul Burton:

    - Fix microMIPS build failures by adding a .insn directive to the
    barrier_before_unreachable() asm statement in order to convince the
    toolchain that the asm statement is a valid branch target rather
    than a bogus attempt to switch ISA.

    - Clean up our declarations of TLB functions that we overwrite with
    generated code in order to prevent the compiler making assumptions
    about alignment that cause microMIPS kernels built with GCC 7 &
    above to die early during boot.

    - Fix up a regression for MIPS32 kernels which slipped into the main
    MIPS pull for 4.19, causing CONFIG_32BIT=y kernels to contain
    inappropriate MIPS64 instructions.

    - Extend our existing workaround for MIPSr6 builds that end up using
    the __multi3 intrinsic to GCC 7 & below, rather than just GCC 7.

    * tag 'mips_4.19_2' of git://git.kernel.org/pub/scm/linux/kernel/git/mips/linux:
    MIPS: lib: Provide MIPS64r6 __multi3() for GCC < 7
    MIPS: Workaround GCC __builtin_unreachable reordering bug
    compiler.h: Allow arch-specific asm/compiler.h
    MIPS: Avoid move pseudo-instruction whilst using MIPS_ISA_LEVEL
    MIPS: Consistently declare TLB functions
    MIPS: Export tlbmiss_handler_setup_pgd near its definition

    Linus Torvalds
     
  • Jann reported that x86 was missing required TLB invalidates when he
    hit the !*batch slow path in tlb_remove_table().

    This is indeed the case: RCU_TABLE_FREE does not provide TLB (cache)
    invalidates. The PowerPC-hash code where this originated and the
    Sparc-hash code where it was subsequently used did not need them. ARM,
    which later used this, put an explicit TLB invalidate in its
    __p*_free_tlb() functions, and PowerPC-radix followed that example.

    But when we hooked up x86 we failed to consider this. Fix this by
    (optionally) hooking tlb_remove_table() into the TLB invalidate code.

    NOTE: s390 also needed something like this and might now
    be able to use the generic code again.

    [ Modified to be on top of Nick's cleanups, which simplified this patch
    now that tlb_flush_mmu_tlbonly() really only flushes the TLB - Linus ]

    Fixes: 9e52fc2b50de ("x86/mm: Enable RCU based page table freeing (CONFIG_HAVE_RCU_TABLE_FREE=y)")
    Reported-by: Jann Horn
    Signed-off-by: Peter Zijlstra (Intel)
    Acked-by: Rik van Riel
    Cc: Nicholas Piggin
    Cc: David Miller
    Cc: Will Deacon
    Cc: Martin Schwidefsky
    Cc: Michael Ellerman
    Cc: stable@kernel.org
    Signed-off-by: Linus Torvalds

    Peter Zijlstra
     

23 Aug, 2018

1 commit

  • Patch series "add support for relative references in special sections", v10.

    This adds support for emitting special sections such as initcall arrays,
    PCI fixups and tracepoints as relative references rather than absolute
    references. This reduces the size by 50% on 64-bit architectures, but
    more importantly, it removes the need for carrying relocation metadata for
    these sections in relocatable kernels (e.g., for KASLR) that needs to be
    fixed up at boot time. On arm64, this reduces the vmlinux footprint of
    such a reference by 8x (8 byte absolute reference + 24 byte RELA entry vs
    4 byte relative reference)
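
    The 8x figure follows from (8 + 24) / 4 = 8. The following is a minimal
    userspace sketch of the idea, not the kernel's implementation: each
    entry stores a signed 32-bit offset from its own address to the target,
    and a reader adds the entry's address back to recover the pointer. The
    names rel_ref, rel_ref_set and rel_ref_get are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical entry type: a 32-bit place-relative reference. */
struct rel_ref {
	int32_t offset;	/* target address minus the address of this field */
};

/* Sizes quoted in the text for arm64: absolute pointer plus RELA entry. */
enum { ABS_REF_BYTES = 8, RELA_ENTRY_BYTES = 24, REL_REF_BYTES = 4 };

/* Store a reference to target; it must lie within +/-2 GiB of the entry. */
static void rel_ref_set(struct rel_ref *ref, const void *target)
{
	intptr_t delta = (intptr_t)target - (intptr_t)&ref->offset;

	assert(delta >= INT32_MIN && delta <= INT32_MAX);
	ref->offset = (int32_t)delta;
}

/* Recover the absolute address: entry address plus stored offset. */
static void *rel_ref_get(const struct rel_ref *ref)
{
	return (void *)((intptr_t)&ref->offset + ref->offset);
}
```

    Because the stored value is a distance rather than an address, it stays
    valid when the entry and its target are moved together, which is why no
    boot-time fixup is needed for such entries.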

    Patch #3 was sent out before as a single patch. This series supersedes
    the previous submission. This version makes relative ksymtab entries
    dependent on the new Kconfig symbol HAVE_ARCH_PREL32_RELOCATIONS rather
    than trying to infer from kbuild test robot replies for which
    architectures it should be blacklisted.

    Patch #1 introduces the new Kconfig symbol HAVE_ARCH_PREL32_RELOCATIONS,
    and sets it for the main architectures that are expected to benefit the
    most from this feature, i.e., 64-bit architectures or ones that use
    runtime relocations.

    Patch #2 adds support for #define'ing __DISABLE_EXPORTS to get rid of
    ksymtab/kcrctab sections in decompressor and EFI stub objects when
    rebuilding existing C files to run in a different context.

    Patches #4 - #6 implement relative references for initcalls, PCI fixups
    and tracepoints, respectively, all of which produce sections with order
    ~1000 entries on an arm64 defconfig kernel with tracing enabled. This
    means we save about 28 KB of vmlinux space for each of these patches.

    [From the v7 series blurb, which included the jump_label patches as well]:

    For the arm64 kernel, all patches combined reduce the memory footprint
    of vmlinux by about 1.3 MB (using a config copied from Ubuntu that has
    KASLR enabled), of which ~1 MB is the size reduction of the RELA section
    in .init, and the remaining 300 KB is reduction of .text/.data.

    This patch (of 6):

    Before updating certain subsystems to use place-relative 32-bit
    relocations in special sections, to save space and reduce the number of
    absolute relocations that need to be processed at runtime by relocatable
    kernels, introduce the Kconfig symbol and define it for some architectures
    that should be able to support and benefit from it.

    Link: http://lkml.kernel.org/r/20180704083651.24360-2-ard.biesheuvel@linaro.org
    Signed-off-by: Ard Biesheuvel
    Acked-by: Michael Ellerman
    Reviewed-by: Will Deacon
    Acked-by: Ingo Molnar
    Cc: Arnd Bergmann
    Cc: Kees Cook
    Cc: Thomas Garnier
    Cc: Thomas Gleixner
    Cc: "Serge E. Hallyn"
    Cc: Bjorn Helgaas
    Cc: Benjamin Herrenschmidt
    Cc: Russell King
    Cc: Paul Mackerras
    Cc: Catalin Marinas
    Cc: Petr Mladek
    Cc: James Morris
    Cc: Nicolas Pitre
    Cc: Josh Poimboeuf
    Cc: Steven Rostedt
    Cc: Sergey Senozhatsky
    Cc: James Morris
    Cc: Jessica Yu
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Ard Biesheuvel
     

22 Aug, 2018

1 commit

  • We have a need to override the definition of
    barrier_before_unreachable() for MIPS, which means we either need to add
    architecture-specific code into linux/compiler-gcc.h or we need to allow
    the architecture to provide a header that can define the macro before
    the generic definition. The latter seems like the better approach.

    A straightforward approach to the per-arch header is to make use of
    asm-generic to provide a default empty header & adjust architectures
    which don't need anything specific to make use of that by adding the
    header to generic-y. Unfortunately this doesn't work so well due to
    commit 28128c61e08e ("kconfig.h: Include compiler types to avoid missed
    struct attributes") which caused linux/compiler_types.h to be included
    in the compilation of every C file via the -include linux/kconfig.h flag
    in c_flags.

    Because the -include flag is present for all C files we compile, we need
    the architecture-provided header to be present before any C files are
    compiled. If any C files can be compiled prior to the asm-generic header
    wrappers being generated then we hit a build failure due to a missing
    header. Such cases do exist - one pointed out by the kbuild test robot
    is the compilation of arch/ia64/kernel/nr-irqs.c, which occurs as part
    of the archprepare target [1].

    This leaves us with a few options:

    1) Use generic-y & fix any build failures we find by enforcing
    ordering such that the asm-generic target occurs before any C
    compilation, such that linux/compiler_types.h can always include
    the generated asm-generic wrapper which in turn includes the empty
    asm-generic header. This would rely on us finding all the
    problematic cases - I don't know for sure that the ia64 issue is
    the only one.

    2) Add an actual empty header to each architecture, so that we don't
    need the generated asm-generic wrapper. This seems messy.

    3) Give up & add #ifdef CONFIG_MIPS or similar to
    linux/compiler_types.h. This seems messy too.

    4) Include the arch header only when it's actually needed, removing
    the need for the asm-generic wrapper for all other architectures.

    This patch allows us to use approach 4, by including an asm/compiler.h
    header from linux/compiler_types.h after the inclusion of the
    compiler-specific linux/compiler-*.h header(s). We do this
    conditionally, only when CONFIG_HAVE_ARCH_COMPILER_H is selected, in
    order to avoid the need for asm-generic wrappers & the associated build
    ordering issue described above. The asm/compiler.h header is included
    after the generic linux/compiler-*.h header(s) for consistency with the
    way linux/compiler-intel.h & linux/compiler-clang.h are included after
    the linux/compiler-gcc.h header that they override.
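
    The include ordering can be modelled in a single file. This is a sketch
    under stated assumptions rather than the actual header contents: the
    macro BARRIER_IMPL and the enum barrier_impl are hypothetical stand-ins,
    and CONFIG_HAVE_ARCH_COMPILER_H is defined by hand where a kernel build
    would get it from Kconfig.

```c
#include <assert.h>

/* Hypothetical tags identifying which definition is in effect. */
enum barrier_impl { IMPL_GENERIC, IMPL_ARCH };

/* Assumption: set by hand here; Kconfig would provide it in a real build. */
#define CONFIG_HAVE_ARCH_COMPILER_H 1

/* Step 1: stand-in for the definition from linux/compiler-*.h. */
#define BARRIER_IMPL IMPL_GENERIC

/*
 * Step 2: stand-in for the conditional inclusion of asm/compiler.h.
 * The lines inside the #ifdef play the role of the architecture's header,
 * which replaces the definition made in step 1.
 */
#ifdef CONFIG_HAVE_ARCH_COMPILER_H
#undef BARRIER_IMPL
#define BARRIER_IMPL IMPL_ARCH
/* Compile-time check that the later definition is the one that wins. */
static_assert(BARRIER_IMPL == IMPL_ARCH, "arch override must win");
#endif

/* Report which definition the rest of the translation unit sees. */
static enum barrier_impl active_impl(void)
{
	return BARRIER_IMPL;
}
```

    With the CONFIG_ symbol left undefined, step 2 vanishes and no
    architecture header is needed at all, which is what sidesteps the
    build-ordering problem.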

    [1] https://lists.01.org/pipermail/kbuild-all/2018-August/051175.html

    Signed-off-by: Paul Burton
    Reviewed-by: Masahiro Yamada
    Patchwork: https://patchwork.linux-mips.org/patch/20269/
    Cc: Arnd Bergmann
    Cc: James Hogan
    Cc: Masahiro Yamada
    Cc: Ralf Baechle
    Cc: linux-arch@vger.kernel.org
    Cc: linux-kbuild@vger.kernel.org
    Cc: linux-mips@linux-mips.org

    Paul Burton
     

16 Aug, 2018

2 commits

  • Pull Kconfig consolidation from Masahiro Yamada:
    "Consolidation of Kconfig files by Christoph Hellwig.

    Move the source statements of arch-independent Kconfig files instead
    of duplicating the includes in every arch/$(SRCARCH)/Kconfig"

    * tag 'kconfig-v4.19-2' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild:
    kconfig: add a Memory Management options" menu
    kconfig: move the "Executable file formats" menu to fs/Kconfig.binfmt
    kconfig: use a menu in arch/Kconfig to reduce clutter
    kconfig: include kernel/Kconfig.preempt from init/Kconfig
    Kconfig: consolidate the "Kernel hacking" menu
    kconfig: include common Kconfig files from top-level Kconfig
    kconfig: remove duplicate SWAP symbol defintions
    um: create a proper drivers Kconfig
    um: cleanup Kconfig files
    um: stop abusing KBUILD_KCONFIG

    Linus Torvalds
     
  • Pull gcc plugin cleanups from Kees Cook:

    - Kconfig and Makefile clean-ups (Masahiro Yamada, Kees Cook)

    - gcc-common.h definition clean-ups (Alexander Popov)

    * tag 'gcc-plugin-cleanup-v4.19-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux:
    gcc-plugins: Clean up the cgraph_create_edge* macros
    gcc-plugins: Regularize Makefile.gcc-plugins
    gcc-plugins: split out Kconfig entries to scripts/gcc-plugins/Kconfig
    gcc-plugins: remove unused GCC_PLUGIN_SUBDIR

    Linus Torvalds
     

02 Aug, 2018

3 commits


25 Jul, 2018

1 commit


21 Jun, 2018

1 commit

  • Provide a command line and a sysfs knob to control SMT.

    The command line options are:

    'nosmt': Enumerate secondary threads, but do not online them

    'nosmt=force': Ignore secondary threads completely during enumeration
    via MP table and ACPI/MADT.

    The sysfs control file has the following states (read/write):

    'on': SMT is enabled. Secondary threads can be freely onlined
    'off': SMT is disabled. Secondary threads, even if enumerated,
    cannot be onlined
    'forceoff': SMT is permanently disabled. Writes to the control
    file are rejected.
    'notsupported': SMT is not supported by the CPU

    The command line option 'nosmt' sets the sysfs control to 'off'. This
    can be changed to 'on' to reenable SMT during runtime.

    The command line option 'nosmt=force' sets the sysfs control to
    'forceoff'. This cannot be changed during runtime.

    When SMT is 'on' and the control file is changed to 'off' then all online
    secondary threads are offlined and attempts to online a secondary thread
    later on are rejected.

    When SMT is 'off' and the control file is changed to 'on' then secondary
    threads can be onlined again. The 'off' -> 'on' transition does not
    automatically online the secondary threads.

    When the control file is set to 'forceoff', the behaviour is the same as
    setting it to 'off', but the operation is irreversible and later writes to
    the control file are rejected.

    When the control status is 'notsupported' then writes to the control file
    are rejected.
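
    The transitions described above can be modelled as a small state
    machine. This is an illustrative userspace sketch, not the kernel code:
    the enum and function names are hypothetical, and rejecting any string
    other than 'on', 'off' or 'forceoff' is an assumption.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical names for the four states listed in the text. */
enum smt_state { SMT_ON, SMT_OFF, SMT_FORCEOFF, SMT_NOTSUPPORTED };

/*
 * Apply a write to the control file. Returns true and updates *state if
 * the write is accepted, false if it is rejected.
 */
static bool smt_control_write(enum smt_state *state, const char *value)
{
	assert(state && value);

	/* 'forceoff' and 'notsupported' are terminal: every write is rejected. */
	if (*state == SMT_FORCEOFF || *state == SMT_NOTSUPPORTED)
		return false;

	if (strcmp(value, "on") == 0)
		*state = SMT_ON;
	else if (strcmp(value, "off") == 0)
		*state = SMT_OFF;
	else if (strcmp(value, "forceoff") == 0)
		*state = SMT_FORCEOFF;
	else
		return false;	/* assumption: anything else is invalid input */

	return true;
}

/* Secondary threads may only be onlined while the state is 'on'. */
static bool smt_may_online_secondary(enum smt_state state)
{
	return state == SMT_ON;
}
```

    The model covers only the control state. Offlining the secondary
    threads on an 'on' -> 'off' transition, and the fact that 'off' -> 'on'
    does not online them again, are outside what it represents.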

    Signed-off-by: Thomas Gleixner
    Reviewed-by: Konrad Rzeszutek Wilk
    Acked-by: Ingo Molnar

    Thomas Gleixner
     

16 Jun, 2018

1 commit

  • As we move stuff around, some doc references are broken. Fix some of
    them via this script:
    ./scripts/documentation-file-ref-check --fix

    Manually checked if the produced result is valid, removing a few
    false-positives.

    Acked-by: Takashi Iwai
    Acked-by: Masami Hiramatsu
    Acked-by: Stephen Boyd
    Acked-by: Charles Keepax
    Acked-by: Mathieu Poirier
    Reviewed-by: Coly Li
    Signed-off-by: Mauro Carvalho Chehab
    Acked-by: Jonathan Corbet

    Mauro Carvalho Chehab
     

15 Jun, 2018

1 commit

    HAVE_CC_STACKPROTECTOR should be selected by architectures with a stack
    canary implementation. It is not about the compiler support.

    For consistency with commit 050e9baa9dc9 ("Kbuild: rename
    CC_STACKPROTECTOR[_STRONG] config variables"), remove 'CC_' from the
    config symbol.

    I moved the 'select' lines to keep the alphabetical sorting.

    Signed-off-by: Masahiro Yamada
    Acked-by: Kees Cook
    Signed-off-by: Linus Torvalds

    Masahiro Yamada
     

14 Jun, 2018

1 commit

  • The changes to automatically test for working stack protector compiler
    support in the Kconfig files removed the special STACKPROTECTOR_AUTO
    option that picked the strongest stack protector that the compiler
    supported.

    That was all a nice cleanup - it makes no sense to have the AUTO case
    now that the Kconfig phase can just determine the compiler support
    directly.

    HOWEVER.

    It also meant that doing "make oldconfig" would now _disable_ the strong
    stackprotector if you had AUTO enabled, because in a legacy config file,
    the sane stack protector configuration would look like

    CONFIG_HAVE_CC_STACKPROTECTOR=y
    # CONFIG_CC_STACKPROTECTOR_NONE is not set
    # CONFIG_CC_STACKPROTECTOR_REGULAR is not set
    # CONFIG_CC_STACKPROTECTOR_STRONG is not set
    CONFIG_CC_STACKPROTECTOR_AUTO=y

    and when you ran this through "make oldconfig" with the Kbuild changes,
    it would ask you about the regular CONFIG_CC_STACKPROTECTOR (that had
    been renamed from CONFIG_CC_STACKPROTECTOR_REGULAR to just
    CONFIG_CC_STACKPROTECTOR), but it would think that the STRONG version
    used to be disabled (because it was really enabled by AUTO), and would
    disable it in the new config, resulting in:

    CONFIG_HAVE_CC_STACKPROTECTOR=y
    CONFIG_CC_HAS_STACKPROTECTOR_NONE=y
    CONFIG_CC_STACKPROTECTOR=y
    # CONFIG_CC_STACKPROTECTOR_STRONG is not set
    CONFIG_CC_HAS_SANE_STACKPROTECTOR=y

    That's dangerously subtle - people could suddenly find themselves with
    the weaker stack protector setup without even realizing.

    The solution here is to rename not just the old REGULAR stack
    protector option, but also the strong one. This does that by just
    removing the CC_ prefix entirely for the user choices, because it really
    is not about the compiler support (the compiler support now instead
    automatically impacts _visibility_ of the options to users).

    This results in "make oldconfig" actually asking the user for their
    choice, so that we don't have any silent subtle security model changes.
    The end result would generally look like this:

    CONFIG_HAVE_CC_STACKPROTECTOR=y
    CONFIG_CC_HAS_STACKPROTECTOR_NONE=y
    CONFIG_STACKPROTECTOR=y
    CONFIG_STACKPROTECTOR_STRONG=y
    CONFIG_CC_HAS_SANE_STACKPROTECTOR=y

    where the "CC_" versions really are about internal compiler
    infrastructure, not the user selections.

    Acked-by: Masahiro Yamada
    Signed-off-by: Linus Torvalds

    Linus Torvalds
     

13 Jun, 2018

1 commit

  • Pull more Kbuild updates from Masahiro Yamada:

    - fix some bugs introduced by the recent Kconfig syntax extension

    - add some symbols about compiler information in Kconfig, such as
    CC_IS_GCC, CC_IS_CLANG, GCC_VERSION, etc.

    - test compiler capability for the stack protector in Kconfig, and
    clean-up Makefile

    - test compiler capability for GCC-plugins in Kconfig, and clean-up
    Makefile

    - allow enabling GCC-plugins for COMPILE_TEST

    - test compiler capability for KCOV in Kconfig and correct dependency

    - remove auto-detect mode of the GCOV format, which is now more nicely
    handled in Kconfig

    - test compiler capability for mprofile-kernel on PowerPC, and clean-up
    Makefile

    - misc cleanups

    * tag 'kbuild-v4.18-2' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild:
    linux/linkage.h: replace VMLINUX_SYMBOL_STR() with __stringify()
    kconfig: fix localmodconfig
    sh: remove no-op macro VMLINUX_SYMBOL()
    powerpc/kbuild: move -mprofile-kernel check to Kconfig
    Documentation: kconfig: add recommended way to describe compiler support
    gcc-plugins: disable GCC_PLUGIN_STRUCTLEAK_BYREF_ALL for COMPILE_TEST
    gcc-plugins: allow to enable GCC_PLUGINS for COMPILE_TEST
    gcc-plugins: test plugin support in Kconfig and clean up Makefile
    gcc-plugins: move GCC version check for PowerPC to Kconfig
    kcov: test compiler capability in Kconfig and correct dependency
    gcov: remove CONFIG_GCOV_FORMAT_AUTODETECT
    arm64: move GCC version check for ARCH_SUPPORTS_INT128 to Kconfig
    kconfig: add CC_IS_CLANG and CLANG_VERSION
    kconfig: add CC_IS_GCC and GCC_VERSION
    stack-protector: test compiler capability in Kconfig and drop AUTO mode
    kbuild: fix endless syncconfig in case arch Makefile sets CROSS_COMPILE

    Linus Torvalds