17 Jan, 2021

1 commit

  • commit 2ca408d9c749c32288bc28725f9f12ba30299e8f upstream.

    Commit

    121b32a58a3a ("x86/entry/32: Use IA32-specific wrappers for syscalls taking 64-bit arguments")

    converted native x86-32 syscalls that take 64-bit arguments to use the
    compat handlers, to allow conversion to passing args via pt_regs.
    sys_fanotify_mark() was, however, missed, as it has a general compat
    handler. Add a config option that will use the syscall wrapper that
    takes the split args for native 32-bit.

    [ bp: Fix typo in Kconfig help text. ]

    Fixes: 121b32a58a3a ("x86/entry/32: Use IA32-specific wrappers for syscalls taking 64-bit arguments")
    Reported-by: Paweł Jasiak
    Signed-off-by: Brian Gerst
    Signed-off-by: Borislav Petkov
    Acked-by: Jan Kara
    Acked-by: Andy Lutomirski
    Link: https://lkml.kernel.org/r/20201130223059.101286-1-brgerst@gmail.com
    Signed-off-by: Greg Kroah-Hartman
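
    As an illustration of what such a split-argument wrapper looks like, here
    is a hedged sketch: do_fanotify_mark() is the shared helper from the
    patch, but the wrapper name, the merge helper, and the ordering of the
    32-bit halves are illustrative and ABI-dependent.

    /*
     * Sketch of a native 32-bit wrapper: the 64-bit mask arrives as two
     * 32-bit halves and is reassembled before calling the shared helper.
     */
    static inline u64 merge_u64(u32 hi, u32 lo)
    {
            return ((u64)hi << 32) | lo;
    }

    SYSCALL_DEFINE6(fanotify_mark32, int, fanotify_fd, unsigned int, flags,
                    u32, mask_lo, u32, mask_hi, int, dfd,
                    const char __user *, pathname)
    {
            return do_fanotify_mark(fanotify_fd, flags,
                                    merge_u64(mask_hi, mask_lo), dfd, pathname);
    }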

    Brian Gerst
     

01 Dec, 2020

1 commit

  • Currently, '--orphan-handling=warn' is spread out across four different
    architectures in their respective Makefiles, which makes it a little
    unruly to deal with in case it needs to be disabled for a specific
    linker version (in this case, ld.lld 10.0.1).

    To make it easier to control this, hoist this warning into Kconfig and
    the main Makefile so that disabling it is simpler, as the warning will
    only be enabled in a couple of places (the main Makefile and a couple of
    compressed boot folders that blow away LDFLAGS_vmlinux) and making it
    conditional is easier due to Kconfig syntax. One small additional
    benefit of this is saving a call to ld-option on incremental builds
    because we will have already evaluated it for CONFIG_LD_ORPHAN_WARN.

    To keep the list of supported architectures the same, introduce
    CONFIG_ARCH_WANT_LD_ORPHAN_WARN, which an architecture can select to
    gain this automatically once all of its sections are specified and
    size-asserted. A special thanks to Kees Cook for the help text on this
    config.

    Link: https://github.com/ClangBuiltLinux/linux/issues/1187
    Acked-by: Kees Cook
    Acked-by: Michael Ellerman (powerpc)
    Reviewed-by: Nick Desaulniers
    Tested-by: Nick Desaulniers
    Signed-off-by: Nathan Chancellor
    Signed-off-by: Masahiro Yamada
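
    A rough sketch of the resulting shape, assuming the option names above;
    the exact Kconfig and Makefile text in the merged patches may differ:

    # init/Kconfig (sketch)
    config LD_ORPHAN_WARN
            def_bool y
            depends on ARCH_WANT_LD_ORPHAN_WARN
            depends on $(ld-option,--orphan-handling=warn)

    # arch/<arch>/Kconfig (sketch): each architecture opts in
            select ARCH_WANT_LD_ORPHAN_WARN

    # top-level Makefile (sketch)
    ifdef CONFIG_LD_ORPHAN_WARN
    LDFLAGS_vmlinux += --orphan-handling=warn
    endif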

    Nathan Chancellor
     

15 Oct, 2020

1 commit

  • Pull x86 SEV-ES support from Borislav Petkov:
    "SEV-ES enhances the current guest memory encryption support called SEV
    by also encrypting the guest register state, making the registers
    inaccessible to the hypervisor by en-/decrypting them on world
    switches. Thus, it adds additional protection to Linux guests against
    exfiltration, control flow and rollback attacks.

    With SEV-ES, the guest is in full control of what registers the
    hypervisor can access. This is provided by a guest-host exchange
    mechanism based on a new exception vector called VMM Communication
    Exception (#VC), a new instruction called VMGEXIT and a shared
    Guest-Host Communication Block which is a decrypted page shared
    between the guest and the hypervisor.

    Intercepts to the hypervisor become #VC exceptions in an SEV-ES guest,
    so in order for that exception mechanism to work, the early x86 init
    code needed to be made capable of handling exceptions, which, in itself,
    brings a bunch of very nice cleanups and improvements to the early
    boot code like an early page fault handler, allowing for on-demand
    building of the identity mapping. With that, !KASLR configurations do
    not use the EFI page table anymore but switch to a kernel-controlled
    one.

    The main part of this series adds the support for that new exchange
    mechanism. The goal has been to keep this as separate as possible
    from the core x86 code by concentrating the machinery in two
    SEV-ES-specific files:

    arch/x86/kernel/sev-es-shared.c
    arch/x86/kernel/sev-es.c

    Other interaction with core x86 code has been kept to a minimum and
    behind static keys to minimize the performance impact on !SEV-ES
    setups.

    Work by Joerg Roedel and Thomas Lendacky and others"

    * tag 'x86_seves_for_v5.10' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (73 commits)
    x86/sev-es: Use GHCB accessor for setting the MMIO scratch buffer
    x86/sev-es: Check required CPU features for SEV-ES
    x86/efi: Add GHCB mappings when SEV-ES is active
    x86/sev-es: Handle NMI State
    x86/sev-es: Support CPU offline/online
    x86/head/64: Don't call verify_cpu() on starting APs
    x86/smpboot: Load TSS and getcpu GDT entry before loading IDT
    x86/realmode: Setup AP jump table
    x86/realmode: Add SEV-ES specific trampoline entry point
    x86/vmware: Add VMware-specific handling for VMMCALL under SEV-ES
    x86/kvm: Add KVM-specific VMMCALL handling under SEV-ES
    x86/paravirt: Allow hypervisor-specific VMMCALL handling under SEV-ES
    x86/sev-es: Handle #DB Events
    x86/sev-es: Handle #AC Events
    x86/sev-es: Handle VMMCALL Events
    x86/sev-es: Handle MWAIT/MWAITX Events
    x86/sev-es: Handle MONITOR/MONITORX Events
    x86/sev-es: Handle INVD Events
    x86/sev-es: Handle RDPMC Events
    x86/sev-es: Handle RDTSC(P) Events
    ...

    Linus Torvalds
     

14 Oct, 2020

1 commit

  • Pull seccomp updates from Kees Cook:
    "The bulk of the changes are with the seccomp selftests to accommodate
    some powerpc-specific behavioral characteristics. Additional cleanups,
    fixes, and improvements are also included:

    - heavily refactor seccomp selftests (and clone3 selftests
    dependency) to fix powerpc (Kees Cook, Thadeu Lima de Souza
    Cascardo)

    - fix style issue in selftests (Zou Wei)

    - upgrade "unknown action" from KILL_THREAD to KILL_PROCESS (Rich
    Felker)

    - replace task_pt_regs(current) with current_pt_regs() (Denis
    Efremov)

    - fix corner-case race in USER_NOTIF (Jann Horn)

    - make CONFIG_SECCOMP no longer per-arch (YiFei Zhu)"

    * tag 'seccomp-v5.10-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux: (23 commits)
    seccomp: Make duplicate listener detection non-racy
    seccomp: Move config option SECCOMP to arch/Kconfig
    selftests/clone3: Avoid OS-defined clone_args
    selftests/seccomp: powerpc: Set syscall return during ptrace syscall exit
    selftests/seccomp: Allow syscall nr and ret value to be set separately
    selftests/seccomp: Record syscall during ptrace entry
    selftests/seccomp: powerpc: Fix seccomp return value testing
    selftests/seccomp: Remove SYSCALL_NUM_RET_SHARE_REG in favor of SYSCALL_RET_SET
    selftests/seccomp: Avoid redundant register flushes
    selftests/seccomp: Convert REGSET calls into ARCH_GETREG/ARCH_SETREG
    selftests/seccomp: Convert HAVE_GETREG into ARCH_GETREG/ARCH_SETREG
    selftests/seccomp: Remove syscall setting #ifdefs
    selftests/seccomp: mips: Remove O32-specific macro
    selftests/seccomp: arm64: Define SYSCALL_NUM_SET macro
    selftests/seccomp: arm: Define SYSCALL_NUM_SET macro
    selftests/seccomp: mips: Define SYSCALL_NUM_SET macro
    selftests/seccomp: Provide generic syscall setting macro
    selftests/seccomp: Refactor arch register macros to avoid xtensa special case
    selftests/seccomp: Use __NR_mknodat instead of __NR_mknod
    selftests/seccomp: Use bitwise instead of arithmetic operator for flags
    ...

    Linus Torvalds
     

13 Oct, 2020

1 commit

  • Pull static call support from Ingo Molnar:
    "This introduces static_call(), which is the idea of static_branch()
    applied to indirect function calls. Remove a data load (indirection)
    by modifying the text.

    They give the flexibility of function pointers, but with better
    performance. (This is especially important for cases where retpolines
    would otherwise be used, as retpolines can be pretty slow.)

    API overview:

    DECLARE_STATIC_CALL(name, func);
    DEFINE_STATIC_CALL(name, func);
    DEFINE_STATIC_CALL_NULL(name, typename);

    static_call(name)(args...);
    static_call_cond(name)(args...);
    static_call_update(name, func);

    x86 is supported via text patching, otherwise basic indirect calls are
    used, with function pointers.

    There's a second variant using inline code patching, inspired by
    jump-labels, implemented on x86 as well.

    The new APIs are utilized in the x86 perf code, a heavy user of
    function pointers, where static calls speed up the PMU handler by
    4.2% (!).

    The generic implementation is not really exercised on other
    architectures, outside of the trivial test_static_call_init()
    self-test"

    * tag 'core-static_call-2020-10-12' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (21 commits)
    static_call: Fix return type of static_call_init
    tracepoint: Fix out of sync data passing by static caller
    tracepoint: Fix overly long tracepoint names
    x86/perf, static_call: Optimize x86_pmu methods
    tracepoint: Optimize using static_call()
    static_call: Allow early init
    static_call: Add some validation
    static_call: Handle tail-calls
    static_call: Add static_call_cond()
    x86/alternatives: Teach text_poke_bp() to emulate RET
    static_call: Add simple self-test for static calls
    x86/static_call: Add inline static call implementation for x86-64
    x86/static_call: Add out-of-line static call implementation
    static_call: Avoid kprobes on inline static_call()s
    static_call: Add inline static call infrastructure
    static_call: Add basic static call infrastructure
    compiler.h: Make __ADDRESSABLE() symbol truly unique
    jump_label,module: Fix module lifetime for __jump_label_mod_text_reserved()
    module: Properly propagate MODULE_STATE_COMING failure
    module: Fix up module_notifier return values
    ...

    Linus Torvalds
     

09 Oct, 2020

1 commit

  • In order to make adding configurable features into seccomp easier,
    it's better to have the options at one single location, considering
    especially that the bulk of seccomp code is arch-independent. A quick
    look also shows that many SECCOMP descriptions are outdated; they talk
    about /proc rather than prctl.

    As a result of moving the config option and keeping it default-on,
    the architectures arm, arm64, csky, riscv, sh, and xtensa, which did not
    have SECCOMP on by default prior to this, now get SECCOMP enabled by
    default with this change.

    Architectures microblaze, mips, powerpc, s390, sh, and sparc had an
    outdated dependency on PROC_FS; that dependency is removed with this
    change.

    Suggested-by: Jann Horn
    Link: https://lore.kernel.org/lkml/CAG48ez1YWz9cnp08UZgeieYRhHdqh-ch7aNwc4JRBnGyrmgfMg@mail.gmail.com/
    Signed-off-by: YiFei Zhu
    [kees: added HAVE_ARCH_SECCOMP help text, tweaked wording]
    Signed-off-by: Kees Cook
    Link: https://lore.kernel.org/r/9ede6ef35c847e58d61e476c6a39540520066613.1600951211.git.yifeifz2@illinois.edu
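
    The shape of the move, as a hedged sketch of what arch/Kconfig gains
    (the actual option text and help may differ):

    # arch/Kconfig (sketch)
    config HAVE_ARCH_SECCOMP
            bool

    config SECCOMP
            prompt "Enable seccomp to safely execute untrusted bytecode"
            def_bool y
            depends on HAVE_ARCH_SECCOMP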

    YiFei Zhu
     

06 Oct, 2020

1 commit

    In reaction to a proposal to introduce a memcpy_mcsafe_fast()
    implementation, Linus points out that memcpy_mcsafe() is poorly named
    relative to communicating the scope of the interface: specifically, what
    addresses are valid to pass as source and destination, and what faults /
    exceptions are handled.

    Of particular concern is that even though x86 might be able to handle
    the semantics of copy_mc_to_user() with its common copy_user_generic()
    implementation, other archs likely need / want an explicit path for this
    case:

    On Fri, May 1, 2020 at 11:28 AM Linus Torvalds wrote:
    >
    > On Thu, Apr 30, 2020 at 6:21 PM Dan Williams wrote:
    > >
    > > However now I see that copy_user_generic() works for the wrong reason.
    > > It works because the exception on the source address due to poison
    > > looks no different than a write fault on the user address to the
    > > caller, it's still just a short copy. So it makes copy_to_user() work
    > > for the wrong reason relative to the name.
    >
    > Right.
    >
    > And it won't work that way on other architectures. On x86, we have a
    > generic function that can take faults on either side, and we use it
    > for both cases (and for the "in_user" case too), but that's an
    > artifact of the architecture oddity.
    >
    > In fact, it's probably wrong even on x86 - because it can hide bugs -
    > but writing those things is painful enough that everybody prefers
    > having just one function.

    Replace the single top-level memcpy_mcsafe() with either
    copy_mc_to_user() or copy_mc_to_kernel().

    Introduce an x86 copy_mc_fragile() name as the rename for the
    low-level x86 implementation formerly named memcpy_mcsafe(). It is used
    as the slow / careful backend that is supplanted by a fast
    copy_mc_generic() in a follow-on patch.

    One side-effect of this reorganization is that separating copy_mc_64.S
    to its own file means that perf no longer needs to track dependencies
    for its memcpy_64.S benchmarks.

    [ bp: Massage a bit. ]

    Signed-off-by: Dan Williams
    Signed-off-by: Borislav Petkov
    Reviewed-by: Tony Luck
    Acked-by: Michael Ellerman
    Cc:
    Link: http://lore.kernel.org/r/CAHk-=wjSqtXAqfUJxFtWNwmguFASTgB0dz1dT3V-78Quiezqbg@mail.gmail.com
    Link: https://lkml.kernel.org/r/160195561680.2163339.11574962055305783722.stgit@dwillia2-desk3.amr.corp.intel.com
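
    A hedged usage sketch of the renamed interface: like copy_to_user(),
    copy_mc_to_kernel() is assumed here to return the number of bytes left
    uncopied (0 on full success), and to be available only on architectures
    providing the machine-check aware implementation.

    /* Sketch: read from pmem that may contain poison. */
    static size_t pmem_read_sketch(void *dst, const void *pmem_src, size_t len)
    {
            unsigned long rem;

            rem = copy_mc_to_kernel(dst, pmem_src, len);
            if (rem)
                    pr_warn("poison consumed, %lu of %zu bytes lost\n", rem, len);

            return len - rem;
    }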

    Dan Williams
     

01 Sep, 2020

2 commits

  • Add the inline static call implementation for x86-64. The generated code
    is identical to the out-of-line case, except we move the trampoline into
    its own section.

    Objtool uses the trampoline naming convention to detect all the call
    sites. It then annotates those call sites in the .static_call_sites
    section.

    During boot (and module init), the call sites are patched to call
    directly into the destination function. The temporary trampoline is
    then no longer used.

    [peterz: merged trampolines, put trampoline in section]

    Signed-off-by: Josh Poimboeuf
    Signed-off-by: Peter Zijlstra (Intel)
    Signed-off-by: Ingo Molnar
    Cc: Linus Torvalds
    Link: https://lore.kernel.org/r/20200818135804.864271425@infradead.org

    Josh Poimboeuf
     
  • Add the x86 out-of-line static call implementation. For each key, a
    permanent trampoline is created which is the destination for all static
    calls for the given key. The trampoline has a direct jump which gets
    patched by static_call_update() when the destination function changes.

    [peterz: fixed trampoline, rewrote patching code]

    Signed-off-by: Josh Poimboeuf
    Signed-off-by: Peter Zijlstra (Intel)
    Signed-off-by: Ingo Molnar
    Cc: Linus Torvalds
    Link: https://lore.kernel.org/r/20200818135804.804315175@infradead.org

    Josh Poimboeuf
     

15 Aug, 2020

1 commit

  • Pull more timer updates from Thomas Gleixner:
    "A set of posix CPU timer changes which allows to defer the heavy work
    of posix CPU timers into task work context. The tick interrupt is
    reduced to a quick check which queues the work which is doing the
    heavy lifting before returning to user space or going back to guest
    mode. Moving this out is deferring the signal delivery slightly but
    posix CPU timers are inaccurate by nature as they depend on the tick
    so there is no real damage. The relevant test cases all passed.

    This lifts the last offender for RT out of the hard interrupt context
    tick handler, but it also has the general benefit that the actual
    heavy work is accounted to the task/process and not to the tick
    interrupt itself.

    Further optimizations are possible to break up long sighand lock hold
    and interrupt-disabled (on !RT kernels) times when a massive number of
    posix CPU timers (which are unprivileged) is armed for a
    task/process.

    This is currently only enabled for x86 because the architecture has to
    ensure that task work is handled in KVM before entering a guest, which
    was just established for x86 with the new common entry/exit code which
    got merged post 5.8 and is not the case for other KVM architectures"

    * tag 'timers-core-2020-08-14' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
    x86: Select POSIX_CPU_TIMERS_TASK_WORK
    posix-cpu-timers: Provide mechanisms to defer timer handling to task_work
    posix-cpu-timers: Split run_posix_cpu_timers()

    Linus Torvalds
     

07 Aug, 2020

1 commit

  • Pull KVM updates from Paolo Bonzini:
    "s390:
    - implement diag318

    x86:
    - Report last CPU for debugging
    - Emulate smaller MAXPHYADDR in the guest than in the host
    - .noinstr and tracing fixes from Thomas
    - nested SVM page table switching optimization and fixes

    Generic:
    - Unify shadow MMU cache data structures across architectures"

    * tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (127 commits)
    KVM: SVM: Fix sev_pin_memory() error handling
    KVM: LAPIC: Set the TDCR settable bits
    KVM: x86: Specify max TDP level via kvm_configure_mmu()
    KVM: x86/mmu: Rename max_page_level to max_huge_page_level
    KVM: x86: Dynamically calculate TDP level from max level and MAXPHYADDR
    KVM: VXM: Remove temporary WARN on expected vs. actual EPTP level mismatch
    KVM: x86: Pull the PGD's level from the MMU instead of recalculating it
    KVM: VMX: Make vmx_load_mmu_pgd() static
    KVM: x86/mmu: Add separate helper for shadow NPT root page role calc
    KVM: VMX: Drop a duplicate declaration of construct_eptp()
    KVM: nSVM: Correctly set the shadow NPT root level in its MMU role
    KVM: Using macros instead of magic values
    MIPS: KVM: Fix build error caused by 'kvm_run' cleanup
    KVM: nSVM: remove nonsensical EXITINFO1 adjustment on nested NPF
    KVM: x86: Add a capability for GUEST_MAXPHYADDR < HOST_MAXPHYADDR support
    KVM: VMX: optimize #PF injection when MAXPHYADDR does not match
    KVM: VMX: Add guest physical address check in EPT violation and misconfig
    KVM: VMX: introduce vmx_need_pf_intercept
    KVM: x86: update exception bitmap on CPUID changes
    KVM: x86: rename update_bp_intercept to update_exception_bitmap
    ...

    Linus Torvalds
     

06 Aug, 2020

1 commit

  • Move POSIX CPU timer expiry and signal delivery into task context.

    Signed-off-by: Thomas Gleixner
    Signed-off-by: Ingo Molnar
    Reviewed-by: Oleg Nesterov
    Acked-by: Peter Zijlstra (Intel)
    Link: https://lore.kernel.org/r/20200730102337.888613724@linutronix.de

    Thomas Gleixner
     

05 Aug, 2020

3 commits

  • Pull x86 conversion to generic entry code from Thomas Gleixner:
    "The conversion of X86 syscall, interrupt and exception entry/exit
    handling to the generic code.

    Pretty much a straightforward 1:1 conversion plus the consolidation
    of the KVM handling of pending work before entering guest mode"

    * tag 'x86-entry-2020-08-04' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
    x86/kvm: Use __xfer_to_guest_mode_work_pending() in kvm_run_vcpu()
    x86/kvm: Use generic xfer to guest work function
    x86/entry: Cleanup idtentry_enter/exit
    x86/entry: Use generic interrupt entry/exit code
    x86/entry: Cleanup idtentry_entry/exit_user
    x86/entry: Use generic syscall exit functionality
    x86/entry: Use generic syscall entry function
    x86/ptrace: Provide pt_regs helper for entry/exit
    x86/entry: Move user return notifier out of loop
    x86/entry: Consolidate 32/64 bit syscall entry
    x86/entry: Consolidate check_user_regs()
    x86: Correct noinstr qualifiers
    x86/idtentry: Remove stale comment

    Linus Torvalds
     
  • Pull dma-mapping updates from Christoph Hellwig:

    - make support for dma_ops optional

    - move more code out of line

    - add generic support for a dma_ops bypass mode

    - misc cleanups

    * tag 'dma-mapping-5.9' of git://git.infradead.org/users/hch/dma-mapping:
    dma-contiguous: cleanup dma_alloc_contiguous
    dma-debug: use named initializers for dir2name
    powerpc: use the generic dma_ops_bypass mode
    dma-mapping: add a dma_ops_bypass flag to struct device
    dma-mapping: make support for dma ops optional
    dma-mapping: inline the fast path dma-direct calls
    dma-mapping: move the remaining DMA API calls out of line

    Linus Torvalds
     
  • Pull fork cleanups from Christian Brauner:
    "This is cleanup series from when we reworked a chunk of the process
    creation paths in the kernel and switched to struct
    {kernel_}clone_args.

    High-level this does two main things:

    - Remove the double export of both do_fork() and _do_fork() where
    do_fork() used the inconsistent legacy clone calling convention.

    Now we only export _do_fork() which is based on struct
    kernel_clone_args.

    - Remove the copy_thread_tls()/copy_thread() split, making the
    architecture-specific HAVE_COPY_THREAD_TLS config option obsolete.

    This switches all remaining architectures to select
    HAVE_COPY_THREAD_TLS and thus to the copy_thread_tls() calling
    convention. The current split makes the process creation codepaths
    more convoluted than they need to be. Each architecture has its own
    copy_thread() function unless it selects HAVE_COPY_THREAD_TLS, in which
    case it has a copy_thread_tls() function.

    The split is not needed anymore these days: all architectures support
    CLONE_SETTLS, but quite a few of them never bothered to select
    HAVE_COPY_THREAD_TLS and instead simply continued to use copy_thread()
    with the old calling convention. Removing this split cleans up the
    process creation codepaths and paves the way for implementing clone3()
    on such architectures since it requires the copy_thread_tls() calling
    convention.

    After having made each architecture support copy_thread_tls(), this
    series simply renames that function back to copy_thread(). It also
    switches all architectures that call do_fork() directly over to
    _do_fork() and the struct kernel_clone_args calling convention. This
    is a corollary of switching the architectures that did not yet support
    it over to copy_thread_tls() since do_fork() is conditional on not
    supporting copy_thread_tls() (mostly because it lacks a separate
    argument for tls, which is trivial to fix, but there's no need for this
    function to exist).

    The do_fork() removal is in itself already useful as it allows removing
    the export of both do_fork() and _do_fork() we currently have
    in favor of only _do_fork(). This has already been discussed back when
    we added clone3(). The legacy clone() calling convention is - as is
    probably well-known - somewhat odd:

    #
    # ABI hall of shame
    #
    config CLONE_BACKWARDS
    config CLONE_BACKWARDS2
    config CLONE_BACKWARDS3

    that is aggravated by the fact that some architectures such as sparc
    follow the CLONE_BACKWARDSx calling convention but don't really select
    the corresponding config option since they call do_fork() directly.

    So do_fork() enforces a somewhat arbitrary calling convention in the
    first place that doesn't really help the individual architectures that
    deviate from it. They can thus simply be switched to _do_fork()
    enforcing a single calling convention. (I really hope that any new
    architectures will __not__ try to implement their own calling
    conventions...)

    Most architectures already have made a similar switch (m68k comes to
    mind).

    Overall this removes more code than it adds even with a good portion
    of added comments. It simplifies a chunk of arch specific assembly
    either by moving the code into C or by simply rewriting the assembly.

    Architectures that have been touched in non-trivial ways have all been
    actually boot and stress tested: sparc and ia64 have been tested with
    Debian 9 images. They are the two architectures which have been
    touched the most. All non-trivial changes to architectures have seen
    acks from the relevant maintainers. nios2 was tested with a custom-built
    buildroot image. For h8300 I couldn't get anything bootable to test on,
    but the changes have been fairly automatic and I'm sure we'll hear
    people yell if I broke something there.

    All other architectures that have been touched in trivial ways have
    been compile tested for each single patch of the series via git rebase
    -x "make ..." v5.8-rc2. arm{64} and x86{_64} have been boot tested
    even though they have just been trivially touched (removal of the
    HAVE_COPY_THREAD_TLS macro from their Kconfig) because well they are
    basically "core architectures" and since it is trivial to get your
    hands on a useable image"

    * tag 'fork-v5.9' of git://git.kernel.org/pub/scm/linux/kernel/git/brauner/linux:
    arch: rename copy_thread_tls() back to copy_thread()
    arch: remove HAVE_COPY_THREAD_TLS
    unicore: switch to copy_thread_tls()
    sh: switch to copy_thread_tls()
    nds32: switch to copy_thread_tls()
    microblaze: switch to copy_thread_tls()
    hexagon: switch to copy_thread_tls()
    c6x: switch to copy_thread_tls()
    alpha: switch to copy_thread_tls()
    fork: remove do_fork()
    h8300: select HAVE_COPY_THREAD_TLS, switch to kernel_clone_args
    nios2: enable HAVE_COPY_THREAD_TLS, switch to kernel_clone_args
    ia64: enable HAVE_COPY_THREAD_TLS, switch to kernel_clone_args
    sparc: unconditionally enable HAVE_COPY_THREAD_TLS
    sparc: share process creation helpers between sparc and sparc64
    sparc64: enable HAVE_COPY_THREAD_TLS
    fork: fold legacy_clone_args_valid() into _do_fork()

    Linus Torvalds
     

31 Jul, 2020

1 commit

  • - Add support for zstd compressed kernel

    - Define __DISABLE_EXPORTS in Makefile

    - Remove __DISABLE_EXPORTS definition from kaslr.c

    - Bump the heap size for zstd.

    - Update the documentation.

    Integrates the ZSTD decompression code into the x86 pre-boot code.

    Zstandard requires slightly more memory during the kernel decompression
    on x86 (192 KB vs 64 KB), and the memory usage is independent of the
    window size.

    __DISABLE_EXPORTS is now defined in the Makefile, which covers both
    the existing use in kaslr.c, and the use needed by the zstd decompressor
    in misc.c.

    This patch has been boot tested with both a zstd and gzip compressed
    kernel on i386 and x86_64 using buildroot and QEMU.

    Additionally, this has been tested in production on x86_64 devices.
    We saw a 2 second boot time reduction by switching kernel compression
    from xz to zstd.

    Signed-off-by: Nick Terrell
    Signed-off-by: Ingo Molnar
    Tested-by: Sedat Dilek
    Reviewed-by: Kees Cook
    Link: https://lore.kernel.org/r/20200730190841.2071656-7-nickrterrell@gmail.com
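
    To try it, one would switch the compression choice and rebuild; a hedged
    example (the KERNEL_ZSTD option name matches this series, the rest is
    generic kbuild usage):

    $ ./scripts/config --disable KERNEL_XZ --enable KERNEL_ZSTD
    $ make olddefconfig
    $ make -j"$(nproc)" bzImage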

    Nick Terrell
     

24 Jul, 2020

1 commit

  • Replace the syscall entry work handling with the generic version. Provide
    the necessary helper inlines to handle the real architecture specific
    parts, e.g. ptrace.

    Use a temporary define for idtentry_enter_user, which will be cleaned up
    separately.

    Signed-off-by: Thomas Gleixner
    Reviewed-by: Kees Cook
    Link: https://lkml.kernel.org/r/20200722220520.376213694@linutronix.de

    Thomas Gleixner
     

05 Jul, 2020

1 commit

  • All architectures support copy_thread_tls() now, so remove the legacy
    copy_thread() function and the HAVE_COPY_THREAD_TLS config option. Everyone
    uses the same process creation calling convention based on
    copy_thread_tls() and struct kernel_clone_args. This will make it easier to
    maintain the core process creation code under kernel/, simplifies the
    callpaths, and makes them identical for all architectures.

    Cc: linux-arch@vger.kernel.org
    Acked-by: Thomas Bogendoerfer
    Acked-by: Greentime Hu
    Acked-by: Geert Uytterhoeven
    Reviewed-by: Kees Cook
    Signed-off-by: Christian Brauner

    Christian Brauner
     

18 Jun, 2020

1 commit

  • Since many compilers cannot disable KCOV with a function attribute,
    help them NOP out any __sanitizer_cov_*() calls injected in noinstr
    code.

    This turns:

    12: e8 00 00 00 00 callq 17
    13: R_X86_64_PLT32 __sanitizer_cov_trace_pc-0x4

    into:

    12: 0f 1f 44 00 00 nopl 0x0(%rax,%rax,1)
    13: R_X86_64_NONE __sanitizer_cov_trace_pc-0x4

    Just like recordmcount does.

    Signed-off-by: Peter Zijlstra (Intel)
    Acked-by: Dmitry Vyukov

    Peter Zijlstra
     

15 Jun, 2020

2 commits

    KVM now supports using an interrupt for 'page ready' APF event delivery,
    and the legacy mechanism has been deprecated. Switch KVM guests to the
    new one.

    Signed-off-by: Vitaly Kuznetsov
    Message-Id:
    [Use HYPERVISOR_CALLBACK_VECTOR instead of a separate vector. - Paolo]
    Signed-off-by: Paolo Bonzini

    Vitaly Kuznetsov
     
  • The x86 microcode support works just fine without FW_LOADER. In fact,
    these days most people load microcode early during boot so FW_LOADER
    never gets into the picture anyway.

    As almost everyone on x86 needs to enable MICROCODE, this by extension
    means that FW_LOADER is always built into the kernel even if nothing
    uses it. The FW_LOADER system is about two thousand lines long and
    contains user-space facing interfaces that could potentially provide an
    entry point into the kernel (or beyond).

    Remove the unnecessary select of FW_LOADER by MICROCODE. People who need
    the FW_LOADER capability can still enable it.

    [ bp: Massage a bit. ]

    Signed-off-by: Herbert Xu
    Signed-off-by: Borislav Petkov
    Link: https://lkml.kernel.org/r/20200610042911.GA20058@gondor.apana.org.au

    Herbert Xu
     

14 Jun, 2020

3 commits

  • Pull more Kbuild updates from Masahiro Yamada:

    - fix build rules in binderfs sample

    - fix build errors when Kbuild recurses to the top Makefile

    - convert '---help---' in Kconfig to 'help'

    * tag 'kbuild-v5.8-2' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild:
    treewide: replace '---help---' in Kconfig files with 'help'
    kbuild: fix broken builds because of GZIP,BZIP2,LZOP variables
    samples: binderfs: really compile this sample and fix build issues

    Linus Torvalds
     
  • Pull x86 entry updates from Thomas Gleixner:
    "The x86 entry, exception and interrupt code rework

    This all started about six months ago with the attempt to move the Posix
    CPU timer heavy lifting out of the timer interrupt code and just have
    lockless quick checks in that code path. Trivial 5 patches.

    This unearthed an inconsistency in the KVM handling of task work and
    the review requested moving all of this into generic code so other
    architectures can share it.

    Valid request and solved with another 25 patches but those unearthed
    inconsistencies vs. RCU and instrumentation.

    Digging into this made it obvious that there are quite some
    inconsistencies vs. instrumentation in general. The int3 text poke
    handling in particular was completely unprotected and, with the batched
    update of trace events, even more prone to endless int3
    recursion.

    In parallel the RCU implications of instrumenting fragile entry code
    came up in several discussions.

    The conclusion of the x86 maintainer team was to go all the way and
    make the protection against any form of instrumentation of fragile and
    dangerous code paths enforceable and verifiable by tooling.

    A first batch of preparatory work hit mainline with commit
    d5f744f9a2ac ("Pull x86 entry code updates from Thomas Gleixner")

    That (almost) full solution introduced a new code section,
    '.noinstr.text', into which all code which needs to be protected from
    instrumentation of all sorts goes. Any call into instrumentable
    code out of this section has to be annotated. objtool has support to
    validate this.

    Kprobes now excludes this section fully which also prevents BPF from
    fiddling with it and all 'noinstr' annotated functions also keep
    ftrace off. The section, kprobes and objtool changes are already
    merged.

    The major changes coming with this are:

    - Preparatory cleanups

    - Annotating relevant functions to move them into the
    noinstr.text section or enforcing inlining by marking them
    __always_inline so the compiler cannot misplace or instrument
    them.

    - Splitting and simplifying the idtentry macro maze so that it is
    now clearly separated into simple exception entries and the more
    interesting ones which use interrupt stacks and have the paranoid
    handling vs. CR3 and GS.

    - Move quite some of the low level ASM functionality into C code:

    - enter_from and exit to user space handling. The ASM code now
    calls into C after doing the really necessary ASM handling and
    the return path goes back out without bells and whistles in
    ASM.

    - exception entry/exit got the equivalent treatment

    - move all IRQ tracepoints from ASM to C so they can be placed as
    appropriate which is especially important for the int3
    recursion issue.

    - Consolidate the declaration and definition of entry points between
    32 and 64 bit. They share a common header and macros now.

    - Remove the extra device interrupt entry maze and just use the
    regular exception entry code.

    - All ASM entry points except NMI are now generated from the shared
    header file and the corresponding macros in the 32 and 64 bit
    entry ASM.

    - The C code entry points are consolidated as well with the help of
    DEFINE_IDTENTRY*() macros. This makes it possible to ensure at one central
    point that all corresponding entry points share the same
    semantics. The actual function body for most entry points is in an
    instrumentable and sane state.

    There are special macros for the more sensitive entry points, e.g.
    INT3 and of course the nasty paranoid #NMI, #MCE, #DB and #DF.
    They allow putting the whole entry instrumentation and RCU handling
    into safe places, instead of the previous 'pray that it is correct'
    approach.

    - The INT3 text poke handling is now completely isolated and the
    recursion issue banned. Aside from the entry rework, this required
    other isolation work, e.g. the ability to force inline bsearch.

    - Prevent #DB on fragile entry code, entry relevant memory and
    disable it on NMI and #MC entry, which allowed getting rid of the
    nested #DB IST stack shifting hackery.

    - A few other cleanups and enhancements which have been made
    possible through this and already merged changes, e.g.
    consolidating and further restricting the IDT code so the IDT
    table becomes RO after init which removes yet another popular
    attack vector

    - About 680 lines of ASM maze are gone.

    There are a few open issues:

    - An escape out of the noinstr section in the MCE handler which needs
    some more thought but under the aspect that MCE is a complete
    trainwreck by design and the probability of surviving it is low, this
    was not high on the priority list.

    - Paravirtualization

    When PV is enabled then objtool complains about a bunch of indirect
    calls out of the noinstr section. There are a few straightforward
    ways to fix this, but the other issues vs. general correctness were
    more pressing than parawitz.

    - KVM

    KVM is inconsistent as well. Patches have been posted, but they
    have not yet been commented on or picked up by the KVM folks.

    - IDLE

    Pretty much the same problems can be found in the low level idle
    code especially the parts where RCU stopped watching. This was
    beyond the scope of the more obvious and exposable problems and is
    on the todo list.

    The lesson learned from this brain-melting exercise of morphing the
    evolved code base into something which can be validated and understood
    is that once again the violation of the most important engineering
    principle "correctness first" has caused quite a few people to spend
    valuable time on problems which could have been avoided in the first
    place. The "features first" tinkering mindset really has to stop.

    With that I want to say thanks to everyone involved in contributing to
    this effort. Special thanks go to the following people (alphabetical
    order): Alexandre Chartre, Andy Lutomirski, Borislav Petkov, Brian
    Gerst, Frederic Weisbecker, Josh Poimboeuf, Juergen Gross, Lai
    Jiangshan, Marco Elver, Paolo Bonzini, Paul McKenney, Peter Zijlstra,
    Vitaly Kuznetsov, and Will Deacon"

    * tag 'x86-entry-2020-06-12' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (142 commits)
    x86/entry: Force rcu_irq_enter() when in idle task
    x86/entry: Make NMI use IDTENTRY_RAW
    x86/entry: Treat BUG/WARN as NMI-like entries
    x86/entry: Unbreak __irqentry_text_start/end magic
    x86/entry: __always_inline CR2 for noinstr
    lockdep: __always_inline more for noinstr
    x86/entry: Re-order #DB handler to avoid *SAN instrumentation
    x86/entry: __always_inline arch_atomic_* for noinstr
    x86/entry: __always_inline irqflags for noinstr
    x86/entry: __always_inline debugreg for noinstr
    x86/idt: Consolidate idt functionality
    x86/idt: Cleanup trap_init()
    x86/idt: Use proper constants for table size
    x86/idt: Add comments about early #PF handling
    x86/idt: Mark init only functions __init
    x86/entry: Rename trace_hardirqs_off_prepare()
    x86/entry: Clarify irq_{enter,exit}_rcu()
    x86/entry: Remove DBn stacks
    x86/entry: Remove debug IDT frobbing
    x86/entry: Optimize local_db_save() for virt
    ...

    Linus Torvalds
     
  • Since commit 84af7a6194e4 ("checkpatch: kconfig: prefer 'help' over
    '---help---'"), the number of '---help---' has been gradually
    decreasing, but there are still more than 2400 instances.

    This commit finishes the conversion. While I touched the lines,
    I also fixed the indentation.

    There are a variety of indentation styles found.

    a) 4 spaces + '---help---'
    b) 7 spaces + '---help---'
    c) 8 spaces + '---help---'
    d) 1 space + 1 tab + '---help---'
    e) 1 tab + '---help---' (correct indentation)
    f) 1 tab + 1 space + '---help---'
    g) 1 tab + 2 spaces + '---help---'

    In order to convert all of them to 1 tab + 'help', I ran the
    following command:

    $ find . -name 'Kconfig*' | xargs sed -i 's/^[[:space:]]*---help---/\thelp/'

    Signed-off-by: Masahiro Yamada

    Masahiro Yamada
     

13 Jun, 2020

1 commit

  • Pull more KVM updates from Paolo Bonzini:
    "The guest side of the asynchronous page fault work has been delayed to
    5.9 in order to sync with Thomas's interrupt entry rework, but here's
    the rest of the KVM updates for this merge window.

    MIPS:
    - Loongson port

    PPC:
    - Fixes

    ARM:
    - Fixes

    x86:
    - KVM_SET_USER_MEMORY_REGION optimizations
    - Fixes
    - Selftest fixes"

    * tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (62 commits)
    KVM: x86: do not pass poisoned hva to __kvm_set_memory_region
    KVM: selftests: fix sync_with_host() in smm_test
    KVM: async_pf: Inject 'page ready' event only if 'page not present' was previously injected
    KVM: async_pf: Cleanup kvm_setup_async_pf()
    kvm: i8254: remove redundant assignment to pointer s
    KVM: x86: respect singlestep when emulating instruction
    KVM: selftests: Don't probe KVM_CAP_HYPERV_ENLIGHTENED_VMCS when nested VMX is unsupported
    KVM: selftests: do not substitute SVM/VMX check with KVM_CAP_NESTED_STATE check
    KVM: nVMX: Consult only the "basic" exit reason when routing nested exit
    KVM: arm64: Move hyp_symbol_addr() to kvm_asm.h
    KVM: arm64: Synchronize sysreg state on injecting an AArch32 exception
    KVM: arm64: Make vcpu_cp1x() work on Big Endian hosts
    KVM: arm64: Remove host_cpu_context member from vcpu structure
    KVM: arm64: Stop sparse from moaning at __hyp_this_cpu_ptr
    KVM: arm64: Handle PtrAuth traps early
    KVM: x86: Unexport x86_fpu_cache and make it static
    KVM: selftests: Ignore KVM 5-level paging support for VM_MODE_PXXV48_4K
    KVM: arm64: Save the host's PtrAuth keys in non-preemptible context
    KVM: arm64: Stop save/restoring ACTLR_EL1
    KVM: arm64: Add emulation for 32bit guests accessing ACTLR2
    ...

    Linus Torvalds
     

12 Jun, 2020

1 commit

  • Merge the state of the locking kcsan branch before the read/write_once()
    and the atomics modifications got merged.

    Squash the fallout of the rebase on top of the read/write once and atomic
    fallback work into the merge. The history of the original branch is
    preserved in tag locking-kcsan-2020-06-02.

    Signed-off-by: Thomas Gleixner

    Thomas Gleixner
     

11 Jun, 2020

1 commit

  • Replace the extra interrupt handling code and reuse the existing idtentry
    machinery. This moves the irq stack switching on 64-bit from ASM to C code;
    32-bit already does the stack switching in C.

    This requires removing HAVE_IRQ_EXIT_ON_IRQ_STACK, as the stack switch
    is no longer in the low-level entry code.

    Signed-off-by: Thomas Gleixner
    Signed-off-by: Ingo Molnar
    Acked-by: Andy Lutomirski
    Link: https://lore.kernel.org/r/20200521202119.078690991@linutronix.de

    Thomas Gleixner
     

07 Jun, 2020

1 commit

  • Pull dma-mapping updates from Christoph Hellwig:

    - enhance the dma pool to allow atomic allocation on x86 with AMD SEV
    (David Rientjes)

    - two small cleanups (Jason Yan and Peter Collingbourne)

    * tag 'dma-mapping-5.8' of git://git.infradead.org/users/hch/dma-mapping:
    dma-contiguous: fix comment for dma_release_from_contiguous
    dma-pool: scale the default DMA coherent pool size with memory capacity
    x86/mm: unencrypted non-blocking DMA allocations use coherent pools
    dma-pool: add pool sizes to debugfs
    dma-direct: atomic allocations must come from atomic coherent pools
    dma-pool: dynamically expanding atomic pools
    dma-pool: add additional coherent pools to map to gfp mask
    dma-remap: separate DMA atomic pools from direct remap code
    dma-debug: make __dma_entry_alloc_check_leak() static

    Linus Torvalds
     

05 Jun, 2020

2 commits

  • This adds tests which will validate architecture page table helpers and
    other accessors for their compliance with expected generic MM semantics.
    This will help various architectures in validating changes to existing
    page table helpers or addition of new ones.

    This test covers basic page table entry transformations including but
    not limited to old, young, dirty, clean, write, write protect, etc. at
    various levels, along with populating intermediate entries with the next
    page table page and validating them.

    Test page table pages are allocated from system memory with required size
    and alignments. The mapped pfns at page table levels are derived from a
    real pfn representing a valid kernel text symbol. This test gets called
    via late_initcall().

    This test gets built and run when CONFIG_DEBUG_VM_PGTABLE is selected.
    Any architecture which is willing to subscribe to this test will need to
    select ARCH_HAS_DEBUG_VM_PGTABLE. For now this is limited to arc, arm64,
    x86, s390 and powerpc platforms, where the test is known to build and
    run successfully. Going forward, other architectures too can subscribe
    to the test after fixing any build or runtime problems with their page
    table helpers.

    Folks interested in making sure that a given platform's page table helpers
    conform to expected generic MM semantics should enable the above config
    which will just trigger this test during boot. Any non-conformity here
    will be reported as a warning which would need to be fixed. This test
    will help catch any changes to the agreed-upon semantics expected from
    generic MM and enable platforms to accommodate them thereafter.
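
    Subscribing is a one-line opt-in in the architecture's Kconfig, roughly
    (sketch; shown for a hypothetical architecture):

    # arch/<arch>/Kconfig (sketch)
    config MY_ARCH
            def_bool y
            select ARCH_HAS_DEBUG_VM_PGTABLE

    # then enable the test itself:
    # CONFIG_DEBUG_VM_PGTABLE=y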

    [anshuman.khandual@arm.com: v17]
    Link: http://lkml.kernel.org/r/1587436495-22033-3-git-send-email-anshuman.khandual@arm.com
    [anshuman.khandual@arm.com: v18]
    Link: http://lkml.kernel.org/r/1588564865-31160-3-git-send-email-anshuman.khandual@arm.com
    Suggested-by: Catalin Marinas
    Signed-off-by: Anshuman Khandual
    Signed-off-by: Christophe Leroy
    Signed-off-by: Qian Cai
    Signed-off-by: Andrew Morton
    Tested-by: Gerald Schaefer [s390]
    Tested-by: Christophe Leroy [ppc32]
    Reviewed-by: Ingo Molnar
    Cc: Mike Rapoport
    Cc: Vineet Gupta
    Cc: Catalin Marinas
    Cc: Will Deacon
    Cc: Benjamin Herrenschmidt
    Cc: Paul Mackerras
    Cc: Michael Ellerman
    Cc: Heiko Carstens
    Cc: Vasily Gorbik
    Cc: Christian Borntraeger
    Cc: Thomas Gleixner
    Cc: Ingo Molnar
    Cc: Borislav Petkov
    Cc: "H. Peter Anvin"
    Cc: Kirill A. Shutemov
    Cc: Paul Walmsley
    Cc: Palmer Dabbelt
    Link: http://lkml.kernel.org/r/1583919272-24178-1-git-send-email-anshuman.khandual@arm.com
    Signed-off-by: Linus Torvalds

    Anshuman Khandual
     
  • Remove KVM_DEBUG_FS, which can easily be misconstrued as controlling
    KVM-as-a-host. The sole user of CONFIG_KVM_DEBUG_FS was removed by
    commit cfd8983f03c7b ("x86, locking/spinlocks: Remove ticket (spin)lock
    implementation").

    Signed-off-by: Sean Christopherson
    Message-Id:
    Signed-off-by: Paolo Bonzini

    Sean Christopherson
     

04 Jun, 2020

3 commits

  • Merge more updates from Andrew Morton:
    "More mm/ work, plenty more to come

    Subsystems affected by this patch series: slub, memcg, gup, kasan,
    pagealloc, hugetlb, vmscan, tools, mempolicy, memblock, hugetlbfs,
    thp, mmap, kconfig"

    * akpm: (131 commits)
    arm64: mm: use ARCH_HAS_DEBUG_WX instead of arch defined
    x86: mm: use ARCH_HAS_DEBUG_WX instead of arch defined
    riscv: support DEBUG_WX
    mm: add DEBUG_WX support
    drivers/base/memory.c: cache memory blocks in xarray to accelerate lookup
    mm/thp: rename pmd_mknotpresent() as pmd_mkinvalid()
    powerpc/mm: drop platform defined pmd_mknotpresent()
    mm: thp: don't need to drain lru cache when splitting and mlocking THP
    hugetlbfs: get unmapped area below TASK_UNMAPPED_BASE for hugetlbfs
    sparc32: register memory occupied by kernel as memblock.memory
    include/linux/memblock.h: fix minor typo and unclear comment
    mm, mempolicy: fix up gup usage in lookup_node
    tools/vm/page_owner_sort.c: filter out unneeded line
    mm: swap: memcg: fix memcg stats for huge pages
    mm: swap: fix vmstats for huge pages
    mm: vmscan: limit the range of LRU type balancing
    mm: vmscan: reclaim writepage is IO cost
    mm: vmscan: determine anon/file pressure balance at the reclaim root
    mm: balance LRU lists based on relative thrashing
    mm: only count actual rotations as LRU reclaim cost
    ...

    Linus Torvalds
     
    Extract DEBUG_WX to mm/Kconfig.debug for shared use, and change to use
    ARCH_HAS_DEBUG_WX instead of a DEBUG_WX defined by each arch port.

    Signed-off-by: Zong Li
    Signed-off-by: Andrew Morton
    Cc: Borislav Petkov
    Cc: Catalin Marinas
    Cc: "H. Peter Anvin"
    Cc: Ingo Molnar
    Cc: Palmer Dabbelt
    Cc: Paul Walmsley
    Cc: Thomas Gleixner
    Cc: Will Deacon
    Link: http://lkml.kernel.org/r/430736828d149df3f5b462d291e845ec690e0141.1587455584.git.zong.li@sifive.com
    Signed-off-by: Linus Torvalds

    Zong Li
     
    The memmap_init() function was made to iterate over memblock regions
    and, as a result, the early_pfn_in_nid() function became obsolete. Since
    CONFIG_NODES_SPAN_OTHER_NODES is only used to pick a stub or a real
    implementation of early_pfn_in_nid(), it is also not needed anymore.

    Remove both early_pfn_in_nid() and the CONFIG_NODES_SPAN_OTHER_NODES.

    Co-developed-by: Hoan Tran
    Signed-off-by: Hoan Tran
    Signed-off-by: Mike Rapoport
    Signed-off-by: Andrew Morton
    Tested-by: Hoan Tran [arm64]
    Cc: Baoquan He
    Cc: Brian Cain
    Cc: Catalin Marinas
    Cc: "David S. Miller"
    Cc: Geert Uytterhoeven
    Cc: Greentime Hu
    Cc: Greg Ungerer
    Cc: Guan Xuetao
    Cc: Guo Ren
    Cc: Heiko Carstens
    Cc: Helge Deller
    Cc: "James E.J. Bottomley"
    Cc: Jonathan Corbet
    Cc: Ley Foon Tan
    Cc: Mark Salter
    Cc: Matt Turner
    Cc: Max Filippov
    Cc: Michael Ellerman
    Cc: Michal Hocko
    Cc: Michal Simek
    Cc: Nick Hu
    Cc: Paul Walmsley
    Cc: Richard Weinberger
    Cc: Rich Felker
    Cc: Russell King
    Cc: Stafford Horne
    Cc: Thomas Bogendoerfer
    Cc: Tony Luck
    Cc: Vineet Gupta
    Cc: Yoshinori Sato
    Link: http://lkml.kernel.org/r/20200412194859.12663-17-rppt@kernel.org
    Signed-off-by: Linus Torvalds

    Mike Rapoport