16 Jul, 2017

3 commits

  • Pull MIPS updates from Ralf Baechle:
    "Boston platform support:
    - Document DT bindings
    - Add CLK driver for board clocks

    CM:
    - Avoid per-core locking with CM3 & higher
    - WARN on attempt to lock invalid VP, not BUG

    CPS:
    - Select CONFIG_SYS_SUPPORTS_SCHED_SMT for MIPSr6
    - Prevent multi-core with dcache aliasing
    - Handle cores not powering down more gracefully
    - Handle spurious VP starts more gracefully

    DSP:
    - Add lwx & lhx misaligned access support

    eBPF:
    - Add MIPS support along with many supporting changes to add the
    required infrastructure

    Generic arch code:
    - Misc sysmips MIPS_ATOMIC_SET fixes
    - Drop duplicate HAVE_SYSCALL_TRACEPOINTS
    - Negate error syscall return in trace
    - Correct forced syscall errors
    - Traced negative syscalls should return -ENOSYS
    - Allow samples/bpf/tracex5 to access syscall arguments for sane
    traces
    - Cleanup from old Kconfig options in defconfigs
    - Fix PREF instruction usage by memcpy for MIPS R6
    - Fix various special cases in the FPU emulation
    - Fix some special cases in MIPS16e2 support
    - Fix MIPS I ISA /proc/cpuinfo reporting
    - Sort MIPS Kconfig alphabetically
    - Fix minimum alignment requirement of IRQ stack as required by
    ABI / GCC
    - Fix special cases in the module loader
    - Perform post-DMA cache flushes on systems with MAARs
    - Probe the I6500 CPU
    - Cleanup cmpxchg and add support for 1 and 2 byte operations
    - Use queued read/write locks (qrwlock)
    - Use queued spinlocks (qspinlock)
    - Add CPU shared FTLB feature detection
    - Handle tlbex-tlbp race condition
    - Allow storing pgd in C0_CONTEXT for MIPSr6
    - Use current_cpu_type() in m4kc_tlbp_war()
    - Support Boston in the generic kernel

    Generic platform:
    - yamon-dt: Pull YAMON DT shim code out of SEAD-3 board
    - yamon-dt: Support > 256MB of RAM
    - yamon-dt: Use serial* rather than uart* aliases
    - Abstract FDT fixup application
    - Set RTC_ALWAYS_BCD to 0
    - Add a MAINTAINERS entry

    core kernel:
    - qspinlock.c: include linux/prefetch.h

    Loongson 3:
    - Add support

    Perf:
    - Add I6500 support

    SEAD-3:
    - Remove GIC timer from DT
    - Set interrupt-parent per-device, not at root node
    - Fix GIC interrupt specifiers

    SMP:
    - Skip IPI setup if we only have a single CPU

    VDSO:
    - Make comment match reality
    - Improvements to time code in VDSO"

    * 'upstream' of git://git.linux-mips.org/pub/scm/ralf/upstream-linus: (86 commits)
    locking/qspinlock: Include linux/prefetch.h
    MIPS: Fix MIPS I ISA /proc/cpuinfo reporting
    MIPS: Fix minimum alignment requirement of IRQ stack
    MIPS: generic: Support MIPS Boston development boards
    MIPS: DTS: img: Don't attempt to build-in all .dtb files
    clk: boston: Add a driver for MIPS Boston board clocks
    dt-bindings: Document img,boston-clock binding
    MIPS: Traced negative syscalls should return -ENOSYS
    MIPS: Correct forced syscall errors
    MIPS: Negate error syscall return in trace
    MIPS: Drop duplicate HAVE_SYSCALL_TRACEPOINTS select
    MIPS16e2: Provide feature overrides for non-MIPS16 systems
    MIPS: MIPS16e2: Report ASE presence in /proc/cpuinfo
    MIPS: MIPS16e2: Subdecode extended LWSP/SWSP instructions
    MIPS: MIPS16e2: Identify ASE presence
    MIPS: VDSO: Fix a mismatch between comment and preprocessor constant
    MIPS: VDSO: Add implementation of gettimeofday() fallback
    MIPS: VDSO: Add implementation of clock_gettime() fallback
    MIPS: VDSO: Fix conversions in do_monotonic()/do_monotonic_coarse()
    MIPS: Use current_cpu_type() in m4kc_tlbp_war()
    ...

    Linus Torvalds
     
  • Pull UML updates from Richard Weinberger:
    "Mostly fixes for UML:

    - First round of fixes for PTRACE_GETREGSET/SETREGSET

    - A printf vs printk cleanup

    - Minor improvements"

    * 'for-linus-4.13-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rw/uml:
    um: Correctly check for PTRACE_GETRESET/SETREGSET
    um: v2: Use generic NOTES macro
    um: Add kerneldoc for userspace_tramp() and start_userspace()
    um: Add kerneldoc for segv_handler
    um: stub-data.h: remove superfluous include
    um: userspace - be more verbose in ptrace set regs error
    um: add dummy ioremap and iounmap functions
    um: Allow building and running on older hosts
    um: Avoid longjmp/setjmp symbol clashes with libpthread.a
    um: console: Ignore console= option
    um: Use os_warn to print out pre-boot warning/error messages
    um: Add os_warn() for pre-boot warning/error messages
    um: Use os_info for the messages on normal path
    um: Add os_info() for pre-boot information messages
    um: Use printk instead of printf in make_uml_dir

    Linus Torvalds
     
  • Pull more KVM updates from Radim Krčmář:
    "Second batch of KVM updates for v4.13

    Common:
    - add uevents for VM creation/destruction
    - annotate and properly access RCU-protected objects

    s390:
    - rename IOCTL added in the first v4.13 merge

    x86:
    - emulate VMLOAD VMSAVE feature in SVM
    - support paravirtual asynchronous page fault while nested
    - add Hyper-V userspace interfaces for better migration
    - improve master clock corner cases
    - extend internal error reporting after EPT misconfig
    - correct single-stepping of emulated instructions in SVM
    - handle MCE during VM entry
    - fix nVMX VM entry checks and nVMX VMCS shadowing"

    * tag 'kvm-4.13-2' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (28 commits)
    kvm: x86: hyperv: make VP_INDEX managed by userspace
    KVM: async_pf: Let guest support delivery of async_pf from guest mode
    KVM: async_pf: Force a nested vmexit if the injected #PF is async_pf
    KVM: async_pf: Add L1 guest async_pf #PF vmexit handler
    KVM: x86: Simplify kvm_x86_ops->queue_exception parameter list
    kvm: x86: hyperv: add KVM_CAP_HYPERV_SYNIC2
    KVM: x86: make backwards_tsc_observed a per-VM variable
    KVM: trigger uevents when creating or destroying a VM
    KVM: SVM: Enable Virtual VMLOAD VMSAVE feature
    KVM: SVM: Add Virtual VMLOAD VMSAVE feature definition
    KVM: SVM: Rename lbr_ctl field in the vmcb control area
    KVM: SVM: Prepare for new bit definition in lbr_ctl
    KVM: SVM: handle singlestep exception when skipping emulated instructions
    KVM: x86: take slots_lock in kvm_free_pit
    KVM: s390: Fix KVM_S390_GET_CMMA_BITS ioctl definition
    kvm: vmx: Properly handle machine check during VM-entry
    KVM: x86: update master clock before computing kvmclock_offset
    kvm: nVMX: Shadow "high" parts of shadowed 64-bit VMCS fields
    kvm: nVMX: Fix nested_vmx_check_msr_bitmap_controls
    kvm: nVMX: Validate the I/O bitmaps on nested VM-entry
    ...

    Linus Torvalds
     

15 Jul, 2017

6 commits

  • Pull crypto fixes from Herbert Xu:

    - fix new compiler warnings in cavium

    - set post-op IV properly in caam (this fixes chaining)

    - fix potential use-after-free in atmel in case of EBUSY

    - fix sleeping in softirq path in chcr

    - disable buggy sha1-avx2 driver (may overread and page fault)

    - fix use-after-free on signals in caam

    * 'linus' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6:
    crypto: cavium - make several functions static
    crypto: chcr - Avoid algo allocation in softirq.
    crypto: caam - properly set IV after {en,de}crypt
    crypto: atmel - only treat EBUSY as transient if backlog
    crypto: af_alg - Avoid sock_graft call warning
    crypto: caam - fix signals handling
    crypto: sha1-ssse3 - Disable avx2

    Linus Torvalds
     
  • Merge even more updates from Andrew Morton:

    - a few leftovers

    - fault-injector rework

    - add a module loader test driver

    * emailed patches from Andrew Morton:
    kmod: throttle kmod thread limit
    kmod: add test driver to stress test the module loader
    MAINTAINERS: give kmod some maintainer love
    xtensa: use generic fb.h
    fault-inject: add /proc/<pid>/fail-nth
    fault-inject: simplify access check for fail-nth
    fault-inject: make fail-nth read/write interface symmetric
    fault-inject: parse as natural 1-based value for fail-nth write interface
    fault-inject: automatically detect the number base for fail-nth write interface
    kernel/watchdog.c: use better pr_fmt prefix
    MAINTAINERS: move the befs tree to kernel.org
    lib/atomic64_test.c: add a test that atomic64_inc_not_zero() returns an int
    mm: fix overflow check in expand_upwards()

    Linus Torvalds
     
  • Pull arch/tile updates from Chris Metcalf:
    "This adds support for an to help with removing
    __need_xxx #defines from glibc, and removes some dead code in
    arch/tile/mm/init.c"

    * git://git.kernel.org/pub/scm/linux/kernel/git/cmetcalf/linux-tile:
    mm, tile: drop arch_{add,remove}_memory
    tile: prefer <arch/intreg.h> to __need_int_reg_t

    Linus Torvalds
     
  • Pull powerpc fixes from Michael Ellerman:
    "Nothing that really stands out, just a bunch of fixes that have come
    in in the last couple of weeks.

    None of these are actually fixes for code that is new in 4.13. It's
    roughly half older bugs, with fixes going to stable, and half
    fixes/updates for Power9.

    Thanks to: Aneesh Kumar K.V, Anton Blanchard, Balbir Singh, Benjamin
    Herrenschmidt, Madhavan Srinivasan, Michael Neuling, Nicholas Piggin,
    Oliver O'Halloran"

    * tag 'powerpc-4.13-2' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux:
    powerpc/64: Fix atomic64_inc_not_zero() to return an int
    powerpc: Fix emulation of mfocrf in emulate_step()
    powerpc: Fix emulation of mcrf in emulate_step()
    powerpc/perf: Add POWER9 alternate PM_RUN_CYC and PM_RUN_INST_CMPL events
    powerpc/perf: Fix SDAR_MODE value for continous sampling on Power9
    powerpc/asm: Mark cr0 as clobbered in mftb()
    powerpc/powernv: Fix local TLB flush for boot and MCE on POWER9
    powerpc/mm/radix: Synchronize updates to the process table
    powerpc/mm/radix: Properly clear process table entry
    powerpc/powernv: Tell OPAL about our MMU mode on POWER9
    powerpc/kexec: Fix radix to hash kexec due to IAMR/AMOR

    Linus Torvalds
     
  • The arch uses a verbatim copy of the asm-generic version and does not
    add any own implementations to the header, so use asm-generic/fb.h
    instead of duplicating code.

    Link: http://lkml.kernel.org/r/20170517083545.2115-1-tklauser@distanz.ch
    Signed-off-by: Tobias Klauser
    Acked-by: Max Filippov
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Tobias Klauser
     
  • Pull PCI fixes from Bjorn Helgaas:

    - fix a typo that broke Rockchip enumeration

    - fix a new memory leak in the ARM host bridge failure path

    * tag 'pci-v4.13-fixes-1' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci:
    PCI: rockchip: Check for pci_scan_root_bus_bridge() failure correctly
    ARM/PCI: Fix pcibios_init_resource() struct pci_host_bridge leak

    Linus Torvalds
     

14 Jul, 2017

6 commits

  • Hyper-V identifies vCPUs by Virtual Processor Index, which can be
    queried via the HV_X64_MSR_VP_INDEX MSR. It is defined by the spec as a
    sequential number which can't exceed the maximum number of vCPUs per VM.
    APIC ids can be sparse and thus aren't a valid replacement for VP
    indices.

    Current KVM uses its internal vcpu index as VP_INDEX. However, to make
    it predictable and persistent across VM migrations, userspace has to
    control the value of VP_INDEX.

    This patch achieves that, by storing vp_index explicitly on vcpu, and
    allowing HV_X64_MSR_VP_INDEX to be set from the host side. For
    compatibility it's initialized to the KVM vcpu index. Also a few
    variables are renamed to make a clear distinction between this Hyper-V
    vp_index and the KVM vcpu_id (== APIC id). Besides, a new capability,
    KVM_CAP_HYPERV_VP_INDEX, is added to allow userspace to skip attempting
    MSR writes where they are unsupported, to avoid spamming error logs.
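
    As a rough sketch of how a VMM might consume the new interface, assuming
    the usual KVM ioctl pattern and that KVM_CAP_HYPERV_VP_INDEX is available
    from the UAPI headers (the wrapper name is made up):

    #include <linux/kvm.h>
    #include <sys/ioctl.h>

    /* Restore a vCPU's Hyper-V VP index after migration; skip the write on
     * kernels that do not advertise the capability. */
    static int restore_vp_index(int vm_fd, int vcpu_fd, __u32 vp_index)
    {
            struct {
                    struct kvm_msrs hdr;
                    struct kvm_msr_entry entry;
            } msrs = {
                    .hdr.nmsrs   = 1,
                    .entry.index = 0x40000002,      /* HV_X64_MSR_VP_INDEX */
                    .entry.data  = vp_index,
            };

            if (ioctl(vm_fd, KVM_CHECK_EXTENSION, KVM_CAP_HYPERV_VP_INDEX) <= 0)
                    return 0;       /* not supported: keep the default */

            return ioctl(vcpu_fd, KVM_SET_MSRS, &msrs);
    }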

    Signed-off-by: Roman Kagan
    Signed-off-by: Radim Krčmář

    Roman Kagan
     
  • Adds another flag bit (bit 2) to MSR_KVM_ASYNC_PF_EN. If bit 2 is 1,
    async page faults are delivered to L1 as #PF vmexits; if bit 2 is 0,
    kvm_can_do_async_pf returns 0 if in guest mode.

    This is similar to what svm.c wanted to do all along, but it is only
    enabled for Linux as L1 hypervisor. Foreign hypervisors must never
    receive async page faults as vmexits, because they'd probably be very
    confused about that.
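
    A minimal guest-side sketch of the resulting MSR layout (kernel context
    assumed; the name used for the bit-2 flag below is an assumption, only
    the bit position comes from the description above):

    #define MSR_KVM_ASYNC_PF_EN                 0x4b564d02
    #define KVM_ASYNC_PF_ENABLED                (1 << 0)
    #define KVM_ASYNC_PF_SEND_ALWAYS            (1 << 1)
    #define KVM_ASYNC_PF_DELIVERY_AS_PF_VMEXIT  (1 << 2)   /* assumed name */

    static void enable_async_pf(u64 reason_area_pa)
    {
            /* The low bits carry flags; the rest is the physical address of
             * the per-CPU "apf reason" area. */
            wrmsrl(MSR_KVM_ASYNC_PF_EN,
                   reason_area_pa | KVM_ASYNC_PF_ENABLED |
                   KVM_ASYNC_PF_DELIVERY_AS_PF_VMEXIT);
    }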

    Cc: Paolo Bonzini
    Cc: Radim Krčmář
    Signed-off-by: Wanpeng Li
    Signed-off-by: Radim Krčmář

    Wanpeng Li
     
  • Add a nested_apf field to vcpu->arch.exception to identify an async page
    fault, and construct the expected VM-exit information fields. Force a
    nested VM exit from nested_vmx_check_exception() if the injected #PF is
    an async page fault.

    Cc: Paolo Bonzini
    Cc: Radim Krčmář
    Signed-off-by: Wanpeng Li
    Signed-off-by: Radim Krčmář

    Wanpeng Li
     
  • This patch adds the L1 guest async page fault #PF vmexit handler; such a
    #PF is then handled by L1 similarly to an ordinary async page fault.

    Cc: Paolo Bonzini
    Cc: Radim Krčmář
    Signed-off-by: Wanpeng Li
    [Passed insn parameters to kvm_mmu_page_fault().]
    Signed-off-by: Radim Krčmář

    Wanpeng Li
     
  • This patch removes all arguments except the first in
    kvm_x86_ops->queue_exception since they can extract the arguments from
    vcpu->arch.exception themselves.

    Cc: Paolo Bonzini
    Cc: Radim Krčmář
    Signed-off-by: Wanpeng Li
    Signed-off-by: Radim Krčmář

    Wanpeng Li
     
  • Pull more Kbuild updates from Masahiro Yamada:

    - Move generic-y of exported headers to uapi/asm/Kbuild for complete
    de-coupling of UAPI

    - Clean up scripts/Makefile.headersinst

    - Fix host programs for 32 bit machine with XFS file system

    * tag 'kbuild-v4.13-2' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild: (29 commits)
    kbuild: Enable Large File Support for hostprogs
    kbuild: remove wrapper files handling from Makefile.headersinst
    kbuild: split exported generic header creation into uapi-asm-generic
    kbuild: do not include old-kbuild-file from Makefile.headersinst
    xtensa: move generic-y of exported headers to uapi/asm/Kbuild
    unicore32: move generic-y of exported headers to uapi/asm/Kbuild
    tile: move generic-y of exported headers to uapi/asm/Kbuild
    sparc: move generic-y of exported headers to uapi/asm/Kbuild
    sh: move generic-y of exported headers to uapi/asm/Kbuild
    parisc: move generic-y of exported headers to uapi/asm/Kbuild
    openrisc: move generic-y of exported headers to uapi/asm/Kbuild
    nios2: move generic-y of exported headers to uapi/asm/Kbuild
    nios2: remove unneeded arch/nios2/include/(generated/)asm/signal.h
    microblaze: move generic-y of exported headers to uapi/asm/Kbuild
    metag: move generic-y of exported headers to uapi/asm/Kbuild
    m68k: move generic-y of exported headers to uapi/asm/Kbuild
    m32r: move generic-y of exported headers to uapi/asm/Kbuild
    ia64: remove redundant generic-y += kvm_para.h from asm/Kbuild
    hexagon: move generic-y of exported headers to uapi/asm/Kbuild
    h8300: move generic-y of exported headers to uapi/asm/Kbuild
    ...

    Linus Torvalds
     

13 Jul, 2017

25 commits

  • There is a flaw in the Hyper-V SynIC implementation in KVM: when the
    message page or event flags page is enabled by setting the corresponding
    MSR,
    KVM zeroes it out. This is problematic because on migration the
    corresponding MSRs are loaded on the destination, so the content of
    those pages is lost.

    This went unnoticed so far because the only user of those pages was
    in-KVM hyperv synic timers, which could continue working despite that
    zeroing.

    Newer QEMU uses those pages for Hyper-V VMBus implementation, and
    zeroing them breaks the migration.

    Besides, in newer QEMU the content of those pages is fully managed by
    QEMU, so zeroing them is undesirable even when writing the MSRs from the
    guest side.

    To support this new scheme, introduce a new capability,
    KVM_CAP_HYPERV_SYNIC2, which, when enabled, makes sure that the synic
    pages aren't zeroed out in KVM.
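
    A hypothetical userspace sketch of opting in, assuming the standard
    KVM_ENABLE_CAP ioctl and that KVM_CAP_HYPERV_SYNIC2 is defined in the
    UAPI headers of a kernel carrying this change:

    #include <linux/kvm.h>
    #include <string.h>
    #include <sys/ioctl.h>

    /* Enable SynIC v2 semantics on a vCPU so KVM stops zeroing the SynIC
     * message/event pages when their MSRs are written. */
    static int enable_synic2(int vcpu_fd)
    {
            struct kvm_enable_cap cap;

            memset(&cap, 0, sizeof(cap));
            cap.cap = KVM_CAP_HYPERV_SYNIC2;

            return ioctl(vcpu_fd, KVM_ENABLE_CAP, &cap);
    }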

    Signed-off-by: Roman Kagan
    Signed-off-by: Radim Krčmář

    Roman Kagan
     
  • The backwards_tsc_observed global introduced in commit 16a9602 is never
    reset to false. If a VM happens to be running while the host is suspended
    (a common source of the TSC jumping backwards), master clock will never
    be enabled again for any VM. In contrast, if no VM is running while the
    host is suspended, master clock is unaffected. This is inconsistent and
    unnecessarily strict. Let's track the backwards_tsc_observed variable
    separately for each VM and let every VM start with a clean slate.

    Real world impact: My Windows VMs get slower after my laptop undergoes a
    suspend/resume cycle. The only way to get the perf back is unloading and
    reloading the kvm module.

    Signed-off-by: Ladi Prosek
    Signed-off-by: Radim Krčmář

    Ladi Prosek
     
  • Make the code like the rest of the kernel.

    Link: http://lkml.kernel.org/r/1cd3d401626e51ea0e2333a860e76e80bc560a4c.1499284835.git.joe@perches.com
    Signed-off-by: Joe Perches
    Cc: Matt Fleming
    Cc: Ard Biesheuvel
    Cc: Thomas Gleixner
    Cc: "H. Peter Anvin"
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Joe Perches
     
  • Make the code like the rest of the kernel.

    Link: http://lkml.kernel.org/r/f81bb2a67a97b1fd8b6ea99bd350d8a0f6864fb1.1499284835.git.joe@perches.com
    Signed-off-by: Joe Perches
    Cc: Yoshinori Sato
    Cc: Rich Felker
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Joe Perches
     
  • Make the code like the rest of the kernel.

    Link: http://lkml.kernel.org/r/756d3fb543e981b9284e756fa27616725a354b28.1499284835.git.joe@perches.com
    Signed-off-by: Joe Perches
    Cc: Ralf Baechle
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Joe Perches
     
  • Make the code like the rest of the kernel.

    Link: http://lkml.kernel.org/r/14db9c166d5b68efa77e337cfe49bb9b29bca3f7.1499284835.git.joe@perches.com
    Signed-off-by: Joe Perches
    Acked-by: Greg Ungerer
    Cc: Geert Uytterhoeven
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Joe Perches
     
  • Make the use of inline like the rest of the kernel.

    Link: http://lkml.kernel.org/r/f42b2202bd0d4e7ccf79ce5348bb255a035e67bb.1499284835.git.joe@perches.com
    Signed-off-by: Joe Perches
    Cc: Tony Luck
    Cc: Fenghua Yu
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Joe Perches
     
  • Make the use of inline like the rest of the kernel.

    Link: http://lkml.kernel.org/r/d47074493af80ce12590340294bc49618165c30d.1499284835.git.joe@perches.com
    Signed-off-by: Joe Perches
    Cc: Tony Luck
    Cc: Fenghua Yu
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Joe Perches
     
  • Make the use of asmlinkage like the rest of the kernel.

    Link: http://lkml.kernel.org/r/efb2dfed4d9315bf68ec0334c81b65af176a0174.1499284835.git.joe@perches.com
    Signed-off-by: Joe Perches
    Cc: David Howells
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Joe Perches
     
  • Move inline to be like the rest of the kernel.

    Link: http://lkml.kernel.org/r/6bf1bec049897c4158f698b866810f47c728f233.1499284835.git.joe@perches.com
    Signed-off-by: Joe Perches
    Cc: Mikael Starvik
    Cc: Jesper Nilsson
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Joe Perches
     
  • Convert 'u8 inline' to 'inline u8' to be the same style used by the rest
    of the kernel.
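
    A minimal illustration of the conversion (declarations only, kernel
    context assumed, using the function discussed below):

    #include <linux/types.h>

    u8 inline jornada_ssp_reverse(u8 byte);     /* old ordering */
    inline u8 jornada_ssp_reverse(u8 byte);     /* preferred kernel style */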

    Miscellanea:

    jornada_ssp_reverse is an odd function.
    It is declared inline but is also EXPORT_SYMBOL.
    It is also apparently only used by jornada720_ssp.c.
    Likely the EXPORT_SYMBOL could be removed and the function
    converted to static.

    The addition of static and removal of EXPORT_SYMBOL was not done.

    Link: http://lkml.kernel.org/r/5bd3b2bf39c6c9caf773949f18158f8f5ec08582.1499284835.git.joe@perches.com
    Signed-off-by: Joe Perches
    Cc: Russell King
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Joe Perches
     
  • asmlinkage is either 'extern "C"' or blank.

    Move the uses of asmlinkage before the return types to be similar
    to the rest of the kernel.
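
    A minimal illustration of the reordering (sys_example is a made-up name;
    asmlinkage comes from <linux/linkage.h>):

    #include <linux/linkage.h>

    long asmlinkage sys_example(long arg);      /* before: qualifier after type */
    asmlinkage long sys_example(long arg);      /* after: qualifier first */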

    Link: http://lkml.kernel.org/r/005b8e120650c6a13b541e420f4e3605603fe9e6.1499284835.git.joe@perches.com
    Signed-off-by: Joe Perches
    Cc: Christoffer Dall
    Cc: Marc Zyngier
    Cc: Paolo Bonzini
    Cc: Radim Krcmar
    Cc: Russell King
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Joe Perches
     
  • __GFP_REPEAT was designed to allow a retry-but-eventually-fail semantic
    in the page allocator. This has been true, but only for allocation
    requests larger than PAGE_ALLOC_COSTLY_ORDER; it has always been
    ignored for smaller sizes. This is a bit unfortunate because there is
    no way to express the same semantic for those requests and they are
    considered too important to fail so they might end up looping in the
    page allocator for ever, similarly to GFP_NOFAIL requests.

    Now that the whole tree has been cleaned up and accidental or misled
    usage of __GFP_REPEAT flag has been removed for !costly requests we can
    give the original flag a better name and more importantly a more useful
    semantic. Let's rename it to __GFP_RETRY_MAYFAIL, which tells the user
    that the allocator will try really hard but there is no promise of
    success. This will work independently of the order and overrides the
    default allocator behavior. Page allocator users have several levels of
    guarantee vs. cost options (take GFP_KERNEL as an example):

    - GFP_KERNEL & ~__GFP_RECLAIM - optimistic allocation without _any_
    attempt to free memory at all. The most lightweight mode, which doesn't
    even kick the background reclaim. Should be used carefully because it
    might deplete the memory and the next user might hit the more
    aggressive reclaim

    - GFP_KERNEL & ~__GFP_DIRECT_RECLAIM (or GFP_NOWAIT)- optimistic
    allocation without any attempt to free memory from the current
    context but can wake kswapd to reclaim memory if the zone is below
    the low watermark. Can be used from either atomic contexts or when
    the request is a performance optimization and there is another
    fallback for a slow path.

    - (GFP_KERNEL|__GFP_HIGH) & ~__GFP_DIRECT_RECLAIM (aka GFP_ATOMIC) -
    non sleeping allocation with an expensive fallback so it can access
    some portion of memory reserves. Usually used from interrupt/bh
    context with an expensive slow path fallback.

    - GFP_KERNEL - both background and direct reclaim are allowed and the
    _default_ page allocator behavior is used. That means that !costly
    allocation requests are basically nofail but there is no guarantee of
    that behavior so failures have to be checked properly by callers
    (e.g. OOM killer victim is allowed to fail currently).

    - GFP_KERNEL | __GFP_NORETRY - overrides the default allocator behavior
    and all allocation requests fail early rather than cause disruptive
    reclaim (one round of reclaim in this implementation). The OOM killer
    is not invoked.

    - GFP_KERNEL | __GFP_RETRY_MAYFAIL - overrides the default allocator
    behavior and all allocation requests try really hard. The request
    will fail if the reclaim cannot make any progress. The OOM killer
    won't be triggered.

    - GFP_KERNEL | __GFP_NOFAIL - overrides the default allocator behavior
    and all allocation requests will loop endlessly until they succeed.
    This might be really dangerous especially for larger orders.

    Existing users of __GFP_REPEAT are changed to __GFP_RETRY_MAYFAIL
    because they already had this semantic. No new users are added.
    __alloc_pages_slowpath is changed to bail out for __GFP_RETRY_MAYFAIL if
    there is no progress and we have already passed the OOM point.

    This means that all the reclaim opportunities have been exhausted except
    the most disruptive one (the OOM killer) and a user-defined fallback
    behavior is more sensible than retrying forever in the page allocator.
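
    A hedged sketch of the intended usage pattern, in the spirit of what
    kvmalloc does (not code from this series; the function name is made up):

    #include <linux/slab.h>
    #include <linux/vmalloc.h>

    /* Try hard for physically contiguous memory, but accept failure and
     * fall back instead of looping forever in the page allocator. */
    static void *alloc_big_buffer(size_t size)
    {
            void *buf;

            buf = kmalloc(size, GFP_KERNEL | __GFP_RETRY_MAYFAIL);
            if (!buf)
                    buf = vmalloc(size);

            return buf;
    }

    A caller written this way would free the result with kvfree(), which
    handles both cases.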

    [akpm@linux-foundation.org: fix arch/sparc/kernel/mdesc.c]
    [mhocko@suse.com: semantic fix]
    Link: http://lkml.kernel.org/r/20170626123847.GM11534@dhcp22.suse.cz
    [mhocko@kernel.org: address other thing spotted by Vlastimil]
    Link: http://lkml.kernel.org/r/20170626124233.GN11534@dhcp22.suse.cz
    Link: http://lkml.kernel.org/r/20170623085345.11304-3-mhocko@kernel.org
    Signed-off-by: Michal Hocko
    Acked-by: Vlastimil Babka
    Cc: Alex Belits
    Cc: Chris Wilson
    Cc: Christoph Hellwig
    Cc: Darrick J. Wong
    Cc: David Daney
    Cc: Johannes Weiner
    Cc: Mel Gorman
    Cc: NeilBrown
    Cc: Ralf Baechle
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Michal Hocko
     
  • Patch series "mm: give __GFP_REPEAT a better semantic".

    The main motivation for the change is that the current implementation of
    __GFP_REPEAT is not very useful.

    The documentation says:
    * __GFP_REPEAT: Try hard to allocate the memory, but the allocation attempt
    * _might_ fail. This depends upon the particular VM implementation.

    It just fails to mention that this is true only for large (costly)
    high-order requests, which has been the case since the flag was
    introduced. A similar semantic would be really helpful for small orders
    as well, though, because we have places where a failure with specific
    fallback error handling is preferred to a potential endless loop inside
    the page allocator.

    The earlier cleanup dropped __GFP_REPEAT usage for low (!costly) order
    users so only those which might use larger orders have stayed. One new
    user added in the meantime is addressed in patch 1.

    Let's rename the flag to something more verbose and use it for the
    existing users; the semantic for those will not change. Then implement a
    failure path for low (!costly) orders which is hit when the page
    allocator is about to invoke the OOM killer. With that we have a good
    counterpart for __GFP_NORETRY and can finally say "try as hard as
    possible without invoking the OOM killer".

    Xfs code already has an existing annotation for allocations which are
    allowed to fail and we can trivially map them to the new gfp flag
    because it will provide the semantic KM_MAYFAIL wants. Christoph didn't
    consider the new flag really necessary but didn't respond to the OOM
    killer aspect of the change so I have kept the patch. If this is still
    seen as not really needed I can drop the patch.

    kvmalloc will also allow !costly high-order allocations to retry hard
    before falling back to vmalloc.

    drm/i915 asked for the new semantic explicitly.

    Memory migration code, especially for the memory hotplug, should back
    off rather than invoking the OOM killer as well.

    This patch (of 6):

    Commit 3377e227af44 ("MIPS: Add 48-bit VA space (and 4-level page
    tables) for 4K pages.") has added a new __GFP_REPEAT user, but using this
    flag doesn't really make any sense for an order-0 request, which is the
    case here because PUD_ORDER is 0. __GFP_REPEAT has historically had an
    effect only on allocation requests with order > PAGE_ALLOC_COSTLY_ORDER.

    This doesn't introduce any functional change. This is a preparatory
    patch for later work which renames the flag and redefines its semantic.

    Link: http://lkml.kernel.org/r/20170623085345.11304-2-mhocko@kernel.org
    Signed-off-by: Michal Hocko
    Acked-by: Vlastimil Babka
    Cc: Alex Belits
    Cc: David Daney
    Cc: Ralf Baechle
    Cc: Johannes Weiner
    Cc: Mel Gorman
    Cc: NeilBrown
    Cc: Christoph Hellwig
    Cc: Chris Wilson
    Cc: Darrick J. Wong
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Michal Hocko
     
  • When RLIMIT_STACK is, for example, 256MB, the current code results in a
    gap between the top of the task and mmap_base of 256MB, failing to take
    into account the amount by which the stack address was randomized. In
    other words, the stack gets less than RLIMIT_STACK space.

    Ensure that the gap between the stack and mmap_base always takes stack
    randomization and the stack guard gap into account.
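
    A hedged sketch of the idea (the real per-arch mmap_base() helpers differ
    in detail; MIN_GAP/MAX_GAP are the per-arch clamp macros and the helper
    signatures below are assumptions):

    static unsigned long mmap_base(unsigned long rnd)
    {
            unsigned long gap = rlimit(RLIMIT_STACK);

            /* Reserve room for the randomized stack start and for the
             * stack guard gap, not just for RLIMIT_STACK itself. */
            gap += stack_maxrandom_size() + stack_guard_gap;

            if (gap < MIN_GAP)
                    gap = MIN_GAP;
            else if (gap > MAX_GAP)
                    gap = MAX_GAP;

            return PAGE_ALIGN(TASK_SIZE - gap - rnd);
    }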

    Inspired by Daniel Micay's linux-hardened tree.

    Link: http://lkml.kernel.org/r/20170622200033.25714-4-riel@redhat.com
    Signed-off-by: Rik van Riel
    Reported-by: Florian Weimer
    Cc: Ingo Molnar
    Cc: Will Deacon
    Cc: Daniel Micay
    Cc: Benjamin Herrenschmidt
    Cc: Hugh Dickins
    Cc: Catalin Marinas
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Rik van Riel
     
  • When RLIMIT_STACK is, for example, 256MB, the current code results in a
    gap between the top of the task and mmap_base of 256MB, failing to take
    into account the amount by which the stack address was randomized. In
    other words, the stack gets less than RLIMIT_STACK space.

    Ensure that the gap between the stack and mmap_base always takes stack
    randomization and the stack guard gap into account.

    Obtained from Daniel Micay's linux-hardened tree.

    Link: http://lkml.kernel.org/r/20170622200033.25714-3-riel@redhat.com
    Signed-off-by: Daniel Micay
    Signed-off-by: Rik van Riel
    Reported-by: Florian Weimer
    Cc: Ingo Molnar
    Cc: Will Deacon
    Cc: Daniel Micay
    Cc: Benjamin Herrenschmidt
    Cc: Hugh Dickins
    Cc: Catalin Marinas
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Rik van Riel
     
  • When RLIMIT_STACK is, for example, 256MB, the current code results in a
    gap between the top of the task and mmap_base of 256MB, failing to take
    into account the amount by which the stack address was randomized. In
    other words, the stack gets less than RLIMIT_STACK space.

    Ensure that the gap between the stack and mmap_base always takes stack
    randomization and the stack guard gap into account.

    Obtained from Daniel Micay's linux-hardened tree.

    Link: http://lkml.kernel.org/r/20170622200033.25714-2-riel@redhat.com
    Signed-off-by: Daniel Micay
    Signed-off-by: Rik van Riel
    Reported-by: Florian Weimer
    Acked-by: Ingo Molnar
    Cc: Will Deacon
    Cc: Daniel Micay
    Cc: Benjamin Herrenschmidt
    Cc: Hugh Dickins
    Cc: Catalin Marinas
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Rik van Riel
     
  • Use the ascii-armor canary to prevent unterminated C string overflows
    from being able to successfully overwrite the canary, even if they
    somehow obtain the canary value.
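
    A hedged sketch of the idea (the real helpers live in the random and
    stack-protector headers, and the exact byte that gets zeroed may differ):

    /* Clear the lowest-addressed byte (the low byte on little-endian) so
     * the canary always contains a NUL terminator. */
    static inline unsigned long armor_canary(unsigned long canary)
    {
            return canary & ~0xffUL;
    }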

    Inspired by execshield ascii-armor and Daniel Micay's linux-hardened
    tree.

    Link: http://lkml.kernel.org/r/20170524123446.78510066@annuminas.surriel.com
    Signed-off-by: Rik van Riel
    Acked-by: Kees Cook
    Cc: Daniel Micay
    Cc: "Theodore Ts'o"
    Cc: H. Peter Anvin
    Cc: Andy Lutomirski
    Cc: Ingo Molnar
    Cc: Catalin Marinas
    Cc: Yoshinori Sato
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Rik van Riel
     
  • Use the ascii-armor canary to prevent unterminated C string overflows
    from being able to successfully overwrite the canary, even if they
    somehow obtain the canary value.

    Inspired by execshield ascii-armor and Daniel Micay's linux-hardened
    tree.

    Link: http://lkml.kernel.org/r/20170524155751.424-5-riel@redhat.com
    Signed-off-by: Rik van Riel
    Acked-by: Kees Cook
    Cc: Daniel Micay
    Cc: "Theodore Ts'o"
    Cc: H. Peter Anvin
    Cc: Andy Lutomirski
    Cc: Ingo Molnar
    Cc: Catalin Marinas
    Cc: Yoshinori Sato
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Rik van Riel
     
  • Use the ascii-armor canary to prevent unterminated C string overflows
    from being able to successfully overwrite the canary, even if they
    somehow obtain the canary value.

    Inspired by execshield ascii-armor and Daniel Micay's linux-hardened
    tree.

    Link: http://lkml.kernel.org/r/20170524155751.424-4-riel@redhat.com
    Signed-off-by: Rik van Riel
    Acked-by: Kees Cook
    Cc: Daniel Micay
    Cc: "Theodore Ts'o"
    Cc: H. Peter Anvin
    Cc: Andy Lutomirski
    Cc: Ingo Molnar
    Cc: Catalin Marinas
    Cc: Yoshinori Sato
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Rik van Riel
     
  • When building the sh architecture, the compiler doesn't realize that
    BUG() doesn't return, so it will complain about functions using BUG()
    that are marked with the noreturn attribute:

    lib/string.c: In function 'fortify_panic':
    >> lib/string.c:986:1: warning: 'noreturn' function does return
    }
    ^
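
    The fix amounts to convincing the compiler that control never leaves
    BUG(); a hedged sketch of that kind of change, where arch_trap() is a
    placeholder for whatever trapping instruction the architecture emits:

    #define BUG()                                                   \
    do {                                                            \
            arch_trap();                                            \
            unreachable();  /* compiler: we never get here */       \
    } while (0)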

    Link: http://lkml.kernel.org/r/20170627192050.GA66784@beast
    Signed-off-by: Kees Cook
    Cc: Yoshinori Sato
    Cc: Rich Felker
    Cc: Daniel Micay
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Kees Cook
     
  • This adds support for compiling with a rough equivalent to the glibc
    _FORTIFY_SOURCE=1 feature, providing compile-time and runtime buffer
    overflow checks for string.h functions when the compiler determines the
    size of the source or destination buffer at compile-time. Unlike glibc,
    it covers buffer reads in addition to writes.

    GNU C __builtin_*_chk intrinsics are avoided because they would force a
    much more complex implementation. They aren't designed to detect read
    overflows and offer no real benefit when using an implementation based
    on inline checks. Inline checks don't add up to much code size and
    allow full use of the regular string intrinsics while avoiding the need
    for a bunch of _chk functions and per-arch assembly to avoid wrapper
    overhead.

    This detects various overflows at compile-time in various drivers and
    some non-x86 core kernel code. There will likely be issues caught in
    regular use at runtime too.
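
    A hedged sketch of the inline-check approach (the real fortified helpers
    live in include/linux/string.h; checked_memcpy() is a made-up name):

    #include <linux/types.h>

    void fortify_panic(const char *name) __attribute__((__noreturn__));

    static inline void *checked_memcpy(void *dst, const void *src, size_t len)
    {
            size_t dst_size = __builtin_object_size(dst, 0);

            /* (size_t)-1 means the compiler could not determine the size. */
            if (dst_size != (size_t)-1 && len > dst_size)
                    fortify_panic(__func__);

            return __builtin_memcpy(dst, src, len);
    }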

    Future improvements left out of initial implementation for simplicity,
    as it's all quite optional and can be done incrementally:

    * Some of the fortified string functions (strncpy, strcat) don't yet
    place a limit on reads from the source based on __builtin_object_size of
    the source buffer.

    * Extending coverage to more string functions like strlcat.

    * It should be possible to optionally use __builtin_object_size(x, 1) for
    some functions (C strings) to detect intra-object overflows (like
    glibc's _FORTIFY_SOURCE=2), but for now this takes the conservative
    approach to avoid likely compatibility issues.

    * The compile-time checks should be made available via a separate config
    option which can be enabled by default (or always enabled) once enough
    time has passed to get the issues it catches fixed.

    Kees said:
    "This is great to have. While it was out-of-tree code, it would have
    blocked at least CVE-2016-3858 from being exploitable (improper size
    argument to strlcpy()). I've sent a number of fixes for
    out-of-bounds-reads that this detected upstream already"

    [arnd@arndb.de: x86: fix fortified memcpy]
    Link: http://lkml.kernel.org/r/20170627150047.660360-1-arnd@arndb.de
    [keescook@chromium.org: avoid panic() in favor of BUG()]
    Link: http://lkml.kernel.org/r/20170626235122.GA25261@beast
    [keescook@chromium.org: move from -mm, add ARCH_HAS_FORTIFY_SOURCE, tweak Kconfig help]
    Link: http://lkml.kernel.org/r/20170526095404.20439-1-danielmicay@gmail.com
    Link: http://lkml.kernel.org/r/1497903987-21002-8-git-send-email-keescook@chromium.org
    Signed-off-by: Daniel Micay
    Signed-off-by: Kees Cook
    Signed-off-by: Arnd Bergmann
    Acked-by: Kees Cook
    Cc: Mark Rutland
    Cc: Daniel Axtens
    Cc: Rasmus Villemoes
    Cc: Andy Shevchenko
    Cc: Chris Metcalf
    Cc: Thomas Gleixner
    Cc: "H. Peter Anvin"
    Cc: Ingo Molnar
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Daniel Micay
     
  • Testing the fortified string functions[1] would cause a kernel panic on
    boot in test_feature_fixups() due to a buffer overflow in memcmp.

    This boils down to things like this:

    extern unsigned int ftr_fixup_test1;
    extern unsigned int ftr_fixup_test1_orig;

    check(memcmp(&ftr_fixup_test1, &ftr_fixup_test1_orig, size) == 0);

    We know that these are asm labels so it is safe to read up to 'size'
    bytes at those addresses.

    However, because we have passed the address of a single unsigned int to
    memcmp, the compiler believes the underlying object is in fact a single
    unsigned int. So if size > sizeof(unsigned int), there will be a panic
    at runtime.

    We can fix this by changing the types: instead of calling the asm labels
    unsigned ints, call them unsigned int[]s. Therefore the size isn't
    incorrectly determined at compile time and we get a regular unsafe
    memcmp and no panic.
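
    A minimal sketch of the declaration change (one label shown; the _orig
    variants change the same way):

    /* Before: extern unsigned int ftr_fixup_test1;   (object size = 4) */
    extern unsigned int ftr_fixup_test1[];   /* size unknown at compile time */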

    [1] http://openwall.com/lists/kernel-hardening/2017/05/09/2

    Link: http://lkml.kernel.org/r/1497903987-21002-7-git-send-email-keescook@chromium.org
    Signed-off-by: Daniel Axtens
    Signed-off-by: Kees Cook
    Suggested-by: Michael Ellerman
    Tested-by: Andrew Donnellan
    Reviewed-by: Andrew Donnellan
    Cc: Kees Cook
    Cc: Daniel Micay
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Daniel Axtens
     
  • prom_init is a bit special; in theory it should be able to be linked
    separately to the kernel. To keep this from getting too complex, the
    symbols that prom_init.c uses are checked.

    Fortification adds symbols, and it gets quite messy as it includes
    things like panic(). So just don't fortify prom_init.c for now.

    Link: http://lkml.kernel.org/r/1497903987-21002-6-git-send-email-keescook@chromium.org
    Signed-off-by: Daniel Axtens
    Signed-off-by: Kees Cook
    Acked-by: Michael Ellerman
    Cc: Daniel Micay
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Daniel Axtens
     
  • Implement an arch-specific watchdog rather than using the perf-based
    hardlockup detector.

    The new watchdog takes the soft-NMI directly, rather than going through
    perf. Perf interrupts are to be made maskable in future, so that would
    prevent the perf detector from working in those regions.

    Additionally, implement an SMP-based detector where all CPUs watch one
    another by pinging a shared cpumask. This is because powerpc Book3S
    does not have a true periodic local NMI, but some platforms do implement
    a true NMI IPI.

    If a CPU is stuck with interrupts hard disabled, the soft-NMI watchdog
    does not work, but the SMP watchdog will. Even on platforms without a
    true NMI IPI to get a good trace from the stuck CPU, other CPUs will
    notice the lockup sufficiently to report it and panic.
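
    A hedged sketch of the cross-CPU scheme (the real code in
    arch/powerpc/kernel/watchdog.c is considerably more careful about locking
    and uses NMI IPIs to get traces; the names below are illustrative):

    #include <linux/cpumask.h>
    #include <linux/jiffies.h>
    #include <linux/printk.h>

    static cpumask_t wd_pending;            /* CPUs that have not checked in */
    static unsigned long wd_last_reset;     /* jiffies when the mask was refilled */

    static void wd_heartbeat(int cpu)       /* runs periodically on every CPU */
    {
            cpumask_clear_cpu(cpu, &wd_pending);
            if (cpumask_empty(&wd_pending)) {
                    cpumask_copy(&wd_pending, cpu_online_mask);
                    wd_last_reset = jiffies;
            }

            /* Any CPU still pending after too long is considered stuck. */
            if (time_after(jiffies, wd_last_reset + 10 * HZ))
                    pr_emerg("watchdog: CPUs %*pbl appear stuck\n",
                             cpumask_pr_args(&wd_pending));
    }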

    [npiggin@gmail.com: honor watchdog disable at boot/hotplug]
    Link: http://lkml.kernel.org/r/20170621001346.5bb337c9@roar.ozlabs.ibm.com
    [npiggin@gmail.com: fix false positive warning at CPU unplug]
    Link: http://lkml.kernel.org/r/20170630080740.20766-1-npiggin@gmail.com
    [akpm@linux-foundation.org: coding-style fixes]
    Link: http://lkml.kernel.org/r/20170616065715.18390-6-npiggin@gmail.com
    Signed-off-by: Nicholas Piggin
    Reviewed-by: Don Zickus
    Tested-by: Babu Moger [sparc]
    Cc: Benjamin Herrenschmidt
    Cc: Paul Mackerras
    Cc: Michael Ellerman
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Nicholas Piggin