18 Aug, 2016

1 commit


16 Aug, 2016

2 commits

  • rtree_next_node() walks the linked list of leaf nodes to find the next
    block of pages in the struct memory_bitmap. If it walks off the end of
    the list of nodes, it walks the list of memory zones to find the next
    region of memory. If it walks off the end of the list of zones, it
    returns false.

    This leaves the struct bm_position's node and zone pointers pointing
    at their respective struct list_heads in struct mem_zone_bm_rtree.

    memory_bm_find_bit() uses struct bm_position's node and zone pointers
    to avoid walking lists and trees if the next bit appears in the same
    node/zone. It handles these values being stale.

    Swap rtree_next_node()'s 'step then test' for 'test-next then step'.
    This means that if we reach the end of memory we return false and
    leave the node and zone pointers as they were.

    This fixes a panic on resume using AMD Seattle with 64K pages:
    [ 6.868732] Freezing user space processes ... (elapsed 0.000 seconds) done.
    [ 6.875753] Double checking all user space processes after OOM killer disable... (elapsed 0.000 seconds)
    [ 6.896453] PM: Using 3 thread(s) for decompression.
    [ 6.896453] PM: Loading and decompressing image data (5339 pages)...
    [ 7.318890] PM: Image loading progress: 0%
    [ 7.323395] Unable to handle kernel paging request at virtual address 00800040
    [ 7.330611] pgd = ffff000008df0000
    [ 7.334003] [00800040] *pgd=00000083fffe0003, *pud=00000083fffe0003, *pmd=00000083fffd0003, *pte=0000000000000000
    [ 7.344266] Internal error: Oops: 96000005 [#1] PREEMPT SMP
    [ 7.349825] Modules linked in:
    [ 7.352871] CPU: 2 PID: 1 Comm: swapper/0 Tainted: G W I 4.8.0-rc1 #4737
    [ 7.360512] Hardware name: AMD Overdrive/Supercharger/Default string, BIOS ROD1002C 04/08/2016
    [ 7.369109] task: ffff8003c0220000 task.stack: ffff8003c0280000
    [ 7.375020] PC is at set_bit+0x18/0x30
    [ 7.378758] LR is at memory_bm_set_bit+0x24/0x30
    [ 7.383362] pc : [] lr : [] pstate: 60000045
    [ 7.390743] sp : ffff8003c0283b00
    [ 7.473551]
    [ 7.475031] Process swapper/0 (pid: 1, stack limit = 0xffff8003c0280020)
    [ 7.481718] Stack: (0xffff8003c0283b00 to 0xffff8003c0284000)
    [ 7.800075] Call trace:
    [ 7.887097] [] set_bit+0x18/0x30
    [ 7.891876] [] duplicate_memory_bitmap.constprop.38+0x54/0x70
    [ 7.899172] [] snapshot_write_next+0x22c/0x47c
    [ 7.905166] [] load_image_lzo+0x754/0xa88
    [ 7.910725] [] swsusp_read+0x144/0x230
    [ 7.916025] [] load_image_and_restore+0x58/0x90
    [ 7.922105] [] software_resume+0x2f0/0x338
    [ 7.927752] [] do_one_initcall+0x38/0x11c
    [ 7.933314] [] kernel_init_freeable+0x14c/0x1ec
    [ 7.939395] [] kernel_init+0x10/0xfc
    [ 7.944520] [] ret_from_fork+0x10/0x40
    [ 7.949820] Code: d2800022 8b400c21 f9800031 9ac32043 (c85f7c22)
    [ 7.955909] ---[ end trace 0024a5986e6ff323 ]---
    [ 7.960529] Kernel panic - not syncing: Attempted to kill init! exitcode=0x0000000b

    Here struct mem_zone_bm_rtree's start_pfn has been returned instead of
    struct rtree_node's addr as the node/zone pointers are corrupt after
    we walked off the end of the lists during mark_unsafe_pages().

    This behaviour was exposed by commit 6dbecfd345a6 ("PM / hibernate:
    Simplify mark_unsafe_pages()"), which caused mark_unsafe_pages() to call
    duplicate_memory_bitmap(), which uses memory_bm_find_bit() after walking
    off the end of the memory bitmap.

    Fixes: 3a20cb177961 (PM / Hibernate: Implement position keeping in radix tree)
    Signed-off-by: James Morse
    [ rjw: Subject ]
    Signed-off-by: Rafael J. Wysocki

    James Morse
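
    The difference between 'step then test' and 'test-next then step' can
    be sketched with a small model. This is not the kernel code: Python
    lists and indices stand in for the linked lists and pointers, and the
    names Zone, Position and position_is_valid are hypothetical. An index
    equal to the list length plays the role of a pointer left at the
    list head.

```python
from dataclasses import dataclass


@dataclass
class Zone:
    leaves: list  # leaf nodes of this zone, in order (assumed non-empty)


@dataclass
class Position:
    zone: int = 0  # index into the zone list
    node: int = 0  # index into the current zone's leaf list


def next_node_step_then_test(zones, pos):
    """Old order: advance first, then check whether we fell off the end."""
    pos.node += 1
    if pos.node < len(zones[pos.zone].leaves):
        return True
    pos.zone += 1
    if pos.zone < len(zones):
        pos.node = 0
        return True
    # Past the last zone: pos no longer refers to a real zone or leaf.
    return False


def next_node_test_then_step(zones, pos):
    """New order: check that a successor exists before moving."""
    if pos.node + 1 < len(zones[pos.zone].leaves):
        pos.node += 1
        return True
    if pos.zone + 1 < len(zones):
        pos.zone += 1
        pos.node = 0
        return True
    # End of memory: pos is left on the last valid leaf.
    return False


def position_is_valid(zones, pos):
    """True if pos still names an existing zone and leaf."""
    return pos.zone < len(zones) and pos.node < len(zones[pos.zone].leaves)
```

    In this model, walking to the end with the old order leaves the
    position invalid, while the new order leaves it on the last leaf, so
    a later lookup that trusts the cached position still starts from
    real data.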
     
  • The value of temp_level4_pgt is the physical address of the
    top-level page directory, so use __pa() to compute it.

    Signed-off-by: Rafael J. Wysocki
    Acked-by: Ingo Molnar

    Rafael J. Wysocki
     

15 Aug, 2016

3 commits

  • Linus Torvalds
     
  • Pull thermal updates from Zhang Rui:

    - Fix a race condition when updating cooling device, which may lead to
    a situation where a thermal governor never updates the cooling
    device. From Michele Di Giorgio.

    - Fix a zero division error when disabling the forced idle injection
    from the intel powerclamp. From Petr Mladek.

    - Add suspend/resume callback for intel_pch_thermal thermal driver.
    From Srinivas Pandruvada.

    - Two more fixes, for the clock cooling driver and the hwmon sysfs I/F.
    From Michele Di Giorgio and Kuninori Morimoto.

    [ Hmm. That suspend/resume callback for intel_pch_thermal doesn't look
    like a fix, but I'm letting it slide.. - Linus ]

    * 'next' of git://git.kernel.org/pub/scm/linux/kernel/git/rzhang/linux:
    thermal: clock_cooling: Fix missing mutex_init()
    thermal: hwmon: EXPORT_SYMBOL_GPL for thermal hwmon sysfs
    thermal: fix race condition when updating cooling device
    thermal/powerclamp: Prevent division by zero when counting interval
    thermal: intel_pch_thermal: Add suspend/resume callback

    Linus Torvalds
     
  • Pull m68knommu fix from Greg Ungerer:
    "This contains only a single fix for a register corruption problem on
    certain types of m68k flat format binaries"

    * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/gerg/m68knommu:
    m68knommu: fix user a5 register being overwritten

    Linus Torvalds
     

14 Aug, 2016

4 commits

  • …/groeck/linux-staging

    Pull h8300 and unicore32 architecture fixes from Guenter Roeck:
    "Two patches to fix h8300 and unicore32 builds.

    unicore32 builds have been broken since v4.6. The fix has been
    available in -next since March of this year.

    h8300 builds have been broken since the last commit window. The fix
    has been available in -next since June of this year"

    * tag 'fixes-for-linus-4.8' of git://git.kernel.org/pub/scm/linux/kernel/git/groeck/linux-staging:
    h8300: Add missing include file to asm/io.h
    unicore32: mm: Add missing parameter to arch_vma_access_permitted

    Linus Torvalds
     
  • Pull arm64 fixes from Catalin Marinas:

    - support for nr_cpus= command line argument (maxcpus was previously
    changed to allow secondary CPUs to be hot-plugged)

    - ARM PMU interrupt handling fix

    - fix potential TLB conflict in the hibernate code

    - improved handling of EL1 instruction aborts (better error reporting)

    - removal of useless jprobes code for stack saving/restoring

    - defconfig updates

    * tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux:
    arm64: defconfig: enable CONFIG_LOCALVERSION_AUTO
    arm64: defconfig: add options for virtualization and containers
    arm64: hibernate: handle allocation failures
    arm64: hibernate: avoid potential TLB conflict
    arm64: Handle el1 synchronous instruction aborts cleanly
    arm64: Remove stack duplicating code from jprobes
    drivers/perf: arm-pmu: Fix handling of SPI lacking "interrupt-affinity" property
    drivers/perf: arm-pmu: convert arm_pmu_mutex to spinlock
    arm64: Support hard limit of cpu count by nr_cpus

    Linus Torvalds
     
  • Pull KVM fixes from Radim Krčmář:
    "KVM:
    - lock kvm_device list to prevent corruption on device creation.

    PPC:
    - split debugfs initialization from creation of the xics device to
    unlock the newly taken kvm lock earlier.

    s390:
    - prevent userspace from triggering two WARN_ON_ONCE.

    MIPS:
    - fix several issues in the management of TLB faults (Cc: stable)"

    * tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm:
    MIPS: KVM: Propagate kseg0/mapped tlb fault errors
    MIPS: KVM: Fix gfn range check in kseg0 tlb faults
    MIPS: KVM: Add missing gfn range check
    MIPS: KVM: Fix mapped fault broken commpage handling
    KVM: Protect device ops->create and list_add with kvm->lock
    KVM: PPC: Move xics_debugfs_init out of create
    KVM: s390: reset KVM_REQ_MMU_RELOAD if mapping the prefix failed
    KVM: s390: set the prefix initially properly

    Linus Torvalds
     
  • Pull block fixes from Jens Axboe:

    - an NVMe fix from Gabriel, fixing a suspend/resume issue on some
    setups

    - addition of a few missing entries in the block queue sysfs
    documentation, from Joe

    - a fix for a sparse shadow warning for the bvec iterator, from
    Johannes

    - a writeback deadlock involving raid issuing barriers, and not
    flushing the plug when we wakeup the flusher threads. From
    Konstantin

    - a set of patches for the NVMe target/loop/rdma code, from Roland and
    Sagi

    * 'for-linus' of git://git.kernel.dk/linux-block:
    bvec: avoid variable shadowing warning
    doc: update block/queue-sysfs.txt entries
    nvme: Suspend all queues before deletion
    mm, writeback: flush plugged IO in wakeup_flusher_threads()
    nvme-rdma: Remove unused includes
    nvme-rdma: start async event handler after reconnecting to a controller
    nvmet: Fix controller serial number inconsistency
    nvmet-rdma: Don't use the inline buffer in order to avoid allocation for small reads
    nvmet-rdma: Correctly handle RDMA device hot removal
    nvme-rdma: Make sure to shutdown the controller if we can
    nvme-loop: Remove duplicate call to nvme_remove_namespaces
    nvme-rdma: Free the I/O tags when we delete the controller
    nvme-rdma: Remove duplicate call to nvme_remove_namespaces
    nvme-rdma: Fix device removal handling
    nvme-rdma: Queue ns scanning after a sucessful reconnection
    nvme-rdma: Don't leak uninitialized memory in connect request private data

    Linus Torvalds
     

13 Aug, 2016

25 commits

  • h8300 builds fail with

    arch/h8300/include/asm/io.h:9:15: error: unknown type name ‘u8’
    arch/h8300/include/asm/io.h:15:15: error: unknown type name ‘u16’
    arch/h8300/include/asm/io.h:21:15: error: unknown type name ‘u32’

    and many related errors.

    Fixes: 23c82d41bdf4 ("kexec-allow-architectures-to-override-boot-mapping-fix")
    Cc: Andrew Morton
    Signed-off-by: Guenter Roeck

    Guenter Roeck
     
  • unicore32 fails to compile with the following errors.

    mm/memory.c: In function ‘__handle_mm_fault’:
    mm/memory.c:3381: error:
    too many arguments to function ‘arch_vma_access_permitted’
    mm/gup.c: In function ‘check_vma_flags’:
    mm/gup.c:456: error:
    too many arguments to function ‘arch_vma_access_permitted’
    mm/gup.c: In function ‘vma_permits_fault’:
    mm/gup.c:640: error:
    too many arguments to function ‘arch_vma_access_permitted’

    Fixes: d61172b4b695b ("mm/core, x86/mm/pkeys: Differentiate instruction fetches")
    Cc: Dave Hansen
    Cc: Thomas Gleixner
    Cc: Ingo Molnar
    Signed-off-by: Guenter Roeck
    Acked-by: Guan Xuetao

    Guenter Roeck
     
  • Update some documentation related to system sleep to document new
    features and remove outdated information from it.

    Signed-off-by: Rafael J. Wysocki
    Acked-by: Pavel Machek
    Reviewed-by: Chen Yu

    Rafael J. Wysocki
     
  • Pull VFIO fix from Alex Williamson:
    "Fix oops when dereferencing empty data (Alex Williamson)"

    * tag 'vfio-v4.8-rc2' of git://github.com/awilliam/linux-vfio:
    vfio/pci: Fix NULL pointer oops in error interrupt setup handling

    Linus Torvalds
     
  • Pull nfsd fixes from Bruce Fields:
    "Fixes for the dentry refcounting leak I introduced in 4.8-rc1, and for
    races in the LOCK code which appear to go back to the big nfsd state
    lock removal from 3.17"

    * tag 'nfsd-4.8-1' of git://linux-nfs.org/~bfields/linux:
    nfsd: don't return an unhashed lock stateid after taking mutex
    nfsd: Fix race between FREE_STATEID and LOCK
    nfsd: fix dentry refcounting on create

    Linus Torvalds
     
  • Pull power management fixes from Rafael Wysocki:
    "Two hibernation fixes allowing it to work with the recently added
    randomization of the kernel identity mapping base on x86-64 and one
    cpufreq driver regression fix.

    Specifics:

    - Fix the x86 identity mapping creation helpers to avoid the
    assumption that the base address of the mapping will always be
    aligned at the PGD level, as it may be aligned at the PUD level if
    address space randomization is enabled (Rafael Wysocki).

    - Fix the hibernation core to avoid executing tracing functions
    before restoring the processor state completely during resume
    (Thomas Garnier).

    - Fix a recently introduced regression in the powernv cpufreq driver
    that causes it to crash due to an out-of-bounds array access
    (Akshay Adiga)"

    * tag 'pm-4.8-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
    PM / hibernate: Restore processor state before using per-CPU variables
    x86/power/64: Always create temporary identity mapping correctly
    cpufreq: powernv: Fix crash in gpstate_timer_handler()

    Linus Torvalds
     
  • Pull x86 fixes from Ingo Molnar:
    "This is bigger than usual - the reason is partly a pent-up stream of
    fixes after the merge window and partly accidental. The fixes are:

    - five patches to fix a boot failure on Andy Lutomirski's laptop
    - four SGI UV platform fixes
    - KASAN fix
    - warning fix
    - documentation update
    - swap entry definition fix
    - pkeys fix
    - irq stats fix"

    * 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
    x86/apic/x2apic, smp/hotplug: Don't use before alloc in x2apic_cluster_probe()
    x86/efi: Allocate a trampoline if needed in efi_free_boot_services()
    x86/boot: Rework reserve_real_mode() to allow multiple tries
    x86/boot: Defer setup_real_mode() to early_initcall time
    x86/boot: Synchronize trampoline_cr4_features and mmu_cr4_features directly
    x86/boot: Run reserve_bios_regions() after we initialize the memory map
    x86/irq: Do not substract irq_tlb_count from irq_call_count
    x86/mm: Fix swap entry comment and macro
    x86/mm/kaslr: Fix -Wformat-security warning
    x86/mm/pkeys: Fix compact mode by removing protection keys' XSAVE buffer manipulation
    x86/build: Reduce the W=1 warnings noise when compiling x86 syscall tables
    x86/platform/UV: Fix kernel panic running RHEL kdump kernel on UV systems
    x86/platform/UV: Fix problem with UV4 BIOS providing incorrect PXM values
    x86/platform/UV: Fix bug with iounmap() of the UV4 EFI System Table causing a crash
    x86/platform/UV: Fix problem with UV4 Socket IDs not being contiguous
    x86/entry: Clarify the RF saving/restoring situation with SYSCALL/SYSRET
    x86/mm: Disable preemption during CR3 read+write
    x86/mm/KASLR: Increase BRK pages for KASLR memory randomization
    x86/mm/KASLR: Fix physical memory calculation on KASLR memory randomization
    x86, kasan, ftrace: Put APIC interrupt handlers into .irqentry.text

    Linus Torvalds
     
  • Pull timer fixes from Ingo Molnar:
    "Misc fixes: a /dev/rtc regression fix, two APIC timer period
    calibration fixes, an ARM clocksource driver fix and a NOHZ
    power use regression fix"

    * 'timers-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
    x86/hpet: Fix /dev/rtc breakage caused by RTC cleanup
    x86/timers/apic: Inform TSC deadline clockevent device about recalibration
    x86/timers/apic: Fix imprecise timer interrupts by eliminating TSC clockevents frequency roundoff error
    timers: Fix get_next_timer_interrupt() computation
    clocksource/arm_arch_timer: Force per-CPU interrupt to be level-triggered

    Linus Torvalds
     
  • * pm-sleep:
    PM / hibernate: Restore processor state before using per-CPU variables
    x86/power/64: Always create temporary identity mapping correctly

    * pm-cpufreq:
    cpufreq: powernv: Fix crash in gpstate_timer_handler()

    Rafael J. Wysocki
     
  • Pull scheduler fixes from Ingo Molnar:
    "Misc fixes: cputime fixes, two deadline scheduler fixes and a cgroups
    scheduling fix"

    * 'sched-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
    sched/cputime: Fix omitted ticks passed in parameter
    sched/cputime: Fix steal time accounting
    sched/deadline: Fix lock pinning warning during CPU hotplug
    sched/cputime: Mitigate performance regression in times()/clock_gettime()
    sched/fair: Fix typo in sync_throttle()
    sched/deadline: Fix wrap-around in DL heap

    Linus Torvalds
     
  • Restore the processor state before calling any other functions to
    ensure per-CPU variables can be used with KASLR memory randomization.

    Tracing functions use per-CPU variables (GS based on x86) and one was
    called just before restoring the processor state fully. It resulted
    in a double fault when both the tracing & the exception handler
    functions tried to use a per-CPU variable.

    Fixes: bb3632c6101b (PM / sleep: trace events for suspend/resume)
    Reported-and-tested-by: Borislav Petkov
    Reported-by: Jiri Kosina
    Tested-by: Rafael J. Wysocki
    Tested-by: Jiri Kosina
    Signed-off-by: Thomas Garnier
    Acked-by: Pavel Machek
    Signed-off-by: Rafael J. Wysocki

    Thomas Garnier
     
  • Pull perf fixes from Ingo Molnar:
    "Mostly tooling fixes, plus two uncore-PMU fixes, an uprobes fix, a
    perf-cgroups fix and an AUX events fix"

    * 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
    perf/x86/intel/uncore: Add enable_box for client MSR uncore
    perf/x86/intel/uncore: Fix uncore num_counters
    uprobes/x86: Fix RIP-relative handling of EVEX-encoded instructions
    perf/core: Set cgroup in CPU contexts for new cgroup events
    perf/core: Fix sideband list-iteration vs. event ordering NULL pointer deference crash
    perf probe ppc64le: Fix probe location when using DWARF
    perf probe: Add function to post process kernel trace events
    tools: Sync cpufeatures headers with the kernel
    toops: Sync tools/include/uapi/linux/bpf.h with the kernel
    tools: Sync cpufeatures.h and vmx.h with the kernel
    perf probe: Support signedness casting
    perf stat: Avoid skew when reading events
    perf probe: Fix module name matching
    perf probe: Adjust map->reloc offset when finding kernel symbol from map
    perf hists: Trim libtraceevent trace_seq buffers
    perf script: Add 'bpf-output' field to usage message

    Linus Torvalds
     
  • nfsd4_lock will take the st_mutex before working with the stateid it
    gets, but between the time when we drop the cl_lock and take the mutex,
    the stateid could become unhashed (à la FREE_STATEID). If that happens
    the lock stateid returned to the client will be forgotten.

    Fix this by first moving the st_mutex acquisition into
    lookup_or_create_lock_state. Then, have it check to see if the lock
    stateid is still hashed after taking the mutex. If it's not, then put
    the stateid and try the find/create again.

    Signed-off-by: Jeff Layton
    Tested-by: Alexey Kodanev
    Cc: stable@vger.kernel.org # feb9dad5 nfsd: Always lock state exclusively.
    Cc: stable@vger.kernel.org
    Signed-off-by: J. Bruce Fields

    Jeff Layton
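
    The "take the mutex, re-check that the stateid is still hashed, retry
    if not" pattern described above can be sketched as follows. This is a
    simplified model, not the nfsd code: the classes, the dictionary
    standing in for the hash table, and the after_lookup hook (used only
    to open the race window deterministically) are all hypothetical.

```python
import threading


class LockStateid:
    def __init__(self):
        self.st_mutex = threading.Lock()
        self.hashed = True  # cleared once the stateid has been unhashed


class Client:
    def __init__(self):
        self.cl_lock = threading.Lock()
        self.table = {}  # owner -> LockStateid, protected by cl_lock

    def find_or_create(self, owner):
        with self.cl_lock:
            stid = self.table.get(owner)
            if stid is None:
                stid = self.table[owner] = LockStateid()
            return stid

    def unhash(self, owner):
        """Model of a racing free: remove the stateid from the table."""
        with self.cl_lock:
            stid = self.table.pop(owner, None)
        if stid is not None:
            with stid.st_mutex:
                stid.hashed = False


def lookup_or_create_locked(client, owner, after_lookup=None):
    """Return a stateid that is still hashed, with its st_mutex held."""
    while True:
        stid = client.find_or_create(owner)
        if after_lookup is not None:
            after_lookup()  # window between dropping cl_lock and locking
        stid.st_mutex.acquire()
        if stid.hashed:
            return stid
        # Lost the race: drop this stateid and do the find/create again.
        stid.st_mutex.release()
```

    The caller receives the stateid with st_mutex already held, so there
    is no window in which it can be unhashed between the check and use.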
     
  • Pull locking fixes from Ingo Molnar:
    "Misc fixes: lockstat fix, futex fix on !MMU systems, big endian fix
    for qrwlocks and a race fix for pvqspinlocks"

    * 'locking-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
    locking/pvqspinlock: Fix a bug in qstat_read()
    locking/pvqspinlock: Fix double hash race
    locking/qrwlock: Fix write unlock bug on big endian systems
    futex: Assume all mappings are private on !MMU systems

    Linus Torvalds
     
  • Pull irq fix from Ingo Molnar:
    "A fix for an MSI regression"

    * 'irq-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
    genirq/msi: Make sure PCI MSIs are activated early

    Linus Torvalds
     
  • Pull EFI fixes from Ingo Molnar:
    "A fix for EFI capsules and an SGI UV platform fix"

    * 'efi-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
    efi/capsule: Allocate whole capsule into virtual memory
    x86/platform/uv: Skip UV runtime services mapping in the efi_runtime_disabled case

    Linus Torvalds
     
  • Pull NFS client bugfixes from Trond Myklebust:
    "Highlights include:

    - Stable patch from Olga to fix RPCSEC_GSS upcalls when the same user
    needs multiple different security services (e.g. krb5i and krb5p).

    - Stable patch to fix a regression introduced by the use of
    SO_REUSEPORT, and that prevented the use of multiple different NFS
    versions to the same server.

    - TCP socket reconnection timer fixes.

    - Patch from Neil to disable the use of IPv6 temporary addresses"

    * tag 'nfs-for-4.8-2' of git://git.linux-nfs.org/projects/trondmy/linux-nfs:
    NFSv4: Cap the transport reconnection timer at 1/2 lease period
    NFSv4: Cleanup the setting of the nfs4 lease period
    SUNRPC: Limit the reconnect backoff timer to the max RPC message timeout
    SUNRPC: Fix reconnection timeouts
    NFSv4.2: LAYOUTSTATS may return NFS4ERR_ADMIN/DELEG_REVOKED
    SUNRPC: disable the use of IPv6 temporary addresses.
    SUNRPC: allow for upcalls for same uid but different gss service
    SUNRPC: Fix up socket autodisconnect
    SUNRPC: Handle EADDRNOTAVAIL on connection failures

    Linus Torvalds
     
  • Pull libnvdimm fixes from Dan Williams:

    - Fix for the nd_blk (NVDIMM Block Window Aperture) driver.

    A spec clarification requires the driver to mask off reserved bits in
    status register. This is tagged for -stable back to the v4.2 kernel.

    - Fix for a kernel crash in the nvdimm unit tests when module loading
    is interrupted with SIGTERM. Tagged for -stable since validation
    efforts external to Intel use the unit tests for qualifying
    backports.

    - Add a new 'size' sysfs attribute for the BTT (NVDIMM Block
    Translation Table) driver to make it symmetric with the other
    namespace personality drivers (PFN and DAX) that provide a size
    attribute for indicating how much namespace capacity is lost to
    metadata.

    The BTT change arrived at the start of the merge window and has
    appeared in a -next release. It can technically wait for 4.9, but it
    is small, fixes asymmetry in the libnvdimm-sysfs interface, and
    something I would have squeezed into the v4.8 pull request had it
    arrived a few days earlier.

    * 'libnvdimm-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm:
    tools/testing/nvdimm: fix SIGTERM vs hotplug crash
    nvdimm, btt: add a size attribute for BTTs
    libnvdimm, nd_blk: mask off reserved status bits

    Linus Torvalds
     
  • Pull sound fixes from Takashi Iwai:
    "A regression fix of HD-audio runtime PM and two USB quirks"

    * tag 'sound-4.8-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound:
    ALSA: hda - Manage power well properly for resume
    ALSA: usb-audio: Add quirk for ELP HD USB Camera
    ALSA: usb-audio: Add a sample rate quirk for Creative Live! Cam Socialize HD (VF0610)

    Linus Torvalds
     
  • Pull powerpc fixes from Michael Ellerman:
    "Some powerpc fixes for 4.8:

    Misc:
    - powerpc/vdso: Fix build rules to rebuild vdsos correctly from Nicholas Piggin
    - powerpc/ptrace: Fix coredump since ptrace TM changes from Cyril Bur
    - powerpc/32: Fix csum_partial_copy_generic() from Christophe Leroy
    - cxl: Set psl_fir_cntl to production environment value from Frederic Barrat
    - powerpc/eeh: Switch to conventional PCI address output in EEH log from Guilherme G. Piccoli
    - cxl: Use fixed width predefined types in data structure. from Philippe Bergheaud
    - powerpc/vdso: Add missing include file from Guenter Roeck
    - powerpc: Fix unused function warning 'lmb_to_memblock' from Alastair D'Silva
    - powerpc/powernv/ioda: Fix TCE invalidate to work in real mode again from Alexey Kardashevskiy
    - powerpc/cell: Add missing error code in spufs_mkgang() from Dan Carpenter
    - crypto: crc32c-vpmsum - Convert to CPU feature based module autoloading from Anton Blanchard
    - powerpc/pasemi: Fix coherent_dma_mask for dma engine from Darren Stevens

    Benjamin Herrenschmidt:
    - powerpc/32: Fix crash during static key init
    - powerpc: Update obsolete comment in setup_32.c about early_init()
    - powerpc: Print the kernel load address at the end of prom_init()
    - powerpc/pnv/pci: Fix incorrect PE reservation attempt on some 64-bit BARs
    - powerpc/xics: Properly set Edge/Level type and enable resend

    Mahesh Salgaonkar:
    - powerpc/book3s: Fix MCE console messages for unrecoverable MCE.
    - powerpc/powernv: Fix MCE handler to avoid trashing CR0/CR1 registers.
    - powerpc/powernv: Move IDLE_STATE_ENTER_SEQ macro to cpuidle.h
    - powerpc/powernv: Load correct TOC pointer while waking up from winkle.

    Andrew Donnellan:
    - cxl: Fix sparse warnings
    - cxl: Fix NULL dereference in cxl_context_init() on PowerVM guests

    Michael Ellerman:
    - selftests/powerpc: Specify we expect to build with std=gnu99
    - powerpc/Makefile: Use cflags-y/aflags-y for setting endian options
    - powerpc/pci: Fix endian bug in fixed PHB numbering"

    * tag 'powerpc-4.8-3' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux: (26 commits)
    selftests/powerpc: Specify we expect to build with std=gnu99
    powerpc/vdso: Fix build rules to rebuild vdsos correctly
    powerpc/Makefile: Use cflags-y/aflags-y for setting endian options
    powerpc/32: Fix crash during static key init
    powerpc: Update obsolete comment in setup_32.c about early_init()
    powerpc: Print the kernel load address at the end of prom_init()
    powerpc/ptrace: Fix coredump since ptrace TM changes
    powerpc/32: Fix csum_partial_copy_generic()
    cxl: Set psl_fir_cntl to production environment value
    powerpc/pnv/pci: Fix incorrect PE reservation attempt on some 64-bit BARs
    powerpc/book3s: Fix MCE console messages for unrecoverable MCE.
    powerpc/pci: Fix endian bug in fixed PHB numbering
    powerpc/eeh: Switch to conventional PCI address output in EEH log
    cxl: Fix sparse warnings
    cxl: Fix NULL dereference in cxl_context_init() on PowerVM guests
    cxl: Use fixed width predefined types in data structure.
    powerpc/vdso: Add missing include file
    powerpc: Fix unused function warning 'lmb_to_memblock'
    powerpc/powernv: Fix MCE handler to avoid trashing CR0/CR1 registers.
    powerpc/powernv: Move IDLE_STATE_ENTER_SEQ macro to cpuidle.h
    ...

    Linus Torvalds
     
  • When CONFIG_LOCALVERSION_AUTO is disabled, the version string is
    just a tag name (or with a '+' appended if HEAD is not a tagged
    commit).

    During development (and especially when git-bisecting), a longer
    version string helps to identify the commit we are running.

    This is a default y option, so drop the unset to enable it.

    Signed-off-by: Masahiro Yamada
    Signed-off-by: Catalin Marinas

    Masahiro Yamada
     
  • Enable options commonly needed by popular virtualization
    and container applications. Use modules when possible to
    avoid too much overhead for users not interested.

    - add namespace and cgroup options needed
    - add seccomp - optional, but enhances Qemu etc
    - bridge, nat, veth, macvtap and multicast for routing
    guests and containers
    - btrfs and overlayfs modules for container COW backends
    - while near it, make fuse a module instead of built-in.

    Generated with make savedefconfig and dropping unrelated spurious
    change hunks while committing. bloat-o-meter old-vmlinux vmlinux:

    add/remove: 905/390 grow/shrink: 767/229 up/down: 183513/-94861 (88652)
    ....
    Total: Before=10515408, After=10604060, chg +0.84%

    Signed-off-by: Riku Voipio
    Signed-off-by: Catalin Marinas

    Riku Voipio
     
  • In create_safe_exec_page(), we create a copy of the hibernate exit text,
    along with some page tables to map this via TTBR0. We then install the
    new tables in TTBR0.

    In swsusp_arch_resume() we call create_safe_exec_page() before trying a
    number of operations which may fail (e.g. copying the linear map page
    tables). If these fail, we bail out of swsusp_arch_resume() and return
    an error code, but leave TTBR0 as-is. Subsequently, the core hibernate
    code will call free_basic_memory_bitmaps(), which will free all of the
    memory allocations we made, including the page tables installed in
    TTBR0.

    Thus, we may have TTBR0 pointing at dangling freed memory for some
    period of time. If the hibernate attempt was triggered by a user
    requesting a hibernate test via the reboot syscall, we may return to
    userspace with the clobbered TTBR0 value.

    Avoid these issues by reorganising swsusp_arch_resume() such that we
    have no failure paths after create_safe_exec_page(). We also add a check
    that the zero page allocation succeeded, matching what we have for other
    allocations.

    Fixes: 82869ac57b5d ("arm64: kernel: Add support for hibernate/suspend-to-disk")
    Signed-off-by: Mark Rutland
    Acked-by: James Morse
    Cc: Lorenzo Pieralisi
    Cc: Will Deacon
    Cc: # 4.7+
    Signed-off-by: Catalin Marinas

    Mark Rutland
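
    The reordering described above, where every step that can fail runs
    before the new tables are installed, can be illustrated with a toy
    model. Everything here is an assumption made for illustration: the
    Cpu class, the string names standing in for allocations, and the
    allocator that fails on request. It shows only the ordering argument,
    not the arm64 implementation.

```python
class Cpu:
    def __init__(self):
        self.ttbr0 = "zero_page"  # stand-in for the reserved zero page tables


def make_allocator(fail_on=None):
    """Return an allocator that fails for one named request."""
    def alloc(name):
        if name == fail_on:
            raise MemoryError(name)
        return name
    return alloc


def resume_install_early(cpu, alloc):
    # Old shape: TTBR0 is switched before the remaining fallible steps.
    cpu.ttbr0 = alloc("safe_exec_page")
    alloc("linear_map_copy")
    alloc("zero_page_copy")


def resume_install_last(cpu, alloc):
    # New shape: every step that can fail runs first ...
    alloc("linear_map_copy")
    alloc("zero_page_copy")
    # ... and installing the new tables is the final step. If the
    # allocation inside this step fails, TTBR0 has not been touched.
    cpu.ttbr0 = alloc("safe_exec_page")


def try_resume(resume, fail_on):
    """Run one resume attempt and report what TTBR0 points at afterwards."""
    cpu = Cpu()
    try:
        resume(cpu, make_allocator(fail_on))
    except MemoryError:
        pass  # on this path the caller frees every allocation made so far
    return cpu.ttbr0
```

    With the old ordering, a failure after the install leaves ttbr0
    naming memory that is about to be freed. With the new ordering, any
    failure leaves it at the zero page.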
     
  • In create_safe_exec_page we install a set of global mappings in TTBR0,
    then subsequently invalidate TLBs. While TTBR0 points at the zero page,
    and the TLBs should be free of stale global entries, we may have stale
    ASID-tagged entries (e.g. from the EFI runtime services mappings) for
    the same VAs. Per the ARM ARM these ASID-tagged entries may conflict
    with newly-allocated global entries, and we must follow a
    Break-Before-Make approach to avoid issues resulting from this.

    This patch reworks create_safe_exec_page to invalidate TLBs while the
    zero page is still in place, ensuring that there are no potential
    conflicts when the new TTBR0 value is installed. As a single CPU is
    online while this code executes, we do not need to perform broadcast TLB
    maintenance, and can call local_flush_tlb_all(), which also subsumes
    some barriers. The remaining assembly is converted to use write_sysreg()
    and isb().

    Other than this, we safely manipulate TTBRs in the hibernate dance. The
    code we install as part of the new TTBR0 mapping (the hibernated
    kernel's swsusp_arch_suspend_exit) installs a zero page into TTBR1,
    invalidates TLBs, then installs its preferred value. Upon being restored
    to the middle of swsusp_arch_suspend, the new image will call
    __cpu_suspend_exit, which will call cpu_uninstall_idmap, installing the
    zero page in TTBR0 and invalidating all TLB entries.

    Fixes: 82869ac57b5d ("arm64: kernel: Add support for hibernate/suspend-to-disk")
    Signed-off-by: Mark Rutland
    Acked-by: James Morse
    Tested-by: James Morse
    Cc: Lorenzo Pieralisi
    Cc: Will Deacon
    Cc: # 4.7+
    Signed-off-by: Catalin Marinas

    Mark Rutland
     
  • Executing from a non-executable area gives an ugly message:

    lkdtm: Performing direct entry EXEC_RODATA
    lkdtm: attempting ok execution at ffff0000084c0e08
    lkdtm: attempting bad execution at ffff000008880700
    Bad mode in Synchronous Abort handler detected on CPU2, code 0x8400000e -- IABT (current EL)
    CPU: 2 PID: 998 Comm: sh Not tainted 4.7.0-rc2+ #13
    Hardware name: linux,dummy-virt (DT)
    task: ffff800077e35780 ti: ffff800077970000 task.ti: ffff800077970000
    PC is at lkdtm_rodata_do_nothing+0x0/0x8
    LR is at execute_location+0x74/0x88

    The 'IABT (current EL)' indicates the error but it's a bit cryptic
    without knowledge of the ARM ARM. There is also no indication of the
    specific address which triggered the fault. The increase in kernel
    page permissions makes hitting this case more likely as well.
    Handling the case in the vectors gives a much more familiar looking
    error message:

    lkdtm: Performing direct entry EXEC_RODATA
    lkdtm: attempting ok execution at ffff0000084c0840
    lkdtm: attempting bad execution at ffff000008880680
    Unable to handle kernel paging request at virtual address ffff000008880680
    pgd = ffff8000089b2000
    [ffff000008880680] *pgd=00000000489b4003, *pud=0000000048904003, *pmd=0000000000000000
    Internal error: Oops: 8400000e [#1] PREEMPT SMP
    Modules linked in:
    CPU: 1 PID: 997 Comm: sh Not tainted 4.7.0-rc1+ #24
    Hardware name: linux,dummy-virt (DT)
    task: ffff800077f9f080 ti: ffff800008a1c000 task.ti: ffff800008a1c000
    PC is at lkdtm_rodata_do_nothing+0x0/0x8
    LR is at execute_location+0x74/0x88

    Acked-by: Mark Rutland
    Signed-off-by: Laura Abbott
    Signed-off-by: Catalin Marinas

    Laura Abbott
     

12 Aug, 2016

5 commits

  • KVM: s390: Fixes for 4.8 (via kvm/master)

    Here are two fixes found by fuzzing of the ioctl interface.
    Both cases can trigger a WARN_ON_ONCE from user space.

    Radim Krčmář
     
  • Propagate errors from kvm_mips_handle_kseg0_tlb_fault() and
    kvm_mips_handle_mapped_seg_tlb_fault(), usually triggering an internal
    error since they normally indicate the guest accessed bad physical
    memory or the commpage in an unexpected way.

    Fixes: 858dd5d45733 ("KVM/MIPS32: MMU/TLB operations for the Guest.")
    Fixes: e685c689f3a8 ("KVM/MIPS32: Privileged instruction/target branch emulation.")
    Signed-off-by: James Hogan
    Cc: Paolo Bonzini
    Cc: "Radim Krčmář"
    Cc: Ralf Baechle
    Cc: linux-mips@linux-mips.org
    Cc: kvm@vger.kernel.org
    Cc: # 3.10.x-
    Signed-off-by: Radim Krčmář

    James Hogan
     
  • Two consecutive gfns are loaded into host TLB, so ensure the range check
    isn't off by one if guest_pmap_npages is odd.

    Fixes: 858dd5d45733 ("KVM/MIPS32: MMU/TLB operations for the Guest.")
    Signed-off-by: James Hogan
    Cc: Paolo Bonzini
    Cc: "Radim Krčmář"
    Cc: Ralf Baechle
    Cc: linux-mips@linux-mips.org
    Cc: kvm@vger.kernel.org
    Cc: # 3.10.x-
    Signed-off-by: Radim Krčmář

    James Hogan
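
    The off-by-one can be shown with a short worked example. The sketch
    assumes the two consecutive gfns form an even/odd pair (gfn & ~1 and
    gfn | 1). The function names are hypothetical and the checks
    illustrate the arithmetic rather than reproduce the kernel's code.

```python
def gfn_pair(gfn):
    """The two consecutive gfns assumed to be loaded for a fault on gfn."""
    return gfn & ~1, gfn | 1


def in_range_single(gfn, npages):
    # Checks only the faulting gfn; misses the odd partner when it is
    # the first page past the end of the map.
    return gfn < npages


def in_range_pair(gfn, npages):
    # Checks the higher gfn of the pair, which covers both pages.
    return (gfn | 1) < npages
```

    With npages = 5 and a fault on gfn 4, the pair is (4, 5). The
    single-gfn check accepts it even though gfn 5 lies outside the map,
    whereas the pair check rejects it. When npages is even, the two
    checks agree.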
     
  • kvm_mips_handle_mapped_seg_tlb_fault() calculates the guest frame number
    based on the guest TLB EntryLo values, however it is not range checked
    to ensure it lies within the guest_pmap. If the physical memory the
    guest refers to is out of range then dump the guest TLB and emit an
    internal error.

    Fixes: 858dd5d45733 ("KVM/MIPS32: MMU/TLB operations for the Guest.")
    Signed-off-by: James Hogan
    Cc: Paolo Bonzini
    Cc: "Radim Krčmář"
    Cc: Ralf Baechle
    Cc: linux-mips@linux-mips.org
    Cc: kvm@vger.kernel.org
    Cc: # 3.10.x-
    Signed-off-by: Radim Krčmář

    James Hogan
     
  • kvm_mips_handle_mapped_seg_tlb_fault() appears to map the guest page at
    virtual address 0 to PFN 0 if the guest has created its own mapping
    there. The intention is unclear, but it may have been an attempt to
    protect the zero page from being mapped to anything but the comm page in
    code paths you wouldn't expect from genuine commpage accesses (guest
    kernel mode cache instructions on that address, hitting trapping
    instructions when executing from that address with a coincidental TLB
    eviction during the KVM handling, and guest user mode accesses to that
    address).

    Fix this to check for mappings exactly at KVM_GUEST_COMMPAGE_ADDR (it
    may not be at address 0 since commit 42aa12e74e91 ("MIPS: KVM: Move
    commpage so 0x0 is unmapped")), and set the corresponding EntryLo to be
    interpreted as 0 (invalid).

    Fixes: 858dd5d45733 ("KVM/MIPS32: MMU/TLB operations for the Guest.")
    Signed-off-by: James Hogan
    Cc: Paolo Bonzini
    Cc: "Radim Krčmář"
    Cc: Ralf Baechle
    Cc: linux-mips@linux-mips.org
    Cc: kvm@vger.kernel.org
    Cc: # 3.10.x-
    Signed-off-by: Radim Krčmář

    James Hogan