27 Oct, 2014

3 commits

  • Linus Torvalds
     
  • Pull ARM SoC fixes from Olof Johansson:
    "Another week, another small batch of fixes.

    Most of these make zynq, socfpga and sunxi platforms work a bit
    better:

    - due to new requirements for regulators, DWMMC on socfpga broke past
    v3.17
    - SMP spinup fix for socfpga
    - a few DT fixes for zynq
    - another option (FIXED_REGULATOR) for sunxi is needed that used to
    be selected by other options but no longer is.
    - a couple of small DT fixes for at91
    - ...and a couple for i.MX"

    * tag 'armsoc-for-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc:
    ARM: dts: imx28-evk: Let i2c0 run at 100kHz
    ARM: i.MX6: Fix "emi" clock name typo
    ARM: multi_v7_defconfig: enable CONFIG_MMC_DW_ROCKCHIP
    ARM: sunxi_defconfig: enable CONFIG_REGULATOR_FIXED_VOLTAGE
    ARM: dts: socfpga: Add a 3.3V fixed regulator node
    ARM: dts: socfpga: Fix SD card detect
    ARM: dts: socfpga: rename gpio nodes
    ARM: at91/dt: sam9263: fix PLLB frequencies
    power: reset: at91-reset: fix power down register
    MAINTAINERS: add atmel ssc driver maintainer entry
    arm: socfpga: fix fetching cpu1start_addr for SMP
    ARM: zynq: DT: trivial: Fix mc node
    ARM: zynq: DT: Add cadence watchdog node
    ARM: zynq: DT: Add missing reference for memory-controller
    ARM: zynq: DT: Add missing reference for ADC
    ARM: zynq: DT: Add missing address for L2 pl310
    ARM: zynq: DT: Remove 222 MHz OPP
    ARM: zynq: DT: Fix GEM register area size

    Linus Torvalds
     
  • Pull vfs updates from Al Viro:
    "overlayfs merge + leak fix for d_splice_alias() failure exits"

    * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
    overlayfs: embed middle into overlay_readdir_data
    overlayfs: embed root into overlay_readdir_data
    overlayfs: make ovl_cache_entry->name an array instead of pointer
    overlayfs: don't hold ->i_mutex over opening the real directory
    fix inode leaks on d_splice_alias() failure exits
    fs: limit filesystem stacking depth
    overlay: overlay filesystem documentation
    overlayfs: implement show_options
    overlayfs: add statfs support
    overlay filesystem
    shmem: support RENAME_WHITEOUT
    ext4: support RENAME_WHITEOUT
    vfs: add RENAME_WHITEOUT
    vfs: add whiteout support
    vfs: export check_sticky()
    vfs: introduce clone_private_mount()
    vfs: export __inode_permission() to modules
    vfs: export do_splice_direct() to modules
    vfs: add i_op->dentry_open()

    Linus Torvalds
     

26 Oct, 2014

1 commit


25 Oct, 2014

18 commits

  • Commit 78b81f4666fb ("ARM: dts: imx28-evk: Run I2C0 at 400kHz") caused issues
    when doing the following sequence in loop:

    - Boot the kernel
    - Perform audio playback
    - Reboot the system via 'reboot' command

    Many times the audio card cannot be probed, which causes playback to fail.

    After restoring to the original i2c0 frequency of 100kHz there is no such
    problem anymore.

    This reverts commit 78b81f4666fbb22a20b1e63e5baf197ad2e90e88.

    Cc: # 3.16+
    Signed-off-by: Fabio Estevam
    Signed-off-by: Shawn Guo

    Fabio Estevam
     
  • Fix a typo: the "emi" names refer to the EIM clocks.

    The change fixes the typo in the clock names of the EIM and EIM_SLOW
    pre-output dividers and selectors. Notably, the EIM_SLOW clock itself is
    named correctly.

    Signed-off-by: Steve Longerbeam
    [vladimir_zapolskiy@mentor.com: ported to v3.17]
    Signed-off-by: Vladimir Zapolskiy
    Cc: Sascha Hauer
    Signed-off-by: Shawn Guo

    Steve Longerbeam
     
  • same story...

    Signed-off-by: Al Viro

    Al Viro
     
  • no sense having it a pointer - all instances have it pointing to
    local variable in the same stack frame

    Signed-off-by: Al Viro

    Al Viro
     
  • Signed-off-by: Al Viro

    Al Viro
     
  • just use it to serialize the assignment

    Signed-off-by: Al Viro

    Al Viro
     
  • Pull MIPS fixes from Ralf Baechle:
    "This is the first round of fixes and tying up loose ends for MIPS.

    - plenty of fixes for build errors in specific obscure configurations
    - remove redundant code on the Lantiq platform
    - removal of a useless SEAD I2C driver that was causing a build issue
    - fix an earlier TLB exception handler fix to also work on Octeon.
    - fix ISA level dependencies in FPU emulator's instruction decoding.
    - don't hardcode kernel command line in Octeon software emulator.
    - fix an earlier fix for the Loongson 2 clock setting"

    * 'upstream' of git://git.linux-mips.org/pub/scm/ralf/upstream-linus:
    MIPS: SEAD3: Fix I2C device registration.
    MIPS: SEAD3: Nuke PIC32 I2C driver.
    MIPS: ftrace: Fix a microMIPS build problem
    MIPS: MSP71xx: Fix build error
    MIPS: Malta: Do not build the malta-amon.c file if CMP is not enabled
    MIPS: Prevent compiler warning from cop2_{save,restore}
    MIPS: Kconfig: Add missing MIPS_CPS dependencies to PM and cpuidle
    MIPS: idle: Remove leftover __pastwait symbol and its references
    MIPS: Sibyte: Include the swarm subdir to the sb1250 LittleSur builds
    MIPS: ptrace.h: Add a missing include
    MIPS: ath79: Fix compilation error when CONFIG_PCI is disabled
    MIPS: MSP71xx: Remove compilation error when CONFIG_MIPS_MT is present
    MIPS: Octeon: Remove special case for simulator command line.
    MIPS: tlbex: Properly fix HUGE TLB Refill exception handler
    MIPS: loongson2_cpufreq: Fix CPU clock rate setting mismerge
    pci: pci-lantiq: remove duplicate check on resource
    MIPS: Lasat: Add missing CONFIG_PROC_FS dependency to PICVUE_PROC
    MIPS: cp1emu: Fix ISA restrictions for cop1x_op instructions

    Linus Torvalds
     
  • Pull arm64 fixes from Catalin Marinas:

    - enable 48-bit VA space now that KVM has been fixed, together with a
    couple of fixes for pgd allocation alignment and initial memblock
    current_limit. There is still a dependency on !ARM_SMMU which needs
    to be updated as it uses the page table manipulation macros of the
    host kernel
    - eBPF fixes following changes/conflicts during the merging window
    - Compat types affecting compat_elf_prpsinfo
    - Compilation error on UP builds
    - ASLR fix when /proc/sys/kernel/randomize_va_space == 0
    - DT definitions for CLCD support on ARMv8 model platform

    * tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux:
    arm64: Fix memblock current_limit with 64K pages and 48-bit VA
    arm64: ASLR: Don't randomise text when randomise_va_space == 0
    arm64: vexpress: Add CLCD support to the ARMv8 model platform
    arm64: Fix compilation error on UP builds
    Documentation/arm64/memory.txt: fix typo
    net: bpf: arm64: minor fix of type in jited
    arm64: bpf: add 'load 64-bit immediate' instruction
    arm64: bpf: add 'shift by register' instructions
    net: bpf: arm64: address randomize and write protect JIT code
    arm64: mm: Correct fixmap pagetable types
    arm64: compat: fix compat types affecting struct compat_elf_prpsinfo
    arm64: Align less than PAGE_SIZE pgds naturally
    arm64: Allow 48-bits VA space without ARM_SMMU

    Linus Torvalds
     
  • Pull two sparc fixes from David Miller:

    1) Fix boots with gcc-4.9 compiled sparc64 kernels.

    2) Add missing __get_user_pages_fast() on sparc64 to fix hangs on
    futexes used in transparent hugepage areas.

    It's really idiotic to have a weak symbolled fallback that just
    returns zero, and causes this kind of bug. There should be no
    backup implementation and the link should fail if the architecture
    fails to provide __get_user_pages_fast() and supports transparent
    hugepages.

    * git://git.kernel.org/pub/scm/linux/kernel/git/davem/sparc:
    sparc64: Implement __get_user_pages_fast().
    sparc64: Fix register corruption in top-most kernel stack frame during boot.

    Linus Torvalds
     
  • Pull kvm fixes from Paolo Bonzini:
    "This is a pretty large update. I think it is roughly as big as what I
    usually had for the _whole_ rc period.

    There are a few bad bugs where the guest can OOPS or crash the host.
    We have also started looking at attack models for nested
    virtualization; bugs that usually result in the guest ring 0 crashing
    itself become more worrisome if you have nested virtualization,
    because the nested guest might bring down the non-nested guest as
    well. For current uses of nested virtualization these do not really
    have a security impact, but you never know and bugs are bugs
    nevertheless.

    A lot of these bugs are in 3.17 too, resulting in a large number of
    stable@ Ccs. I checked that all the patches apply there with no
    conflicts"

    * tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm:
    kvm: vfio: fix unregister kvm_device_ops of vfio
    KVM: x86: Wrong assertion on paging_tmpl.h
    kvm: fix excessive pages un-pinning in kvm_iommu_map error path.
    KVM: x86: PREFETCH and HINT_NOP should have SrcMem flag
    KVM: x86: Emulator does not decode clflush well
    KVM: emulate: avoid accessing NULL ctxt->memopp
    KVM: x86: Decoding guest instructions which cross page boundary may fail
    kvm: x86: don't kill guest on unknown exit reason
    kvm: vmx: handle invvpid vm exit gracefully
    KVM: x86: Handle errors when RIP is set during far jumps
    KVM: x86: Emulator fixes for eip canonical checks on near branches
    KVM: x86: Fix wrong masking on relative jump/call
    KVM: x86: Improve thread safety in pit
    KVM: x86: Prevent host from panicking on shared MSR writes.
    KVM: x86: Check non-canonical addresses upon WRMSR

    Linus Torvalds
     
  • Pull xen bug fixes from David Vrabel:

    - Fix regression in xen_clocksource_read() which caused all Xen guests
    to crash early in boot.
    - Several fixes for super rare race conditions in the p2m.
    - Assorted other minor fixes.

    * tag 'stable/for-linus-3.18-b-rc1-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip:
    xen/pci: Allocate memory for physdev_pci_device_add's optarr
    x86/xen: panic on bad Xen-provided memory map
    x86/xen: Fix incorrect per_cpu accessor in xen_clocksource_read()
    x86/xen: avoid race in p2m handling
    x86/xen: delay construction of mfn_list_list
    x86/xen: avoid writing to freed memory after race in p2m handling
    xen/balloon: Don't continue ballooning when BP_ECANCELED is encountered

    Linus Torvalds
     
  • Pull sound fixes from Takashi Iwai:
    "Here are a chunk of small fixes since rc1: two PCM core fixes, one is
    a long-standing annoyance about lockdep and another is an ARM64 mmap
    fix.

    The rest are a HD-audio HDMI hotplug notification fix, a fix for
    missing NULL termination in Realtek codec quirks and a few new
    device/codec-specific quirks as usual"

    * tag 'sound-3.18-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound:
    ALSA: hda - Add missing terminating entry to SND_HDA_PIN_QUIRK macro
    ALSA: pcm: Fix false lockdep warnings
    ALSA: hda - Fix inverted LED gpio setup for Lenovo Ideapad
    ALSA: hda - hdmi: Fix missing ELD change event on plug/unplug
    ALSA: usb-audio: Add support for Steinberg UR22 USB interface
    ALSA: ALC283 codec - Avoid pop noise on headphones during suspend/resume
    ALSA: pcm: use the same dma mmap codepath both for arm and arm64

    Linus Torvalds
     
  • Pull /dev/random updates from Ted Ts'o:
    "This adds a memzero_explicit() call which is guaranteed not to be
    optimized away by GCC. This is important when we are wiping
    cryptographically sensitive material"

    * tag 'random_for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/random:
    crypto: memzero_explicit - make sure to clear out sensitive data
    random: add and use memzero_explicit() for clearing data
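
    The problem described in the pull message above (a plain memset() of a
    buffer that is never read again may be dropped by the optimizer) can be
    illustrated with a minimal userspace sketch. The name explicit_wipe() is
    hypothetical, and the empty inline-asm barrier is a GCC/Clang extension
    chosen for illustration; this is not the kernel's implementation.

```c
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical userspace analogue of a "zero and do not optimize away"
 * helper. Assumes GCC or Clang (uses extended inline asm).
 */
static void explicit_wipe(void *buf, size_t len)
{
    memset(buf, 0, len);
    /*
     * Empty asm with a "memory" clobber: the compiler must assume the
     * bytes at buf may be read here, so the memset cannot be discarded
     * as a dead store.
     */
    __asm__ __volatile__("" : : "r"(buf) : "memory");
}
```

    A caller would wipe a key buffer with explicit_wipe(key, sizeof(key))
    right before the buffer goes out of scope.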

    Linus Torvalds
     
  • Pull ACPI and power management updates from Rafael Wysocki:
    "This is material that didn't make it to my 3.18-rc1 pull request for
    various reasons, mostly related to timing and travel (LinuxCon EU /
    LPC) plus a couple of fixes for recent bugs.

    The only really new thing here is the PM QoS class for memory
    bandwidth, but it is simple enough and users of it will be added in
    the next cycle. One major change in behavior is that platform devices
    enumerated by ACPI will use 32-bit DMA mask by default. Also included
    is an ACPICA update to a new upstream release, but that's mostly
    cleanups, changes in tools and similar. The rest is fixes and
    cleanups mostly.

    Specifics:

    - Fix for a recent PCI power management change that overlooked the
    fact that some IRQ chips might not be able to configure PCIe PME
    for system wakeup from Lucas Stach.

    - Fix for a bug introduced in 3.17 where acpi_device_wakeup() is
    called with a wrong ordering of arguments from Zhang Rui.

    - A bunch of intel_pstate driver fixes (all -stable candidates) from
    Dirk Brandewie, Gabriele Mazzotta and Pali Rohár.

    - Fixes for a rather long-standing problem with the OOM killer and
    the freezer that frozen processes killed by the OOM do not actually
    release any memory until they are thawed, so OOM-killing them is
    rather pointless, with a couple of cleanups on top (Michal Hocko,
    Cong Wang, Rafael J Wysocki).

    - ACPICA update to upstream release 20140926, including mostly
    cleanups reducing differences between the upstream ACPICA and the
    kernel code, tools changes (acpidump, acpiexec) and support for the
    _DDN object (Bob Moore, Lv Zheng).

    - New PM QoS class for memory bandwidth from Tomeu Vizoso.

    - Default 32-bit DMA mask for platform devices enumerated by ACPI
    (this change is mostly needed for some drivers development in
    progress targeted at 3.19) from Heikki Krogerus.

    - ACPI EC driver cleanups, mostly related to debugging, from Lv
    Zheng.

    - cpufreq-dt driver updates from Thomas Petazzoni.

    - powernv cpuidle driver update from Preeti U Murthy"

    * tag 'pm+acpi-3.18-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: (34 commits)
    intel_pstate: Correct BYT VID values.
    intel_pstate: Fix BYT frequency reporting
    intel_pstate: Don't lose sysfs settings during cpu offline
    cpufreq: intel_pstate: Reflect current no_turbo state correctly
    cpufreq: expose scaling_cur_freq sysfs file for set_policy() drivers
    cpufreq: intel_pstate: Fix setting max_perf_pct in performance policy
    PCI / PM: handle failure to enable wakeup on PCIe PME
    ACPI: invoke acpi_device_wakeup() with correct parameters
    PM / freezer: Clean up code after recent fixes
    PM: convert do_each_thread to for_each_process_thread
    OOM, PM: OOM killed task shouldn't escape PM suspend
    freezer: remove obsolete comments in __thaw_task()
    freezer: Do not freeze tasks killed by OOM killer
    ACPI / platform: provide default DMA mask
    cpuidle: powernv: Populate cpuidle state details by querying the device-tree
    cpufreq: cpufreq-dt: adjust message related to regulators
    cpufreq: cpufreq-dt: extend with platform_data
    cpufreq: allow driver-specific data
    ACPI / EC: Cleanup coding style.
    ACPI / EC: Refine event/query debugging messages.
    ...

    Linus Torvalds
     
  • Pull thermal management updates from Zhang Rui:
    "Sorry that I missed the merge window as there is a bug found in the
    last minute, and I have to fix it and wait for the code to be tested
    in linux-next tree for a few days. Now the buggy patch has been
    dropped entirely from my next branch. Thus I hope those changes can
    still be merged in 3.18-rc2 as most of them are platform thermal
    driver changes.

    Specifics:

    - introduce ACPI INT340X thermal drivers.

    Newer laptops and tablets may have thermal sensors and other
    devices with thermal control capabilities that are exposed for the
    OS to use via the ACPI INT340x device objects. Several drivers are
    introduced to expose the temperature information and cooling
    ability from these objects to user-space via the normal thermal
    framework.

    From: Lu Aaron, Lan Tianyu, Jacob Pan and Zhang Rui.

    - introduce a new thermal governor, which just uses a hysteresis to
    switch abruptly on/off a cooling device. This governor can be used
    to control certain fan devices that can not be throttled but just
    switched on or off. From: Peter Feuerer.

    - introduce support for some new thermal interrupt functions on
    i.MX6SX, in IMX thermal driver. From: Anson, Huang.

    - introduce tracing support on thermal framework. From: Punit
    Agrawal.

    - small fixes in OF thermal and thermal step_wise governor"

    * 'next' of git://git.kernel.org/pub/scm/linux/kernel/git/rzhang/linux: (25 commits)
    Thermal: int340x thermal: select ACPI fan driver
    Thermal: int3400_thermal: use acpi_thermal_rel parsing APIs
    Thermal: int340x_thermal: expose acpi thermal relationship tables
    Thermal: introduce int3403 thermal driver
    Thermal: introduce INT3402 thermal driver
    Thermal: move the KELVIN_TO_MILLICELSIUS macro to thermal.h
    ACPI / Fan: support INT3404 thermal device
    ACPI / Fan: add ACPI 4.0 style fan support
    ACPI / fan: convert to platform driver
    ACPI / fan: use acpi_device_xxx_power instead of acpi_bus equivelant
    ACPI / fan: remove no need check for device pointer
    ACPI / fan: remove unused macro
    Thermal: int3400 thermal: register to thermal framework
    Thermal: int3400 thermal: add capability to detect supporting UUIDs
    Thermal: introduce int3400 thermal driver
    ACPI: add ACPI_TYPE_LOCAL_REFERENCE support to acpi_extract_package()
    ACPI: make acpi_create_platform_device() an external API
    thermal: step_wise: fix: Prevent from binary overflow when trend is dropping
    ACPI: introduce ACPI int340x thermal scan handler
    thermal: Added Bang-bang thermal governor
    ...
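
    The on/off governor with hysteresis mentioned in the pull message above
    can be sketched as follows. This is only an illustration of the general
    technique: the struct, the function name, the millidegree units and the
    inclusive thresholds are all assumptions, not the code that was merged.

```c
#include <stdbool.h>

/* Hypothetical state for one fan; temperatures in millidegrees Celsius. */
struct bang_bang_sketch {
    int trip_mc;   /* switch the fan on at or above this temperature */
    int hyst_mc;   /* switch it off only once we are this far below trip */
    bool fan_on;
};

/* Feed one temperature sample; returns the resulting fan state. */
static bool bang_bang_update(struct bang_bang_sketch *g, int temp_mc)
{
    if (!g->fan_on && temp_mc >= g->trip_mc)
        g->fan_on = true;
    else if (g->fan_on && temp_mc <= g->trip_mc - g->hyst_mc)
        g->fan_on = false;
    /* Between the two thresholds the previous state is kept. */
    return g->fan_on;
}
```

    Keeping the previous state inside the hysteresis band is what prevents
    the fan from toggling rapidly when the temperature hovers near the trip
    point.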

    Linus Torvalds
     
  • With 48-bit VA space, the 64K page configuration uses 3 levels instead
    of 2 and PUD_SIZE != PMD_SIZE. Since with 64K pages we only cover
    PMD_SIZE with the initial swapper_pg_dir populated in head.S, the
    memblock current_limit needs to be set accordingly in map_mem() to avoid
    allocating unmapped memory. The memblock current_limit is progressively
    increased as more blocks are mapped.

    Signed-off-by: Catalin Marinas

    Catalin Marinas
     
  • It is not sufficient to only implement get_user_pages_fast(); you
    must also implement the atomic version __get_user_pages_fast(),
    otherwise you end up using the weak symbol fallback implementation
    which simply returns zero.

    This is dangerous, because it causes the futex code to loop forever
    if transparent hugepages are supported (see get_futex_key()).
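
    A minimal sketch of why a weak fallback that returns zero is hazardous.
    All names below are hypothetical, the retry loop is bounded only so that
    the example terminates, and the weak attribute is a GCC/Clang extension.

```c
/*
 * Hypothetical generic fallback. Because it is weak, the link succeeds
 * even when no architecture-specific (strong) definition exists.
 */
__attribute__((weak)) int pin_pages_atomic_sketch(unsigned long start,
                                                  int nr_pages)
{
    (void)start;
    (void)nr_pages;
    return 0; /* "pinned nothing": looks like a transient failure */
}

/*
 * Hypothetical caller that retries until all pages are pinned. With the
 * fallback above it can never succeed; an unbounded version of this loop
 * would spin forever.
 */
static int pin_with_retry_sketch(unsigned long start, int nr_pages,
                                 int max_tries)
{
    for (int i = 0; i < max_tries; i++) {
        if (pin_pages_atomic_sketch(start, nr_pages) == nr_pages)
            return 0;
    }
    return -1;
}
```

    Without the weak definition, a missing architecture implementation
    would instead show up as a link error at build time.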

    Signed-off-by: David S. Miller

    David S. Miller
     
  • Meelis Roos reported that kernels built with gcc-4.9 do not boot. We
    eventually narrowed this down to only impacting machines using
    UltraSPARC-III and derivative cpus.

    The crash happens right when the first user process is spawned:

    [ 54.451346] Kernel panic - not syncing: Attempted to kill init! exitcode=0x00000004
    [ 54.451346]
    [ 54.571516] CPU: 1 PID: 1 Comm: init Not tainted 3.16.0-rc2-00211-gd7933ab #96
    [ 54.666431] Call Trace:
    [ 54.698453] [0000000000762f8c] panic+0xb0/0x224
    [ 54.759071] [000000000045cf68] do_exit+0x948/0x960
    [ 54.823123] [000000000042cbc0] fault_in_user_windows+0xe0/0x100
    [ 54.902036] [0000000000404ad0] __handle_user_windows+0x0/0x10
    [ 54.978662] Press Stop-A (L1-A) to return to the boot prom
    [ 55.050713] ---[ end Kernel panic - not syncing: Attempted to kill init! exitcode=0x00000004

    Further investigation showed that compiling only per_cpu_patch() with
    an older compiler fixes the boot.

    Detailed analysis showed that the function is not being miscompiled by
    gcc-4.9, but it is using a different register allocation ordering.

    With the gcc-4.9 compiled function, something during the code patching
    causes some of the %i* input registers to get corrupted. Perhaps
    we have a TLB miss path into the firmware that is deep enough to
    cause a register window spill and subsequent restore when we get
    back from the TLB miss trap.

    Let's plug this up by doing two things:

    1) Stop using the firmware stack for client interface calls into
    the firmware. Just use the kernel's stack.

    2) As soon as we can, call into a new function "start_early_boot()"
    to put a one-register-window buffer between the firmware's
    deepest stack frame and the top-most initial kernel one.

    Reported-by: Meelis Roos
    Tested-by: Meelis Roos
    Signed-off-by: David S. Miller

    David S. Miller
     

24 Oct, 2014

18 commits

  • When the user asks to turn off ASLR by writing "0" to
    /proc/sys/kernel/randomize_va_space, there should not be
    any randomization of mmap base, stack, VDSO, libs, text and heap.

    Currently arm64 violates this behavior by randomising text.
    Fix this by defining a constant ELF_ET_DYN_BASE. The randomisation of
    mm->mmap_base is done by setup_new_exec -> arch_pick_mmap_layout ->
    mmap_base -> mmap_rnd.

    Signed-off-by: Arun Chandran
    Signed-off-by: Catalin Marinas

    Arun Chandran
     
  • This isn't a module and shouldn't be one.

    Signed-off-by: Ralf Baechle

    Ralf Baechle
     
  • After commit 80ce163 (KVM: VFIO: register kvm_device_ops dynamically),
    kvm_device_ops of vfio can be registered dynamically. Commit 3c3c29fd
    (kvm-vfio: do not use module_init) moved the dynamic registration into
    kvm_init in order to fix broken unloading of the kvm module. However,
    kvm_device_ops of vfio is not unregistered when the kvm-intel module is
    removed, which leads to a device type collision warning when the
    kvm-intel module is inserted again.

    WARNING: CPU: 1 PID: 10358 at /root/cathy/kvm/arch/x86/kvm/../../../virt/kvm/kvm_main.c:3289 kvm_init+0x234/0x282 [kvm]()
    Modules linked in: kvm_intel(O+) kvm(O) nfsv3 nfs_acl auth_rpcgss oid_registry nfsv4 dns_resolver nfs fscache lockd sunrpc pci_stub bridge stp llc autofs4 8021q cpufreq_ondemand ipv6 joydev microcode pcspkr igb i2c_algo_bit ehci_pci ehci_hcd e1000e i2c_i801 ixgbe ptp pps_core hwmon mdio tpm_tis tpm ipmi_si ipmi_msghandler acpi_cpufreq isci libsas scsi_transport_sas button dm_mirror dm_region_hash dm_log dm_mod [last unloaded: kvm_intel]
    CPU: 1 PID: 10358 Comm: insmod Tainted: G W O 3.17.0-rc1 #2
    Hardware name: Intel Corporation S2600CP/S2600CP, BIOS RMLSDP.86I.00.29.D696.1311111329 11/11/2013
    0000000000000cd9 ffff880ff08cfd18 ffffffff814a61d9 0000000000000cd9
    0000000000000000 ffff880ff08cfd58 ffffffff810417b7 ffff880ff08cfd48
    ffffffffa045bcac ffffffffa049c420 0000000000000040 00000000000000ff
    Call Trace:
    [] dump_stack+0x49/0x60
    [] warn_slowpath_common+0x7c/0x96
    [] ? kvm_init+0x234/0x282 [kvm]
    [] warn_slowpath_null+0x15/0x17
    [] kvm_init+0x234/0x282 [kvm]
    [] vmx_init+0x1bf/0x42a [kvm_intel]
    [] ? vmx_check_processor_compat+0x64/0x64 [kvm_intel]
    [] do_one_initcall+0xe3/0x170
    [] ? __vunmap+0xad/0xb8
    [] do_init_module+0x2b/0x174
    [] load_module+0x43e/0x569
    [] ? do_init_module+0x174/0x174
    [] ? copy_module_from_user+0x39/0x82
    [] ? module_sect_show+0x20/0x20
    [] SyS_init_module+0x54/0x81
    [] system_call_fastpath+0x16/0x1b
    ---[ end trace 0626f4a3ddea56f3 ]---

    The bug can be reproduced by:

    rmmod kvm_intel.ko
    insmod kvm_intel.ko

    without rmmod/insmod kvm.ko
    This patch fixes the bug by unregistering kvm_device_ops of vfio when the
    kvm-intel module is removed.

    Reported-by: Liu Rongrong
    Fixes: 3c3c29fd0d7cddc32862c350d0700ce69953e3bd
    Signed-off-by: Wanpeng Li
    Signed-off-by: Paolo Bonzini

    Wanpeng Li
     
  • Even after the recent fix, the assertion in paging_tmpl.h is triggered.
    Apparently, the assertion wants to check that PAE is always set in
    long mode, but does it in an incorrect way. Note that the assertion is not
    enabled unless the code is debugged by defining MMU_DEBUG.

    Signed-off-by: Nadav Amit
    Signed-off-by: Paolo Bonzini

    Nadav Amit
     
  • The third parameter of kvm_unpin_pages() when called from
    kvm_iommu_map_pages() is wrong, it should be the number of pages to un-pin
    and not the page size.

    This error was facilitated by an inconsistent API: kvm_pin_pages() takes
    a size, but kvm_unpin_pages() takes a number of pages, so fix the problem
    by matching the two.
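
    The idea of matching the two interfaces can be sketched as below: both
    the pin and the unpin helper take a page count, and any byte size is
    converted exactly once. The names, the 4 KiB page size and the counter
    struct are assumptions for illustration; this is not the KVM code.

```c
#include <stddef.h>

#define SKETCH_PAGE_SHIFT 12 /* assumed 4 KiB pages */

/* Hypothetical bookkeeping: how many pages are currently pinned. */
struct pin_state_sketch {
    unsigned long pinned;
};

/* Convert a byte size to a page count, rounding up. */
static unsigned long sketch_bytes_to_pages(size_t bytes)
{
    return (bytes + (1UL << SKETCH_PAGE_SHIFT) - 1) >> SKETCH_PAGE_SHIFT;
}

/* Both helpers use the same unit (pages), so calls pair up naturally. */
static void sketch_pin_pages(struct pin_state_sketch *s, unsigned long npages)
{
    s->pinned += npages;
}

static void sketch_unpin_pages(struct pin_state_sketch *s, unsigned long npages)
{
    s->pinned -= npages;
}
```

    With a mixed interface, passing a byte size where a page count is
    expected would release far more pages than were pinned.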

    This was introduced by commit 350b8bd ("kvm: iommu: fix the third parameter
    of kvm_iommu_put_pages (CVE-2014-3601)"), which fixes the lack of
    un-pinning for pages intended to be un-pinned (i.e. memory leak) but
    unfortunately potentially aggravated the number of pages we un-pin that
    should have stayed pinned. As far as I understand though, the same
    practical mitigations apply.

    This issue was found during review of Red Hat 6.6 patches to prepare
    Ksplice rebootless updates.

    Thanks to Vegard for his time on a late Friday evening to help me in
    understanding this code.

    Fixes: 350b8bd ("kvm: iommu: fix the third parameter of... (CVE-2014-3601)")
    Cc: stable@vger.kernel.org
    Signed-off-by: Quentin Casasnovas
    Signed-off-by: Vegard Nossum
    Signed-off-by: Jamie Iles
    Reviewed-by: Sasha Levin
    Signed-off-by: Paolo Bonzini

    Quentin Casasnovas
     
  • The decode phase of the x86 emulator assumes that every instruction with the
    ModRM flag, and which can be used with RIP-relative addressing, has either
    SrcMem or DstMem. This is not the case for several instructions - prefetch,
    hint-nop and clflush.

    Adding SrcMem|NoAccess for prefetch and hint-nop and SrcMem for clflush.

    This fixes CVE-2014-8480.

    Fixes: 41061cdb98a0bec464278b4db8e894a3121671f5
    Cc: stable@vger.kernel.org
    Signed-off-by: Nadav Amit
    Signed-off-by: Paolo Bonzini

    Nadav Amit
     
  • Currently, all group15 instructions are decoded as clflush (e.g., mfence,
    xsave). In addition, the clflush instruction requires that no prefix
    (66/f2/f3) be present. If a prefix is present, it may encode a different
    instruction (e.g., clflushopt).

    Create a group for clflush, and a different group for each prefix.

    This has been the case forever, but the next patch needs the clflush group
    in order to fix a bug introduced in 3.17.

    Fixes: 41061cdb98a0bec464278b4db8e894a3121671f5
    Cc: stable@vger.kernel.org
    Signed-off-by: Nadav Amit
    Signed-off-by: Paolo Bonzini

    Nadav Amit
     
  • A failure to decode the instruction can cause a NULL pointer access.
    This is fixed simply by moving the "done" label as close as possible
    to the return.

    This fixes CVE-2014-8481.

    Reported-by: Andy Lutomirski
    Cc: stable@vger.kernel.org
    Fixes: 41061cdb98a0bec464278b4db8e894a3121671f5
    Signed-off-by: Paolo Bonzini

    Paolo Bonzini
     
  • A platform driver for which nothing ever registers the corresponding
    platform device.

    Also it was driving the same hardware as sead3-i2c-drv.c so redundant
    anyway and couldn't co-exist with that driver because each of them was
    using a private spinlock to protect access to the same hardware
    resources.

    This also fixes a randconfig problem:

    arch/mips/mti-sead3/sead3-pic32-i2c-drv.c: In function 'i2c_platform_probe':
    arch/mips/mti-sead3/sead3-pic32-i2c-drv.c:345:2: error: implicit declaration of
    function 'i2c_add_numbered_adapter' [-Werror=implicit-function-declaration]
    ret = i2c_add_numbered_adapter(&priv->adap);
    ^
    arch/mips/mti-sead3/sead3-pic32-i2c-drv.c: In function
    'i2c_platform_remove':
    arch/mips/mti-sead3/sead3-pic32-i2c-drv.c:361:2: error: implicit declaration
    of function 'i2c_del_adapter' [-Werror=implicit-function-declaration]
    i2c_del_adapter(&priv->adap);

    Signed-off-by: Ralf Baechle

    Ralf Baechle
     
  • Once an instruction crosses a page boundary, the size read from the second page
    disregards the common case that part of the operand resides on the first page.
    As a result, the fetch of long instructions may fail, and thereby cause the
    decoding to fail as well.
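
    The arithmetic involved can be sketched as below: when a fetch of len
    bytes starts near the end of a page, only the bytes not already taken
    from the first page need to come from the second one. The names and the
    4 KiB page size are assumptions, and len is assumed to be no larger than
    one page; this is not the emulator's code.

```c
#include <stdint.h>

#define SKETCH_PAGE_SIZE 4096u /* assumed page size */

/* How a fetch of 'len' bytes at 'addr' divides across a page boundary. */
struct fetch_split_sketch {
    unsigned first;  /* bytes available on the page containing addr */
    unsigned second; /* remaining bytes that live on the next page */
};

static struct fetch_split_sketch split_fetch(uint64_t addr, unsigned len)
{
    unsigned left = SKETCH_PAGE_SIZE -
                    (unsigned)(addr & (SKETCH_PAGE_SIZE - 1));
    struct fetch_split_sketch s;

    s.first = len < left ? len : left;
    s.second = len - s.first; /* not 'len': part was already fetched */
    return s;
}
```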

    Cc: stable@vger.kernel.org
    Fixes: 5cfc7e0f5e5e1adf998df94f8e36edaf5d30d38e
    Signed-off-by: Nadav Amit
    Signed-off-by: Paolo Bonzini

    Nadav Amit
     
  • KVM_EXIT_UNKNOWN is a kvm bug; we don't really know whether it was
    triggered by a privileged application. Let's not kill the guest: WARN
    and inject #UD instead.

    Cc: stable@vger.kernel.org
    Signed-off-by: Michael S. Tsirkin
    Signed-off-by: Paolo Bonzini

    Michael S. Tsirkin
     
  • On systems with invvpid instruction support (corresponding bit in
    IA32_VMX_EPT_VPID_CAP MSR is set) guest invocation of invvpid
    causes vm exit, which is currently not handled and results in
    propagation of unknown exit to userspace.

    Fix this by installing an invvpid vm exit handler.

    This is CVE-2014-3646.

    Cc: stable@vger.kernel.org
    Signed-off-by: Petr Matousek
    Signed-off-by: Paolo Bonzini

    Petr Matousek
     
  • Far jmp/call/ret may fault while loading a new RIP. Currently KVM does not
    handle this case, which may result in a failed vm-entry once the assignment
    is done. The tricky part of doing so is that loading the new CS affects the
    VMCS/VMCB state, so if we fail while loading the new RIP, we are left in an
    inconsistent state. Therefore, on 64-bit this patch saves the old CS
    descriptor and restores it if loading RIP failed.

    This fixes CVE-2014-3647.

    Cc: stable@vger.kernel.org
    Signed-off-by: Nadav Amit
    Signed-off-by: Paolo Bonzini

    Nadav Amit
     
  • Before changing rip (during jmp, call, ret, etc.) the target should be checked
    to be canonical, as real CPUs do. During sysret, both the target rsp and rip
    should be canonical. If any of these values is noncanonical, a #GP exception
    should occur. The exceptions to this rule are the syscall and sysenter
    instructions, for which the assigned rip is checked during the assignment to
    the relevant MSRs.

    This patch fixes the emulator to behave as real CPUs do for near branches.
    Far branches are handled by the next patch.
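
    A canonical-address test of the kind described above can be sketched as
    follows, assuming 48-bit virtual addresses (bits 63..47 must all be
    equal). The function name is hypothetical and this is not the emulator's
    code.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Assumes 48 implemented virtual-address bits: an address is canonical
 * when bits 63..47 are either all zero or all one.
 */
static bool is_canonical48_sketch(uint64_t va)
{
    uint64_t top = va >> 47; /* the 17 uppermost bits */

    return top == 0 || top == 0x1ffff;
}
```

    An emulator following the rule above would raise #GP instead of
    assigning a branch target for which this check fails.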

    This fixes CVE-2014-3647.

    Cc: stable@vger.kernel.org
    Signed-off-by: Nadav Amit
    Signed-off-by: Paolo Bonzini

    Nadav Amit
     
  • Relative jumps and calls do the masking according to the operand size, and not
    according to the address size as the KVM emulator does today.
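
    The masking rule can be sketched as below: the relative displacement is
    added to the instruction pointer and the result is truncated to the
    operand size (2, 4 or 8 bytes). Function names are hypothetical and this
    is an illustration of the rule, not the patch itself.

```c
#include <stdint.h>

/* Keep only the low 'bytes' bytes of v (bytes is 2, 4 or 8). */
static uint64_t mask_to_size_sketch(uint64_t v, unsigned bytes)
{
    if (bytes >= 8)
        return v;
    return v & ((1ULL << (bytes * 8)) - 1);
}

/* Target of a relative jump/call, truncated by operand size. */
static uint64_t rel_target_sketch(uint64_t ip, int64_t rel, unsigned op_bytes)
{
    return mask_to_size_sketch(ip + (uint64_t)rel, op_bytes);
}
```

    With a 16-bit operand size, a jump of +4 from 0xfffe wraps to 0x0002,
    whereas with a 32-bit operand size it lands on 0x10002.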

    This patch fixes KVM behavior.

    Cc: stable@vger.kernel.org
    Signed-off-by: Nadav Amit
    Signed-off-by: Paolo Bonzini

    Nadav Amit
     
  • There's a race condition in the PIT emulation code in KVM. In
    __kvm_migrate_pit_timer the pit_timer object is accessed without
    synchronization. If the race condition occurs at the wrong time this
    can crash the host kernel.

    This fixes CVE-2014-3611.

    Cc: stable@vger.kernel.org
    Signed-off-by: Andrew Honig
    Signed-off-by: Paolo Bonzini

    Andy Honig
     
  • The previous patch blocked invalid writes directly when the MSR
    is written. As a precaution, prevent future similar mistakes by
    gracefully handling GPs caused by writes to shared MSRs.

    Cc: stable@vger.kernel.org
    Signed-off-by: Andrew Honig
    [Remove parts obsoleted by Nadav's patch. - Paolo]
    Signed-off-by: Paolo Bonzini

    Andy Honig
     
  • Upon WRMSR, the CPU should inject #GP if a non-canonical value (address) is
    written to certain MSRs. The behavior is "almost" identical for AMD and Intel
    (ignoring MSRs that are not implemented in either architecture since they would
    anyhow #GP). However, IA32_SYSENTER_ESP and IA32_SYSENTER_EIP cause #GP if
    non-canonical address is written on Intel but not on AMD (which ignores the top
    32-bits).

    Accordingly, this patch injects a #GP on the MSRs which behave identically on
    Intel and AMD. To eliminate the differences between the architectures, the
    value which is written to IA32_SYSENTER_ESP and IA32_SYSENTER_EIP is turned
    into a canonical value before writing instead of injecting a #GP.
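
    Turning an arbitrary 64-bit value into a canonical one can be sketched
    as sign-extending from bit 47, assuming 48-bit virtual addresses. The
    function name is hypothetical and the bit width is an assumption; this
    is not the patch's code.

```c
#include <stdint.h>

/*
 * Assumes 48 implemented virtual-address bits. Drops bits 63..48 and
 * replaces them with copies of bit 47.
 */
static uint64_t canonicalize48_sketch(uint64_t va)
{
    const uint64_t sign = 1ULL << 47;
    uint64_t low = va & ((1ULL << 48) - 1);

    /* XOR-then-subtract sign extension, using only unsigned arithmetic. */
    return (low ^ sign) - sign;
}
```

    A value that is already canonical passes through unchanged, so the
    conversion is safe to apply unconditionally before the write.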

    Some references from Intel and AMD manuals:

    According to the Intel SDM description of the WRMSR instruction, #GP is expected on
    WRMSR "If the source register contains a non-canonical address and ECX
    specifies one of the following MSRs: IA32_DS_AREA, IA32_FS_BASE, IA32_GS_BASE,
    IA32_KERNEL_GS_BASE, IA32_LSTAR, IA32_SYSENTER_EIP, IA32_SYSENTER_ESP."

    According to the AMD instruction manual:
    LSTAR/CSTAR (SYSCALL): "The WRMSR instruction loads the target RIP into the
    LSTAR and CSTAR registers. If an RIP written by WRMSR is not in canonical
    form, a general-protection exception (#GP) occurs."
    IA32_GS_BASE and IA32_FS_BASE (WRFSBASE/WRGSBASE): "The address written to the
    base field must be in canonical form or a #GP fault will occur."
    IA32_KERNEL_GS_BASE (SWAPGS): "The address stored in the KernelGSbase MSR must
    be in canonical form."

    This patch fixes CVE-2014-3610.

    Cc: stable@vger.kernel.org
    Signed-off-by: Nadav Amit
    Signed-off-by: Paolo Bonzini

    Nadav Amit