28 Apr, 2014
16 commits
-
We have two copies of code that creates an OPAL sg list. Consolidate
these into a common set of helpers and fix the endian issues.
The flash interface embedded a version number in the num_entries
field, whereas the dump interface did not. Since versioning
wasn't added to the flash interface and it is impossible to add
this in a backwards compatible way, just remove it.
Signed-off-by: Anton Blanchard
Signed-off-by: Benjamin Herrenschmidt -
Fix little endian issues with the OPAL error log code.
Signed-off-by: Anton Blanchard
Reviewed-by: Stewart Smith
Signed-off-by: Benjamin Herrenschmidt -
The bitmap in opal_poll_events and opal_handle_interrupt is
big endian, so we need to byteswap it on little endian builds.
Signed-off-by: Anton Blanchard
Signed-off-by: Benjamin Herrenschmidt -
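The byte-order handling in the entry above can be illustrated with a minimal userspace sketch. It assumes the firmware hands back the event bitmap as eight bytes in big-endian order; be64_bytes_to_host() is a hypothetical helper written for this example and is not the kernel's code.

```c
#include <stdint.h>

/*
 * Hypothetical helper (not from the patch): decode eight bytes that were
 * written in big-endian order into a host-order 64-bit event bitmap.
 * Assembling the value byte by byte gives the same result on both
 * little-endian and big-endian hosts.
 */
static uint64_t be64_bytes_to_host(const unsigned char raw[8])
{
    uint64_t value = 0;

    for (int i = 0; i < 8; i++)
        value = (value << 8) | raw[i];   /* most significant byte first */

    return value;
}
```

Reading the same eight bytes directly as a native 64-bit integer would only give this value on a big-endian host, which is why a little-endian build needs the explicit conversion.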
We had some duplication of the internal OPAL functions.
Signed-off-by: Anton Blanchard
Signed-off-by: Benjamin Herrenschmidt -
Using size_t in our APIs is asking for trouble, especially
when some OPAL calls use size_t pointers.
Signed-off-by: Anton Blanchard
Reviewed-by: Stewart Smith
Signed-off-by: Benjamin Herrenschmidt -
On the PowerNV platform, we are holding an unnecessary refcount on a pci_dev,
so the pci_dev is not destroyed when hotplugging a PCI device.
This patch releases the unnecessary refcount.
Signed-off-by: Wei Yang
Signed-off-by: Benjamin Herrenschmidt -
During the EEH hotplug event, iommu_add_device() will be invoked three times
and two of them will trigger a warning or an error.
The three invocations of iommu_add_device() are:
pci_device_add
...
set_iommu_table_base_and_group kobj->sd is not initialized. The
dev->kobj->sd is initialized in device_add().
The third time's warning is triggered by the re-attach of the iommu_group.
After applying this patch, the error
iommu_tce: 0003:05:00.0 has not been added, ret=-14
and the warning
[ 204.123609] ------------[ cut here ]------------
[ 204.123645] WARNING: at arch/powerpc/kernel/iommu.c:1125
[ 204.123680] Modules linked in: xt_CHECKSUM nf_conntrack_netbios_ns nf_conntrack_broadcast ipt_MASQUERADE ip6t_REJECT bnep bluetooth 6lowpan_iphc rfkill xt_conntrack ebtable_nat ebtable_broute bridge stp llc mlx4_ib ib_sa ib_mad ib_core ib_addr ebtable_filter ebtables ip6table_nat nf_conntrack_ipv6 nf_defrag_ipv6 nf_nat_ipv6 ip6table_mangle ip6table_security ip6table_raw ip6table_filter ip6_tables iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 nf_nat nf_conntrack iptable_mangle iptable_security iptable_raw bnx2x tg3 mlx4_core nfsd ptp mdio ses libcrc32c nfs_acl enclosure be2net pps_core shpchp lockd kvm uinput sunrpc binfmt_misc lpfc scsi_transport_fc ipr scsi_tgt
[ 204.124356] CPU: 18 PID: 650 Comm: eehd Not tainted 3.14.0-rc5yw+ #102
[ 204.124400] task: c0000027ed485670 ti: c0000027ed50c000 task.ti: c0000027ed50c000
[ 204.124453] NIP: c00000000003cf80 LR: c00000000006c648 CTR: c00000000006c5c0
[ 204.124506] REGS: c0000027ed50f440 TRAP: 0700 Not tainted (3.14.0-rc5yw+)
[ 204.124558] MSR: 9000000000029032 CR: 88008084 XER: 20000000
[ 204.124682] CFAR: c00000000006c644 SOFTE: 1
GPR00: c00000000006c648 c0000027ed50f6c0 c000000001398380 c0000027ec260300
GPR04: c0000027ea92c000 c00000000006ad00 c0000000016e41b0 0000000000000110
GPR08: c0000000012cd4c0 0000000000000001 c0000027ec2602ff 0000000000000062
GPR12: 0000000028008084 c00000000fdca200 c0000000000d1d90 c0000027ec281a80
GPR16: 0000000000000000 0000000000000000 0000000000000000 0000000000000000
GPR20: 0000000000000000 0000000000000000 0000000000000000 0000000000000001
GPR24: 000000005342697b 0000000000002906 c000001fe6ac9800 c000001fe6ac9800
GPR28: 0000000000000000 c0000000016e3a80 c0000027ea92c090 c0000027ea92c000
[ 204.125353] NIP [c00000000003cf80] .iommu_add_device+0x30/0x1f0
[ 204.125399] LR [c00000000006c648] .pnv_pci_ioda_dma_dev_setup+0x88/0xb0
[ 204.125443] Call Trace:
[ 204.125464] [c0000027ed50f6c0] [c0000027ed50f750] 0xc0000027ed50f750 (unreliable)
[ 204.125526] [c0000027ed50f750] [c00000000006c648] .pnv_pci_ioda_dma_dev_setup+0x88/0xb0
[ 204.125588] [c0000027ed50f7d0] [c000000000069cc8] .pnv_pci_dma_dev_setup+0x78/0x340
[ 204.125650] [c0000027ed50f870] [c000000000044408] .pcibios_setup_device+0x88/0x2f0
[ 204.125712] [c0000027ed50f940] [c000000000046040] .pcibios_setup_bus_devices+0x60/0xd0
[ 204.125774] [c0000027ed50f9c0] [c000000000043acc] .pcibios_add_pci_devices+0xdc/0x1c0
[ 204.125837] [c0000027ed50fa50] [c00000000086f970] .eeh_reset_device+0x36c/0x4f0
[ 204.125939] [c0000027ed50fb20] [c00000000003a2d8] .eeh_handle_normal_event+0x448/0x480
[ 204.126068] [c0000027ed50fbc0] [c00000000003a35c] .eeh_handle_event+0x4c/0x340
[ 204.126192] [c0000027ed50fc80] [c00000000003a74c] .eeh_event_handler+0xfc/0x1b0
[ 204.126319] [c0000027ed50fd30] [c0000000000d1ea0] .kthread+0x110/0x130
[ 204.126430] [c0000027ed50fe30] [c00000000000a460] .ret_from_kernel_thread+0x5c/0x7c
[ 204.126556] Instruction dump:
[ 204.126610] 7c0802a6 fba1ffe8 fbc1fff0 fbe1fff8 f8010010 f821ff71 7c7e1b78 60000000
[ 204.126787] 60000000 e87e0298 3143ffff 7d2a1910 2fa90000 40de00c8 ebfe0218
[ 204.126966] ---[ end trace 6e7aefd80add2973 ]---
are cleared.
This patch removes iommu_add_device() in pnv_pci_ioda_dma_dev_setup(), which
reverts part of the change in commit d905c5df (PPC: POWERNV: move
iommu_add_device earlier).
Signed-off-by: Wei Yang
Signed-off-by: Benjamin Herrenschmidt -
With this patch I was able to update firmware on an LE kernel.
Signed-off-by: Anton Blanchard
Signed-off-by: Benjamin Herrenschmidt -
We have a subtle race when sending CPUs back to OPAL on kexec.
We mark them as "in real mode" right before we send them down. Once
we've booted the new kernel, it might try to call opal_reinit_cpus()
to change endianness, and that requires all CPUs to be spinning inside
OPAL.
However there is no synchronization here and we've observed cases
where the returning CPUs hadn't established their new state inside
OPAL before opal_reinit_cpus() is called, causing it to fail.
The proper fix is to actually wait for them to go down all the way
from the kexec'ing kernel.
Signed-off-by: Benjamin Herrenschmidt
-
The size of the sysparam sysfs files is determined from the device tree
at boot. However the buffer is hard coded to 64 bytes. If we encounter a
parameter that is larger than 64 bytes, or misparse the device tree, the
buffer will overflow when reading or writing to the parameter.
Check it at discovery time, and if the parameter is too large, do not
create a sysfs entry for it.
Signed-off-by: Joel Stanley
Signed-off-by: Benjamin Herrenschmidt -
Signed-off-by: Benjamin Herrenschmidt
-
The sysparam code currently uses the userspace supplied number of
bytes when memcpy()ing in to a local 64-byte buffer.
Limit the maximum number of bytes by the size of the buffer.
Signed-off-by: Benjamin Herrenschmidt
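The clamping described in the entry above can be sketched in plain C. This is an illustrative userspace example under stated assumptions, not the sysparam code itself: PARAM_BUF_SIZE and copy_param() are hypothetical names, and only the 64-byte buffer size comes from the commit message.

```c
#include <stddef.h>
#include <string.h>

/* 64 bytes, as in the commit message; the macro name is an assumption. */
#define PARAM_BUF_SIZE 64

/*
 * Hypothetical helper: copy a caller-supplied number of bytes into a
 * fixed-size buffer, never writing more than the buffer can hold.
 * Returns the number of bytes actually copied.
 */
static size_t copy_param(unsigned char buf[PARAM_BUF_SIZE],
                         const void *src, size_t count)
{
    if (count > PARAM_BUF_SIZE)
        count = PARAM_BUF_SIZE;   /* clamp to the buffer size */

    memcpy(buf, src, count);
    return count;
}
```

Returning the clamped count lets the caller see that fewer bytes were consumed than requested instead of silently overflowing the buffer.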
-
The OPAL calls are returning int64_t values, which the sysparam code
stores in an int, and the sysfs callback returns ssize_t. Make the code
easier to read by consistently using ssize_t.
Signed-off-by: Joel Stanley
Signed-off-by: Benjamin Herrenschmidt -
When a sysparam query in OPAL returned a negative value (error code),
sysfs would spew out a decent chunk of memory; almost 64K more than
expected. This was traced to a signed/unsigned mix-up in the OPAL sysparam
sysfs code at sys_param_show.
The return value of sys_param_show is a ssize_t, calculated using
return ret ? ret : attr->param_size;
Alan Modra explains:
"attr->param_size" is an unsigned int, "ret" an int, so the overall
expression has type unsigned int. Result is that ret is cast to
unsigned int before being cast to ssize_t.
Instead of using the ternary operator, set ret to the param_size if an
error is not detected. The same bug exists in the sysfs write callback;
this patch fixes it in the same way.
A note on debugging this next time: on my system gcc will warn about
this if compiled with -Wsign-compare, which is not enabled by -Wall,
only -Wextra.
Signed-off-by: Joel Stanley
Signed-off-by: Benjamin Herrenschmidt -
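The signed/unsigned mix-up in the sys_param_show entry above can be reproduced in a few lines of userspace C. This is a sketch under assumptions: show_buggy() and show_fixed() are hypothetical stand-ins for the sysfs callback, reduced to the return expression, and the wrong result shows up where ssize_t is wider than int (for example on 64-bit Linux).

```c
#include <sys/types.h>   /* ssize_t */

/*
 * Buggy form: the ternary mixes int and unsigned int, so the whole
 * expression has type unsigned int. A negative ret is converted to a
 * large unsigned value before it is widened to ssize_t.
 */
static ssize_t show_buggy(int ret, unsigned int param_size)
{
    return ret ? ret : param_size;
}

/*
 * Fixed form, following the approach the commit message describes:
 * keep the error code in a signed variable and only substitute the
 * size when no error was detected.
 */
static ssize_t show_fixed(int ret, unsigned int param_size)
{
    ssize_t out = ret;

    if (!out)
        out = param_size;

    return out;
}
```

With ret = -5, show_fixed() returns -5, while show_buggy() returns a large positive number on a 64-bit build, which a caller would treat as a byte count rather than an error.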
Commit 41dd03a9 may cause an Oops in rtas_stop_self().
The reason is that the rtas_args was moved into stack space. For a box
with more than 4GB RAM, the stack could easily be outside 32bit range,
but RTAS is 32bit.
So the patch moves rtas_args away from stack by adding static before
it.
Signed-off-by: Li Zhong
Signed-off-by: Anton Blanchard
Cc: stable@vger.kernel.org # 3.14+
Signed-off-by: Benjamin Herrenschmidt -
Commit aac416fc38c (lkdtm: flush icache and report actions) calls
flush_icache_range from a module. It's exported on most architectures
that implement it, but not on powerpc. This patch exports it to fix
the module link failure.
Signed-off-by: Jeff Mahoney
Signed-off-by: Benjamin Herrenschmidt
20 Apr, 2014
2 commits
-
Pull x86 fix from Ingo Molnar:
"This fixes the preemption-count imbalance crash reported by Owen
Kibel"
* 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/mce: Fix CMCI preemption bugs -
Pull perf fixes from Ingo Molnar:
"Two kernel side fixes:
- an Intel uncore PMU driver potential crash fix
- a kprobes/perf-call-graph interaction fix"
* 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
perf/x86/intel: Use rdmsrl_safe() when initializing RAPL PMU
kprobes/x86: Fix page-fault handling logic
19 Apr, 2014
6 commits
-
Merge misc fixes from Andrew Morton:
"13 fixes"
* emailed patches from Andrew Morton:
thp: close race between split and zap huge pages
mm: fix new kernel-doc warning in filemap.c
mm: fix CONFIG_DEBUG_VM_RB description
mm: use paravirt friendly ops for NUMA hinting ptes
mips: export flush_icache_range
mm/hugetlb.c: add cond_resched_lock() in return_unused_surplus_pages()
wait: explain the shadowing and type inconsistencies
Shiraz has moved
Documentation/vm/numa_memory_policy.txt: fix wrong document in numa_memory_policy.txt
powerpc/mm: fix ".__node_distance" undefined
kernel/watchdog.c:touch_softlockup_watchdog(): use raw_cpu_write()
init/Kconfig: move the trusted keyring config option to general setup
vmscan: reclaim_clean_pages_from_list() must use mod_zone_page_state() -
The lkdtm module performs tests against executable memory ranges, so it
needs to flush the icache for proper behaviors. Other architectures
already export this, so do the same for MIPS.
[akpm@linux-foundation.org: relocate export sites]
Signed-off-by: Kees Cook
Cc: Paul Gortmaker
Cc: Ralf Baechle
Cc: Sanjay Lal
Cc: John Crispin
Cc: Sergei Shtylyov
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
shiraz.hashim@st.com email-id doesn't exist anymore as he has left the
company. Replace ST's id with shiraz.linux.kernel@gmail.com.
It also updates .mailmap file to fix address for 'git shortlog'.
Signed-off-by: Viresh Kumar
Cc: Shiraz Hashim
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
CHK include/config/kernel.release
CHK include/generated/uapi/linux/version.h
CHK include/generated/utsrelease.h
...
Building modules, stage 2.
WARNING: 1 bad relocations
c0000000013d6a30 R_PPC64_ADDR64 uprobes_fetch_type_table
WRAP arch/powerpc/boot/zImage.pseries
WRAP arch/powerpc/boot/zImage.epapr
MODPOST 1849 modules
ERROR: ".__node_distance" [drivers/block/nvme.ko] undefined!
make[1]: *** [__modpost] Error 1
make: *** [modules] Error 2
make: *** Waiting for unfinished jobs....
The reason is that the symbol "__node_distance" has not been exported on powerpc.
Signed-off-by: Mike Qiu
Acked-by: Benjamin Herrenschmidt
Cc: Paul Mackerras
Cc: Nathan Fontenot
Cc: Stephen Rothwell
Cc: Srivatsa S. Bhat
Cc: Jesse Larrew
Cc: Robert Jennings
Cc: Alistair Popple
Cc: Mike Qiu
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
Commit 93ea02bb8435 ("arch: Clean up asm/barrier.h implementations")
wired generic barrier.h for ARC, but failed to delete the existing file.
In 3.15, due to rcupdate.h updates, this causes a build breakage on ARC:
CC arch/arc/kernel/asm-offsets.s
In file included from include/linux/sched.h:45:0,
from arch/arc/kernel/asm-offsets.c:9:
include/linux/rculist.h: In function __list_add_rcu:
include/linux/rculist.h:54:2: error: implicit declaration of function smp_store_release [-Werror=implicit-function-declaration]
rcu_assign_pointer(list_next_rcu(prev), new);
^
Cc: Peter Zijlstra
Signed-off-by: Vineet Gupta
Signed-off-by: Linus Torvalds -
Pull PCI updates from Bjorn Helgaas:
"These are fixes for a powerpc NULL pointer dereference, an OF
interrupt mapping issue on some of the new host bridges, and a
DesignWare iATU issue.
Host bridge drivers
- Fix OF interrupt mapping for DesignWare, R-Car, Tegra (Lucas Stach)
- Fix DesignWare iATU programming (Mohit Kumar)
Miscellaneous
- Fix powerpc NULL dereference from list_for_each_entry() update (Mike Qiu)"
* tag 'pci-v3.15-fixes-1' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci:
PCI: tegra: Use new OF interrupt mapping when possible
PCI: rcar: Use new OF interrupt mapping when possible
PCI: designware: Use new OF interrupt mapping when possible
PCI: designware: Fix iATU programming for cfg1, io and mem viewport
PCI: designware: Fix comment for setting number of lanes
powerpc/PCI: Fix NULL dereference in sys_pciconfig_iobase() list traversal
18 Apr, 2014
3 commits
-
CPUs which should support the RAPL counters according to
Family/Model/Stepping may still issue #GP when attempting to access
the RAPL MSRs. This may happen when Linux is running under KVM and
we are passing-through host F/M/S data, for example. Use rdmsrl_safe
to first access the RAPL_POWER_UNIT MSR; if this fails, do not
attempt to use this PMU.
Signed-off-by: Venkatesh Srinivas
Signed-off-by: Peter Zijlstra
Link: http://lkml.kernel.org/r/1394739386-22260-1-git-send-email-venkateshs@google.com
Cc: zheng.z.yan@intel.com
Cc: eranian@google.com
Cc: ak@linux.intel.com
Cc: linux-kernel@vger.kernel.org
[ The patch also silently fixes another bug: rapl_pmu_init() didn't handle the memory alloc failure case previously. ]
Signed-off-by: Ingo Molnar -
Pull parisc updates from Helge Deller:
"There are two major changes in this patchset:
The major fix is that the epoll_pwait() syscall for 32bit userspace
was not using the compat wrapper on a 64bit kernel.
Secondly we changed the value of SHMLBA from 4MB to PAGE_SIZE to
reflect that we can actually mmap to any multiple of PAGE_SIZE. The
only thing which needs care is that shared mmaps need to be mapped at
the same offset inside the 4MB cache window"
* 'parisc-3.15' of git://git.kernel.org/pub/scm/linux/kernel/git/deller/parisc-linux:
parisc: fix epoll_pwait syscall on compat kernel
parisc: change value of SHMLBA from 0x00400000 to PAGE_SIZE
parisc: Replace __get_cpu_var uses for address calculation -
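The constraint mentioned in the parisc entry above (shared mappings must sit at the same offset inside the 4MB cache window) can be written as a small address check. This is only a sketch of the arithmetic: CACHE_WINDOW and same_window_offset() are hypothetical names for this example, and the real kernel logic for choosing mmap addresses is not shown here.

```c
#include <stdbool.h>
#include <stdint.h>

/* 4MB window (0x00400000), the old SHMLBA value quoted in the entry above. */
#define CACHE_WINDOW 0x00400000UL

/*
 * Hypothetical check: two virtual addresses land at the same offset
 * inside the 4MB window when they agree in all bits below the window
 * size, i.e. when their difference is a multiple of CACHE_WINDOW.
 */
static bool same_window_offset(uintptr_t a, uintptr_t b)
{
    return ((a ^ b) & (CACHE_WINDOW - 1)) == 0;
}
```

For example, 0x00401000 and 0x00801000 share offset 0x1000 within their windows, whereas 0x00401000 and 0x00802000 do not.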
Pull Xen fixes from David Vrabel:
"Xen regression and bug fixes for 3.15-rc1:
- fix completely broken 32-bit PV guests caused by x86 refactoring
32-bit thread_info.
- only enable ticketlock slow path on Xen (not bare metal)
- fix two bugs with PV guests not shutting down when requested
- fix a minor memory leak in xen-pciback error path"
* tag 'stable/for-linus-3.15-rc1-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip:
xen/manage: Poweroff forcefully if user-space is not yet up.
xen/xenbus: Avoid synchronous wait on XenBus stalling shutdown/restart.
xen/spinlock: Don't enable them unconditionally.
xen-pciback: silence an unwanted debug printk
xen: fix memory leak in __xen_pcibk_add_pci_dev()
x86/xen: Fix 32-bit PV guests's usage of kernel_stack
17 Apr, 2014
8 commits
-
Current kprobes in-kernel page fault handler doesn't
expect that its single-stepping can be interrupted by
an NMI handler which may cause a page fault (e.g. perf
with callback tracing).
In that case, the page-fault is handled by kprobes and it
misunderstands the page-fault has been caused by the
single-stepping code and tries to recover IP address
to probed address.
But the truth is the page-fault has been caused by the
NMI handler, and do_page_fault fails to handle real
page fault because the IP address is modified and
causes Kernel BUGs like below.
----
[ 2264.726905] BUG: unable to handle kernel NULL pointer dereference at 0000000000000020
[ 2264.727190] IP: [] copy_user_generic_string+0x0/0x40
To handle this correctly, I fixed the kprobes fault
handler to ensure the faulted ip address is its own
single-step buffer instead of checking current kprobe
state.
Signed-off-by: Masami Hiramatsu
Cc: Andi Kleen
Cc: Ananth N Mavinakayanahalli
Cc: Sandeepa Prabhu
Cc: Frederic Weisbecker
Cc: Steven Rostedt
Cc: fche@redhat.com
Cc: systemtap@sourceware.org
Link: http://lkml.kernel.org/r/20140417081644.26341.52351.stgit@ltc230.yrl.intra.hitachi.co.jp
Signed-off-by: Ingo Molnar -
The following commit:
27f6c573e0f7 ("x86, CMCI: Add proper detection of end of CMCI storms")
Added two preemption bugs:
- machine_check_poll() does a get_cpu_var() without a matching
put_cpu_var(), which causes preemption imbalance and crashes upon
bootup.
- it does percpu ops without disabling preemption. Preemption is not
disabled due to the mistaken use of a raw spinlock.
To fix these bugs fix the imbalance and change
cmci_discover_lock to a regular spinlock.
Reported-by: Owen Kibel
Reported-by: Linus Torvalds
Signed-off-by: Ingo Molnar
Cc: Chen, Gong
Cc: Josh Boyer
Cc: Tony Luck
Cc: Peter Zijlstra
Cc: Alexander Todorov
Cc: Borislav Petkov
Link: http://lkml.kernel.org/n/tip-jtjptvgigpfkpvtQxpEk1at2@git.kernel.org
Signed-off-by: Ingo Molnar
--
arch/x86/kernel/cpu/mcheck/mce.c | 4 +---
arch/x86/kernel/cpu/mcheck/mce_intel.c | 18 +++++++++---------
2 files changed, 10 insertions(+), 12 deletions(-) -
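The get/put imbalance from the CMCI entry above can be pictured with a toy userspace model. Everything here is an assumption made for illustration: preempt_depth stands in for the kernel's preemption count, get_var()/put_var() stand in for get_cpu_var()/put_cpu_var(), and none of it is the actual mce.c code.

```c
/* Toy model: a plain counter stands in for the preemption count. */
static int preempt_depth;

/* Stands in for a per-CPU variable. */
static unsigned long poll_count;

/* Model of get_cpu_var(): "disable preemption" and hand out the variable. */
static unsigned long *get_var(void)
{
    preempt_depth++;
    return &poll_count;
}

/* Model of put_cpu_var(): "re-enable preemption". */
static void put_var(void)
{
    preempt_depth--;
}

/* Unbalanced: takes the variable but never releases it. */
static void poll_unbalanced(void)
{
    unsigned long *p = get_var();

    (*p)++;
    /* missing put_var(): preempt_depth stays raised after return */
}

/* Balanced: every get_var() is paired with a put_var(). */
static void poll_balanced(void)
{
    unsigned long *p = get_var();

    (*p)++;
    put_var();
}
```

After poll_balanced() the counter is back at its starting value, while each call to poll_unbalanced() leaves it one higher, which is the kind of leak the commit message calls a preemption imbalance.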
Pull x86 fixes from Ingo Molnar:
"Various fixes:
- reboot regression fix
- build message spam fix
- GPU quirk fix
- 'make kvmconfig' fix
plus the wire-up of the renameat2() system call on i386"
* 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86: Remove the PCI reboot method from the default chain
x86/build: Supress "Nothing to be done for ..." messages
x86/gpu: Fix sign extension issue in Intel graphics stolen memory quirks
x86/platform: Fix "make O=dir kvmconfig"
i386: Wire up the renameat2() syscall -
Pull perf fixes from Ingo Molnar:
"Tooling fixes, plus a simple hardware-enablement patch for the Intel
RAPL PMU (energy use measurement) on Haswell CPUs, which I hope is
still fine at this stage"
* 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
perf tools: Instead of redirecting flex output, use -o
perf tools: Fix double free in perf test 21 (code-reading.c)
perf stat: Initialize statistics correctly
perf bench: Set more defaults in the 'numa' suite
perf bench: Fix segfault at the end of an 'all' execution
perf bench: Update manpage to mention numa and futex
perf probe: Use dwarf_getcfi_elf() instead of dwarf_getcfi()
perf probe: Fix to handle errors in line_range searching
perf probe: Fix --line option behavior
perf tools: Pick up libdw without explicit LIBDW_DIR
MAINTAINERS: Change e-mail to kernel.org one
perf callchains: Disable unwind libraries when libelf isn't found
tools lib traceevent: Do not call warning() directly
tools lib traceevent: Print event name when show warning if possible
perf top: Fix documentation of invalid -s option
perf/x86: Enable DRAM RAPL support on Intel Haswell -
Pull pincontrol fixes from Linus Walleij:
"A first set of pin control fixes for the v3.15 series:
- Fix a couple of barnsjukdomar on the Rockchip driver.
- Remove an idiotic debug print I happened to leave behind in the
Nomadik driver.
- Fixup the Qualcomm MSM interrupt handling code for the TLMM v2.
- Three patches renaming the Broadcom Capri driver to BCM28155. This
has been falling between the chairs for some time due to some
cross-tree synchronization misunderstandings, now I'm fed up with
this and just rename it in this -rc1 phase"
* tag 'pinctrl-v3.15-2' of git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-pinctrl:
pinctrl: fix typo in bindings documentation
Update bcm_defconfig with new pinctrl CONFIG
pinctrl: Rename Broadcom Capri pinctrl driver
pinctrl: msm: Correct interrupt code for TLMM v2
pinctrl: nomadik: delete stray debug print
pinctrl: rockchip: handle first half of rk3188-bank0 correctly
pinctrl: rockchip: add return value to rockchip_set_mux
pinctrl: rockchip: fix offset of mux registers for rk3188 -
Pull s390 patches from Martin Schwidefsky:
"An update to the oops output with additional information about the
crash. The renameat2 system call is enabled. Two patches in regard
to the PTR_ERR_OR_ZERO cleanup. And a bunch of bug fixes"
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux:
s390/sclp_cmd: replace PTR_RET with PTR_ERR_OR_ZERO
s390/sclp: replace PTR_RET with PTR_ERR_OR_ZERO
s390/sclp_vt220: Fix kernel panic due to early terminal input
s390/compat: fix typo
s390/uaccess: fix possible register corruption in strnlen_user_srst()
s390: add 31 bit warning message
s390: wire up sys_renameat2
s390: show_registers() should not map user space addresses to kernel symbols
s390/mm: print control registers and page table walk on crash
s390/smp: fix smp_stop_cpu() for !CONFIG_SMP
s390: fix control register update -
Pull itanium erratum fix from Tony Luck:
"Small workaround for a rare, but annoying, erratum #237"
* tag 'please-pull-ia64-erratum' of git://git.kernel.org/pub/scm/linux/kernel/git/aegl/linux:
[IA64] Change default PSR.ac from '1' to '0' (Fix erratum #237) -
April 2014 Itanium processor specification update:
http://www.intel.com/content/www/us/en/processors/itanium/itanium-specification-update.html
describes this erratum:
=========================================================================
237. Under a complex set of conditions, store to load forwarding for a
sub 8-byte load may complete incorrectly
Problem: A load instruction may complete incorrectly when a code sequence
using 4-byte or smaller load and store operations to the same address
is executed in combination with specific timing of all the following
concurrent conditions: store to load forwarding, alignment checking
enabled, a mis-predicted branch, and complex cache utilization activity.Implication: The affected sub 8-byte instruction may complete
incorrectly resulting in unpredictable system behavior. There is an
extremely low probability of exposure due to the significant number of
complex microarchitectural concurrent conditions required to encounter
the erratum.
Workaround: Set PSR.ac = 0 to completely avoid the erratum. Disabling
Hyper-Threading will significantly reduce exposure to the conditions
that contribute to encountering the erratum.
Status: See the Summary Table of Changes for the affected steppings.
=========================================================================
[Table of changes essentially lists all models from McKinley to Tukwila]
The PSR.ac bit controls whether the processor will always generate
an unaligned reference trap (0x5a00) for a misaligned data access
(when PSR.ac=1) or if it will let the access succeed when running
on a cpu that implements logic to handle some unaligned accesses.
Way back in 2008 in commit b704882e70d87d7f56db5ff17e2253f3fa90e4f3
[IA64] Rationalize kernel mode alignment checking
we made the decision to always enable strict checking. We were
already doing so in trap/interrupt context because the common
preamble code set this bit - but the rest of supervisor code
(and by inheritance user code) ran with PSR.ac=0.
We now reverse that decision and set PSR.ac=0 everywhere in the
kernel (also inherited by user processes). This will avoid the
erratum using the method described in the Itanium specification
update. Net effect for users is that the processor will handle
unaligned access when it can (typically with a tiny performance
bubble in the pipeline ... but much less invasive than taking a
trap and having the OS perform the access).
Signed-off-by: Tony Luck
16 Apr, 2014
2 commits
-
Steve reported a reboot hang and bisected it back to this commit:
a4f1987e4c54 x86, reboot: Add EFI and CF9 reboot methods into the default list
He heroically tested all reboot methods and found the following:
reboot=t # triple fault ok
reboot=k # keyboard ctrl FAIL
reboot=b # BIOS ok
reboot=a # ACPI FAIL
reboot=e # EFI FAIL [system has no EFI]
reboot=p # PCI 0xcf9 FAIL
And I think it's pretty obvious that we should only try PCI 0xcf9 as a
last resort - if at all.
The other observation is that (on this box) we should never try
the PCI reboot method, but close with either the 'triple fault'
or the 'BIOS' (terminal!) reboot methods.
Thirdly, CF9_COND is a total misnomer - it should be something like
CF9_SAFE or CF9_CAREFUL, and 'CF9' should be 'CF9_FORCE' ...
So this patch fixes the worst problems:
- it orders the actual reboot logic to follow the reboot ordering
pattern - it was in a pretty random order before for no good
reason.
- it fixes the CF9 misnomers and uses BOOT_CF9_FORCE and
BOOT_CF9_SAFE flags to make the code more obvious.
- it tries the BIOS reboot method before the PCI reboot method.
(Since 'BIOS' is a terminal reboot method resulting in a hang
if it does not work, this is essentially equivalent to removing
the PCI reboot method from the default reboot chain.)
- just for the miraculous possibility of terminal (resulting
in hang) reboot methods of triple fault or BIOS returning
without having done their job, there's an ordering between
them as well.
Reported-and-bisected-and-tested-by: Steven Rostedt
Cc: Li Aubrey
Cc: Linus Torvalds
Cc: Matthew Garrett
Link: http://lkml.kernel.org/r/20140404064120.GB11877@gmail.com
Signed-off-by: Ingo Molnar -
The git commit a945928ea2709bc0e8e8165d33aed855a0110279
('xen: Do not enable spinlocks before jump_label_init() has executed')
was added to deal with the jump machinery. Earlier the code
that turned on the jump label was only called by Xen specific
functions. But now that it had been moved to the initcall machinery
it gets called on Xen, KVM, and baremetal - ouch!. And the detection
machinery to only call it on Xen wasn't remembered in the heat
of merge window excitement.
This means that the slowpath is enabled on baremetal while it should
not be.
Reported-by: Waiman Long
Acked-by: Steven Rostedt
CC: stable@vger.kernel.org
CC: Boris Ostrovsky
Signed-off-by: Konrad Rzeszutek Wilk
Signed-off-by: David Vrabel
15 Apr, 2014
3 commits
-
Commit 198d208df4371734ac4728f69cb585c284d20a15 ("x86: Keep
thread_info on thread stack in x86_32") made 32-bit kernels use
kernel_stack to point to thread_info. That change missed a couple of
updates needed by Xen's 32-bit PV guests:
1. kernel_stack needs to be initialized for secondary CPUs
2. GET_THREAD_INFO() now uses %fs register which may not be the
kernel's version when executing xen_iret().
With respect to the second issue, we don't need GET_THREAD_INFO()
anymore: we used it as an intermediate step to get to per_cpu xen_vcpu
and avoid referencing %fs. Now that we are going to use %fs anyway we
may as well go directly to xen_vcpu.
Signed-off-by: Boris Ostrovsky
Signed-off-by: David Vrabel -
Pull KVM fixes from Marcelo Tosatti:
- Fix for guest triggerable BUG_ON (CVE-2014-0155)
- CR4.SMAP support
- Spurious WARN_ON() fix
* git://git.kernel.org/pub/scm/virt/kvm/kvm:
KVM: x86: remove WARN_ON from get_kernel_ns()
KVM: Rename variable smep to cr4_smep
KVM: expose SMAP feature to guest
KVM: Disable SMAP for guests in EPT realmode and EPT unpaging mode
KVM: Add SMAP support when setting CR4
KVM: Remove SMAP bit from CR4_RESERVED_BITS
KVM: ioapic: try to recover if pending_eoi goes out of range
KVM: ioapic: fix assignment of ioapic->rtc_status.pending_eoi (CVE-2014-0155) -
3bc955987fb3 ("powerpc/PCI: Use list_for_each_entry() for bus traversal")
caused a NULL pointer dereference because the loop body set the iterator to
NULL:
Unable to handle kernel paging request for data at address 0x00000000
Faulting instruction address: 0xc000000000041d78
Oops: Kernel access of bad area, sig: 11 [#1]
...
NIP [c000000000041d78] .sys_pciconfig_iobase+0x68/0x1f0
LR [c000000000041e0c] .sys_pciconfig_iobase+0xfc/0x1f0
Call Trace:
[c0000003b4787db0] [c000000000041e0c] .sys_pciconfig_iobase+0xfc/0x1f0 (unreliable)
[c0000003b4787e30] [c000000000009ed8] syscall_exit+0x0/0x98
Fix it by using a temporary variable for the iterator.
[bhelgaas: changelog, drop tmp_bus initialization]
Fixes: 3bc955987fb3 powerpc/PCI: Use list_for_each_entry() for bus traversal
Signed-off-by: Mike Qiu
Signed-off-by: Bjorn Helgaas
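The fix described in the entry above (a separate iterator so the loop never advances from NULL) can be sketched with an ordinary linked list. This is a simplified userspace illustration, not the kernel code: struct bus, its fields, and find_bus() are hypothetical, and a plain for loop replaces list_for_each_entry().

```c
#include <stddef.h>

/* Hypothetical stand-in for a PCI bus covering a range of bus numbers. */
struct bus {
    int first;          /* first bus number covered */
    int last;           /* last bus number covered */
    struct bus *next;   /* next bus in the list, or NULL */
};

/*
 * Walk the list with a dedicated cursor and record a match in a
 * separate result variable. The cursor is never overwritten inside
 * the loop body, so advancing to the next element stays valid even
 * when the current element does not match.
 */
static struct bus *find_bus(struct bus *head, int number)
{
    struct bus *found = NULL;

    for (struct bus *cur = head; cur != NULL; cur = cur->next) {
        if (number >= cur->first && number <= cur->last) {
            found = cur;
            break;
        }
    }

    return found;   /* NULL when no bus covers the number */
}
```

Writing NULL into the cursor on a non-matching element, as the broken loop body did, would make the next step of the traversal dereference NULL; keeping the result in its own variable avoids that.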