20 Nov, 2013

1 commit

  • Until now, the MSI architecture-specific functions could be overloaded
    using a fairly complex set of #define and compile-time
    conditionals. In order to prepare for the introduction of the msi_chip
    infrastructure, it is desirable to switch all those functions to use
    the 'weak' mechanism. This commit converts all the architectures that
    were overriding those MSI functions to use the new strategy.

    Note that we keep two separate, non-weak, functions
    default_teardown_msi_irqs() and default_restore_msi_irqs() for the
    default behavior of the arch_teardown_msi_irqs() and
    arch_restore_msi_irqs(), as the default behavior is needed by x86 PCI
    code.

    Signed-off-by: Thomas Petazzoni
    Acked-by: Bjorn Helgaas
    Acked-by: Benjamin Herrenschmidt
    Tested-by: Daniel Price
    Tested-by: Thierry Reding
    Cc: Benjamin Herrenschmidt
    Cc: Paul Mackerras
    Cc: linuxppc-dev@lists.ozlabs.org
    Cc: Martin Schwidefsky
    Cc: Heiko Carstens
    Cc: linux390@de.ibm.com
    Cc: linux-s390@vger.kernel.org
    Cc: Thomas Gleixner
    Cc: Ingo Molnar
    Cc: H. Peter Anvin
    Cc: x86@kernel.org
    Cc: Russell King
    Cc: Tony Luck
    Cc: Fenghua Yu
    Cc: linux-ia64@vger.kernel.org
    Cc: Ralf Baechle
    Cc: linux-mips@linux-mips.org
    Cc: David S. Miller
    Cc: sparclinux@vger.kernel.org
    Cc: Chris Metcalf
    Signed-off-by: Jason Cooper

    Thomas Petazzoni
     

18 Oct, 2013

2 commits

  • commit 30e46b574a1db7d14404e52dca8e1aa5f5155fd2 upstream.

    Type SETUP_PCI, added by setup_efi_pci(), may advertise a ROM size
    larger than early_memremap() is able to handle, which is currently
    limited to 256kB. If this occurs it leads to a NULL dereference in
    parse_setup_data().

    To avoid this, remap the setup_data header and allow parsing functions
    for individual types to handle their own data remapping.

    Signed-off-by: Linn Crosetto
    Link: http://lkml.kernel.org/r/1376430401-67445-1-git-send-email-linn@hp.com
    Acked-by: Yinghai Lu
    Reviewed-by: Pekka Enberg
    Signed-off-by: H. Peter Anvin
    Cc: Paul Gortmaker
    Signed-off-by: Greg Kroah-Hartman

    Linn Crosetto
     
  • commit 3f0116c3238a96bc18ad4b4acefe4e7be32fa861 upstream.

    Fengguang Wu, Oleg Nesterov and Peter Zijlstra tracked down
    a kernel crash to a GCC bug: GCC miscompiles certain 'asm goto'
    constructs, as outlined here:

    http://gcc.gnu.org/bugzilla/show_bug.cgi?id=58670

    Implement a workaround suggested by Jakub Jelinek.

    Reported-and-tested-by: Fengguang Wu
    Reported-by: Oleg Nesterov
    Reported-by: Peter Zijlstra
    Suggested-by: Jakub Jelinek
    Reviewed-by: Richard Henderson
    Cc: Linus Torvalds
    Cc: Andrew Morton
    Link: http://lkml.kernel.org/r/20131015062351.GA4666@gmail.com
    Signed-off-by: Ingo Molnar
    Signed-off-by: Greg Kroah-Hartman

    Ingo Molnar
     

05 Oct, 2013

2 commits

  • commit 700870119f49084da004ab588ea2b799689efaf7 upstream.

    Add patch to fix 32bit EFI service mapping (rhbz 726701)

    Multiple people are reporting hitting the following WARNING on i386,

    WARNING: at arch/x86/mm/ioremap.c:102 __ioremap_caller+0x3d3/0x440()
    Modules linked in:
    Pid: 0, comm: swapper Not tainted 3.9.0-rc7+ #95
    Call Trace:
    [] warn_slowpath_common+0x5f/0x80
    [] ? __ioremap_caller+0x3d3/0x440
    [] ? __ioremap_caller+0x3d3/0x440
    [] warn_slowpath_null+0x1d/0x20
    [] __ioremap_caller+0x3d3/0x440
    [] ? get_usage_chars+0xfb/0x110
    [] ? vprintk_emit+0x147/0x480
    [] ? efi_enter_virtual_mode+0x1e4/0x3de
    [] ioremap_cache+0x1a/0x20
    [] ? efi_enter_virtual_mode+0x1e4/0x3de
    [] efi_enter_virtual_mode+0x1e4/0x3de
    [] start_kernel+0x286/0x2f4
    [] ? repair_env_string+0x51/0x51
    [] i386_start_kernel+0x12c/0x12f

    Due to the workaround described in commit 916f676f8 ("x86, efi: Retain
    boot service code until after switching to virtual mode") EFI Boot
    Service regions are mapped for a period during boot. Unfortunately, with
    the limited size of the i386 direct kernel map it's possible that some
    of the Boot Service regions will not be directly accessible, which
    causes them to be ioremap()'d, triggering the above warning as the
    regions are marked as E820_RAM in the e820 memmap.

    There are currently only two situations where we need to map EFI Boot
    Service regions,

    1. To workaround the firmware bug described in 916f676f8
    2. To access the ACPI BGRT image

    but since we haven't seen an i386 implementation that requires either,
    this simple fix should suffice for now.

    [ Added to changelog - Matt ]

    Reported-by: Bryan O'Donoghue
    Acked-by: Tom Zanussi
    Acked-by: Darren Hart
    Cc: Josh Triplett
    Cc: Matthew Garrett
    Cc: H. Peter Anvin
    Cc: Ingo Molnar
    Cc: Thomas Gleixner
    Cc: Greg Kroah-Hartman
    Signed-off-by: Josh Boyer
    Signed-off-by: Matt Fleming
    Signed-off-by: Greg Kroah-Hartman

    Josh Boyer
     
  • commit 4f0acd31c31f03ba42494c8baf6c0465150e2621 upstream.

    Dell PowerEdge C6100 machines fail to completely reboot about 20% of the time.

    Signed-off-by: Masoud Sharbiani
    Signed-off-by: Vinson Lee
    Cc: Robin Holt
    Cc: Russell King
    Cc: Guan Xuetao
    Link: http://lkml.kernel.org/r/1379717947-18042-1-git-send-email-vlee@freedesktop.org
    Signed-off-by: Ingo Molnar
    Signed-off-by: Greg Kroah-Hartman

    Masoud Sharbiani
     

27 Sep, 2013

5 commits

  • commit 8f898fbbe5ee5e20a77c4074472a1fd088dc47d1 upstream.

    Dick Fowles, Don Zickus and Joe Mario have been working on
    improvements to perf, and noticed heavy cache line contention
    on the mm_cpumask, running linpack on a 60 core / 120 thread
    system.

    The cause turned out to be unnecessary atomic accesses to the
    mm_cpumask. When in lazy TLB mode, the CPU is only removed from
    the mm_cpumask if there is a TLB flush event.

    Most of the time, no such TLB flush happens, and the kernel
    skips the TLB reload. It can also skip the atomic memory
    set & test.

    Here is a summary of Joe's test results:

    * The __schedule function dropped from 24% of all program cycles down
    to 5.5%.

    * The cacheline contention/hotness for accesses to that bitmask went
    from being the 1st/2nd hottest - down to the 84th hottest (0.3% of
    all shared misses which is now quite cold)

    * The average load latency for the bit-test-n-set instruction in
    __schedule dropped from 10k-15k cycles down to an average of 600 cycles.

    * The linpack program results improved from 133 GFlops to 144 GFlops.
    Peak GFlops rose from 133 to 153.

    Reported-by: Don Zickus
    Reported-by: Joe Mario
    Tested-by: Joe Mario
    Signed-off-by: Rik van Riel
    Reviewed-by: Paul Turner
    Acked-by: Linus Torvalds
    Link: http://lkml.kernel.org/r/20130731221421.616d3d20@annuminas.surriel.com
    [ Made the comments consistent around the modified code. ]
    Signed-off-by: Ingo Molnar
    Signed-off-by: Greg Kroah-Hartman

    Rik van Riel
     
  • commit 0ca06c0857aee11911f91621db14498496f2c2cd upstream.

    The 0x1000 bit of the MCACOD field of machine check MCi_STATUS
    registers is only defined for corrected errors (where it means
    that hardware may be filtering errors; see SDM section 15.9.2.1).

    For uncorrected errors it may or may not be set, so we should mask
    it out when checking for the architecturally defined recoverable
    error signatures (see SDM sections 15.9.3.1 and 15.9.3.2).

    Acked-by: Naveen N. Rao
    Signed-off-by: Tony Luck
    Signed-off-by: Greg Kroah-Hartman

    Tony Luck
     
  • commit 7d64ac6422092adbbdaa279ab32f9d4c90a84558 upstream.

    F15h, models 0x30 and later, don't have a GART. Note that. Also check
    CPUID leaf 0x80000006 for L3 presence, because there are models which
    don't sport an L3 cache.

    Signed-off-by: Aravind Gopalakrishnan
    [ Boris: rewrite commit message, cleanup comments. ]
    Signed-off-by: Borislav Petkov
    Signed-off-by: Greg Kroah-Hartman

    Aravind Gopalakrishnan
     
  • commit bd1c149aa9915b9abb6d83d0f01dfd2ace0680b5 upstream.

    For performance reasons, when SMAP is in use, SMAP is left open for an
    entire put_user_try { ... } put_user_catch(); block, however, calling
    __put_user() in the middle of that block will close SMAP as the
    STAC..CLAC constructs intentionally do not nest.

    Furthermore, using __put_user() rather than put_user_ex() here is bad
    for performance.

    Thus, introduce new [compat_]save_altstack_ex() helpers that replace
    __[compat_]save_altstack() for x86, being currently the only
    architecture which supports put_user_try { ... } put_user_catch().

    Reported-by: H. Peter Anvin
    Signed-off-by: Al Viro
    Signed-off-by: H. Peter Anvin
    Link: http://lkml.kernel.org/n/tip-es5p6y64if71k8p5u08agv9n@git.kernel.org
    Signed-off-by: Greg Kroah-Hartman

    Al Viro
     
  • commit 7263dda41b5a28ae6566fd126d9b06ada73dd721 upstream.

    Add SMAP annotations to csum_partial_copy_to/from_user(). These
    functions legitimately access user space and thus need to set the AC
    flag.

    TODO: add explicit checks that the side with the kernel space pointer
    really points into kernel space.

    Signed-off-by: H. Peter Anvin
    Link: http://lkml.kernel.org/n/tip-2aps0u00eer658fd5xyanan7@git.kernel.org
    Signed-off-by: Greg Kroah-Hartman

    H. Peter Anvin
     

14 Sep, 2013

1 commit

  • commit edb6f29464afc65fc73767540b854abf63ae7144 upstream.

    This affects xen pv guests with sufficiently old versions of xen and
    sufficiently new hardware. On such a system, a guest with a btrfs
    root won't even boot.

    Signed-off-by: John Haxby
    Signed-off-by: Herbert Xu
    Reported-by: Michael Marineau
    Signed-off-by: Greg Kroah-Hartman

    John Haxby
     

08 Sep, 2013

1 commit

  • commit 527bf129f9a780e11b251cf2467dc30118a57d16 upstream.

    Dave Hansen reported that systems between 500G and 600G RAM
    crash early if DEBUG_PAGEALLOC is selected.

    > [ 0.000000] init_memory_mapping: [mem 0x00000000-0x000fffff]
    > [ 0.000000] [mem 0x00000000-0x000fffff] page 4k
    > [ 0.000000] BRK [0x02086000, 0x02086fff] PGTABLE
    > [ 0.000000] BRK [0x02087000, 0x02087fff] PGTABLE
    > [ 0.000000] BRK [0x02088000, 0x02088fff] PGTABLE
    > [ 0.000000] init_memory_mapping: [mem 0xe80ee00000-0xe80effffff]
    > [ 0.000000] [mem 0xe80ee00000-0xe80effffff] page 4k
    > [ 0.000000] BRK [0x02089000, 0x02089fff] PGTABLE
    > [ 0.000000] BRK [0x0208a000, 0x0208afff] PGTABLE
    > [ 0.000000] Kernel panic - not syncing: alloc_low_page: ran out of memory

    It turns out that we missed increasing the number of pages needed in
    BRK for mapping the initial 2M and [0,1M) when we switched to using
    the #PF handler to set up memory mappings:

    > commit 8170e6bed465b4b0c7687f93e9948aca4358a33b
    > Author: H. Peter Anvin
    > Date: Thu Jan 24 12:19:52 2013 -0800
    >
    > x86, 64bit: Use a #PF handler to materialize early mappings on demand

    Before that, we had the mapping for [0,512M) in head_64.S, and we
    could spare two pages for [0,1M). After that change, we cannot reuse
    pages anymore.

    When we have more than 512G of RAM, we need an extra page for the pgd
    page covering [512G, 1024G).

    Increase the pages in BRK for page tables to solve the boot crash.

    Reported-by: Dave Hansen
    Bisected-by: Dave Hansen
    Tested-by: Dave Hansen
    Signed-off-by: Yinghai Lu
    Link: http://lkml.kernel.org/r/1376351004-4015-1-git-send-email-yinghai@kernel.org
    Signed-off-by: Ingo Molnar
    Signed-off-by: Greg Kroah-Hartman

    Yinghai Lu
     

30 Aug, 2013

5 commits

  • commit 3bc38cbceb85881a8eb789ee1aa56678038b1909 upstream.

    If there are UNUSABLE regions in the machine memory map, dom0 will
    attempt to map them 1:1 which is not permitted by Xen and the kernel
    will crash.

    There isn't anything interesting in the UNUSABLE region that the dom0
    kernel needs access to so we can avoid making the 1:1 mapping and
    treat it as RAM.

    We only do this for dom0, as that is where the tboot case shows up.
    A PV domU could have an UNUSABLE region in its pseudo-physical map,
    and that would need to be handled in another patch.

    This fixes a boot failure on hosts with tboot.

    tboot marks a region in the e820 map as unusable and the dom0 kernel
    would attempt to map this region and Xen does not permit unusable
    regions to be mapped by guests.

    (XEN) 0000000000000000 - 0000000000060000 (usable)
    (XEN) 0000000000060000 - 0000000000068000 (reserved)
    (XEN) 0000000000068000 - 000000000009e000 (usable)
    (XEN) 0000000000100000 - 0000000000800000 (usable)
    (XEN) 0000000000800000 - 0000000000972000 (unusable)

    tboot marked this region as unusable.

    (XEN) 0000000000972000 - 00000000cf200000 (usable)
    (XEN) 00000000cf200000 - 00000000cf38f000 (reserved)
    (XEN) 00000000cf38f000 - 00000000cf3ce000 (ACPI data)
    (XEN) 00000000cf3ce000 - 00000000d0000000 (reserved)
    (XEN) 00000000e0000000 - 00000000f0000000 (reserved)
    (XEN) 00000000fe000000 - 0000000100000000 (reserved)
    (XEN) 0000000100000000 - 0000000630000000 (usable)

    Signed-off-by: David Vrabel
    [v1: Altered the patch and description with domU's with UNUSABLE regions]
    Signed-off-by: Konrad Rzeszutek Wilk
    Signed-off-by: Greg Kroah-Hartman

    David Vrabel
     
  • commit 41aacc1eea645c99edbe8fbcf78a97dc9b862adc upstream.

    This is the updated version of df54d6fa5427 ("x86 get_unmapped_area():
    use proper mmap base for bottom-up direction") that only randomizes the
    mmap base address once.

    Signed-off-by: Radu Caragea
    Reported-and-tested-by: Jeff Shorey
    Cc: Andrew Morton
    Cc: Michel Lespinasse
    Cc: Oleg Nesterov
    Cc: Rik van Riel
    Cc: Ingo Molnar
    Cc: Adrian Sendroiu
    Cc: Kamal Mostafa
    Signed-off-by: Linus Torvalds
    Signed-off-by: Greg Kroah-Hartman

    Radu Caragea
     
  • commit 5ea80f76a56605a190a7ea16846c82aa63dbd0aa upstream.

    This reverts commit df54d6fa54275ce59660453e29d1228c2b45a826.

    The commit isn't necessarily wrong, but because it recalculates the
    random mmap_base every time, it seems to confuse user memory allocators
    that expect contiguous mmap allocations even when the mmap address isn't
    specified.

    In particular, the MATLAB Java runtime seems to be unhappy. See

    https://bugzilla.kernel.org/show_bug.cgi?id=60774

    So we'll want to apply the random offset only once, and Radu has a patch
    for that. Revert this older commit in order to apply the other one.

    Reported-by: Jeff Shorey
    Cc: Radu Caragea
    Cc: Andrew Morton
    Signed-off-by: Linus Torvalds
    Signed-off-by: Greg Kroah-Hartman

    Linus Torvalds
     
  • commit d55e37bb0f51316e552376ddc0a3fff34ca7108b upstream.

    OpenFirmware wasn't quite following the protocol described in boot.txt
    and the kernel has detected this through use of the sentinel value
    in boot_params. OFW does zero out almost all of the stuff that it should
    do, but not the sentinel.

    This causes the kernel to clear olpc_ofw_header, which breaks x86 OLPC
    support.

    OpenFirmware has now been fixed. However, it would be nice if we could
    maintain Linux compatibility with old firmware versions. To do that, we just
    have to avoid zeroing out olpc_ofw_header.

    OFW does not write to any other parts of the header that are being zapped
    by the sentinel-detection code, and all users of olpc_ofw_header are
    somewhat protected through checking for the OLPC_OFW_SIG magic value
    before using it. So this should not cause any problems for anyone.

    Signed-off-by: Daniel Drake
    Link: http://lkml.kernel.org/r/20130809221420.618E6FAB03@dev.laptop.org
    Acked-by: Yinghai Lu
    Signed-off-by: H. Peter Anvin
    Signed-off-by: Greg Kroah-Hartman

    Daniel Drake
     
  • commit fc78d343fa74514f6fd117b5ef4cd27e4ac30236 upstream.

    An older PVHVM guest (v3.0 based) crashed during vCPU hot-plug with:

    kernel BUG at drivers/xen/events.c:1328!

    RCU has detected that a CPU has not entered a quiescent state within the
    grace period. It needs to send the CPU a reschedule IPI if it is not
    offline. rcu_implicit_offline_qs() does this check:

    /*
    * If the CPU is offline, it is in a quiescent state. We can
    * trust its state not to change because interrupts are disabled.
    */
    if (cpu_is_offline(rdp->cpu)) {
    rdp->offline_fqs++;
    return 1;
    }

    Else the CPU is online. Send it a reschedule IPI.

    The CPU is in the middle of being hot-plugged and has been marked online
    (!cpu_is_offline()). See start_secondary():

    set_cpu_online(smp_processor_id(), true);
    ...
    per_cpu(cpu_state, smp_processor_id()) = CPU_ONLINE;

    start_secondary() then waits for the CPU bringing up the hot-plugged CPU to
    mark it as active:

    /*
    * Wait until the cpu which brought this one up marked it
    * online before enabling interrupts. If we don't do that then
    * we can end up waking up the softirq thread before this cpu
    * reached the active state, which makes the scheduler unhappy
    * and schedule the softirq thread on the wrong cpu. This is
    * only observable with forced threaded interrupts, but in
    * theory it could also happen w/o them. It's just way harder
    * to achieve.
    */
    while (!cpumask_test_cpu(smp_processor_id(), cpu_active_mask))
    cpu_relax();

    /* enable local interrupts */
    local_irq_enable();

    The CPU being hot-plugged will be marked active after it has been fully
    initialized by the CPU managing the hot-plug. In the Xen PVHVM case
    xen_smp_intr_init() is called to set up the hot-plugged vCPU's
    XEN_RESCHEDULE_VECTOR.

    The hot-plugging CPU is marked online but not yet active, and does not
    have its IPI vectors set up. rcu_implicit_offline_qs() sees that the
    hot-plugging CPU is !cpu_is_offline() and tries to send it a
    reschedule IPI. This will lead to:

    kernel BUG at drivers/xen/events.c:1328!

    xen_send_IPI_one()
    xen_smp_send_reschedule()
    rcu_implicit_offline_qs()
    rcu_implicit_dynticks_qs()
    force_qs_rnp()
    force_quiescent_state()
    __rcu_process_callbacks()
    rcu_process_callbacks()
    __do_softirq()
    call_softirq()
    do_softirq()
    irq_exit()
    xen_evtchn_do_upcall()

    because xen_send_IPI_one() will attempt to use an uninitialized IRQ for
    the XEN_RESCHEDULE_VECTOR.

    There is at least one other place that has caused the same crash:

    xen_smp_send_reschedule()
    wake_up_idle_cpu()
    add_timer_on()
    clocksource_watchdog()
    call_timer_fn()
    run_timer_softirq()
    __do_softirq()
    call_softirq()
    do_softirq()
    irq_exit()
    xen_evtchn_do_upcall()
    xen_hvm_callback_vector()

    clocksource_watchdog() uses cpu_online_mask to pick the next CPU to handle
    a watchdog timer:

    /*
    * Cycle through CPUs to check if the CPUs stay synchronized
    * to each other.
    */
    next_cpu = cpumask_next(raw_smp_processor_id(), cpu_online_mask);
    if (next_cpu >= nr_cpu_ids)
    next_cpu = cpumask_first(cpu_online_mask);
    watchdog_timer.expires += WATCHDOG_INTERVAL;
    add_timer_on(&watchdog_timer, next_cpu);

    This resulted in an attempt to send an IPI to a hot-plugging CPU that
    had not initialized its reschedule vector. One option would be to make
    the RCU code check not for CPU offline but for CPU active, since
    becoming active is done after a CPU is online (in older kernels).

    But Srivatsa pointed out that "the cpu_active vs cpu_online ordering has been
    completely reworked - in the online path, cpu_active is set *before* cpu_online,
    and also, in the cpu offline path, the cpu_active bit is reset in the CPU_DYING
    notification instead of CPU_DOWN_PREPARE." Drilling into the bring-up
    path: "[brought up CPU].. send out a CPU_STARTING notification, and in response
    to that, the scheduler sets the CPU in the cpu_active_mask. Again, this mask
    is better left to the scheduler alone, since it has the intelligence to use it
    judiciously."

    The conclusion was that:
    "
    1. At the IPI sender side:

    It is incorrect to send an IPI to an offline CPU (cpu not present in
    the cpu_online_mask). There are numerous places where we check this
    and warn/complain.

    2. At the IPI receiver side:

    It is incorrect to let the world know of our presence (by setting
    ourselves in global bitmasks) until our initialization steps are complete
    to such an extent that we can handle the consequences (such as
    receiving interrupts without crashing the sender etc.)
    " (from Srivatsa)

    As the native code enables the interrupts at some point we need to be
    able to service them. In other words a CPU must have valid IPI vectors
    if it has been marked online.

    It doesn't need to handle the IPI (interrupts may be disabled) but needs
    to have valid IPI vectors because another CPU may find it in cpu_online_mask
    and attempt to send it an IPI.

    This patch will change the order of the Xen vCPU bring-up functions so that
    Xen vectors have been set up before start_secondary() is called.
    It also will not continue to bring up a Xen vCPU if xen_smp_intr_init() fails
    to initialize it.

    Orabug 13823853
    Signed-off-by: Chuck Anderson
    Acked-by: Srivatsa S. Bhat
    Signed-off-by: Konrad Rzeszutek Wilk
    Signed-off-by: Jonghwan Choi
    Signed-off-by: Greg Kroah-Hartman

    Chuck Anderson
     

20 Aug, 2013

2 commits

  • commit df54d6fa54275ce59660453e29d1228c2b45a826 upstream.

    When the stack is set to unlimited, the bottom-up direction is used
    for mmappings, but the mmap_base is not used, effectively rendering
    ASLR for mmappings, along with PIE, useless.

    Reviewed-by: Rik van Riel
    Cc: Michel Lespinasse
    Cc: Oleg Nesterov
    Acked-by: Ingo Molnar
    Cc: Adrian Sendroiu
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds
    Signed-off-by: Greg Kroah-Hartman

    Radu Caragea
     
  • commit c9601247f8f3fdc18aed7ed7e490e8dfcd07f122 upstream.

    John McCalpin reports that the "drs_data" and "ncb_data" QPI
    uncore events are missing the "extra bit" and always return zero
    values unless the bit is properly set.

    More details from him:

    According to the Xeon E5-2600 Product Family Uncore Performance
    Monitoring Guide, Table 2-94, about 1/2 of the QPI Link Layer events
    (including the ones that "perf" calls "drs_data" and "ncb_data") require
    that the "extra bit" be set.

    This was confusing for a while -- a note at the bottom of page 94 says
    that the "extra bit" is bit 16 of the control register.
    Unfortunately, Table 2-86 clearly says that bit 16 is reserved and must
    be zero. Looking around a bit, I found that bit 21 appears to be the
    correct "extra bit", and further investigation shows that "perf" actually
    agrees with me:
    [root@c560-003.stampede]# cat /sys/bus/event_source/devices/uncore_qpi_0/format/event
    config:0-7,21

    So the command
    # perf -e "uncore_qpi_0/event=drs_data/"
    Is the same as
    # perf -e "uncore_qpi_0/event=0x02,umask=0x08/"
    While it should be
    # perf -e "uncore_qpi_0/event=0x102,umask=0x08/"

    I confirmed that this last version gives results that agree with the
    amount of data that I expected the STREAM benchmark to move across the QPI
    link in the second (cross-chip) test of the original script.

    Reported-by: John McCalpin
    Signed-off-by: Vince Weaver
    Cc: zheng.z.yan@intel.com
    Cc: Peter Zijlstra
    Cc: Arnaldo Carvalho de Melo
    Cc: Paul Mackerras
    Link: http://lkml.kernel.org/r/alpine.DEB.2.10.1308021037280.26119@vincent-weaver-1.um.maine.edu
    Signed-off-by: Ingo Molnar
    Signed-off-by: Greg Kroah-Hartman

    Vince Weaver
     

12 Aug, 2013

2 commits

  • commit 803075dba31c17af110e1d9a915fe7262165b213 upstream.

    Recently we added an early quirk to detect 5500/5520 chipsets
    with early revisions that had problems with irq draining with
    interrupt remapping enabled:

    commit 03bbcb2e7e292838bb0244f5a7816d194c911d62
    Author: Neil Horman
    Date: Tue Apr 16 16:38:32 2013 -0400

    iommu/vt-d: add quirk for broken interrupt remapping on 55XX chipsets

    It turns out this same problem is present in the intel X58
    chipset as well. See errata 69 here:

    http://www.intel.com/content/www/us/en/chipsets/x58-express-specification-update.html

    This patch extends the PCI early quirk so that the chip
    devices/revisions specified in the above update are also covered
    in the same way.

    Signed-off-by: Neil Horman
    Reviewed-by: Jan Beulich
    Acked-by: Donald Dutile
    Cc: Joerg Roedel
    Cc: Andrew Cooper
    Cc: Malcolm Crossley
    Cc: Prarit Bhargava
    Cc: Don Zickus
    Link: http://lkml.kernel.org/r/1374059639-8631-1-git-send-email-nhorman@tuxdriver.com
    [ Small edits. ]
    Signed-off-by: Ingo Molnar
    Signed-off-by: Greg Kroah-Hartman

    Neil Horman
     
  • commit eaa5a990191d204ba0f9d35dbe5505ec2cdd1460 upstream.

    GCC will optimize mxcsr_feature_mask_init in arch/x86/kernel/i387.c:

    memset(&fx_scratch, 0, sizeof(struct i387_fxsave_struct));
    asm volatile("fxsave %0" : : "m" (fx_scratch));
    mask = fx_scratch.mxcsr_mask;
    if (mask == 0)
    mask = 0x0000ffbf;

    to

    memset(&fx_scratch, 0, sizeof(struct i387_fxsave_struct));
    asm volatile("fxsave %0" : : "m" (fx_scratch));
    mask = 0x0000ffbf;

    since the asm statement doesn't say it will update fx_scratch. As a
    result, the DAZ bit will be cleared. This patch fixes it. This bug
    dates back to at least kernel 2.6.12.

    Signed-off-by: H. Peter Anvin
    Signed-off-by: Greg Kroah-Hartman

    H.J. Lu
     

04 Aug, 2013

3 commits

  • commit d5c78673b1b28467354c2c30c3d4f003666ff385 upstream.

    On one system whose MTRR range needs more than 44 bits, dmesg shows:
    [ 0.000000] MTRR default type: write-back
    [ 0.000000] MTRR fixed ranges enabled:
    [ 0.000000] 00000-9FFFF write-back
    [ 0.000000] A0000-BFFFF uncachable
    [ 0.000000] C0000-DFFFF write-through
    [ 0.000000] E0000-FFFFF write-protect
    [ 0.000000] MTRR variable ranges enabled:
    [ 0.000000] 0 [000080000000-0000FFFFFFFF] mask 3FFF80000000 uncachable
    [ 0.000000] 1 [380000000000-38FFFFFFFFFF] mask 3F0000000000 uncachable
    [ 0.000000] 2 [000099000000-000099FFFFFF] mask 3FFFFF000000 write-through
    [ 0.000000] 3 [00009A000000-00009AFFFFFF] mask 3FFFFF000000 write-through
    [ 0.000000] 4 [381FFA000000-381FFBFFFFFF] mask 3FFFFE000000 write-through
    [ 0.000000] 5 [381FFC000000-381FFC0FFFFF] mask 3FFFFFF00000 write-through
    [ 0.000000] 6 [0000AD000000-0000ADFFFFFF] mask 3FFFFF000000 write-through
    [ 0.000000] 7 [0000BD000000-0000BDFFFFFF] mask 3FFFFF000000 write-through
    [ 0.000000] 8 disabled
    [ 0.000000] 9 disabled

    but /proc/mtrr reports it wrongly:
    reg00: base=0x080000000 ( 2048MB), size= 2048MB, count=1: uncachable
    reg01: base=0x80000000000 (8388608MB), size=1048576MB, count=1: uncachable
    reg02: base=0x099000000 ( 2448MB), size= 16MB, count=1: write-through
    reg03: base=0x09a000000 ( 2464MB), size= 16MB, count=1: write-through
    reg04: base=0x81ffa000000 (8519584MB), size= 32MB, count=1: write-through
    reg05: base=0x81ffc000000 (8519616MB), size= 1MB, count=1: write-through
    reg06: base=0x0ad000000 ( 2768MB), size= 16MB, count=1: write-through
    reg07: base=0x0bd000000 ( 3024MB), size= 16MB, count=1: write-through
    reg08: base=0x09b000000 ( 2480MB), size= 16MB, count=1: write-combining

    so bit 44 and bit 45 get cut off.

    There are two problems in arch/x86/kernel/cpu/mtrr/generic.c::generic_get_mtrr():

    1. For base, we miss casting base_lo to 64 bits before shifting.
    Fix that by adding a u64 cast.

    2. For size, it can only handle 44 bits, i.e. 32 bits + page_shift.
    Fix that by using a 64-bit mask instead of the 32-bit mask_lo, so the
    range can exceed 44 bits.
    At the same time, we need to update size_or_mask for old CPUs that
    don't support CPUID 0x80000008 to get phys_addr: set the high 32 bits
    to all 1s, otherwise we will not get the correct size for them.

    Also fix mtrr_add_page: it should check base and (base + size - 1)
    instead of base and size, as base and size could each be small while
    base + size is big enough to be out of bounds. We can use
    boot_cpu_data.x86_phys_bits directly to avoid size_or_mask.

    When will sizes need more than 44 bits? At 16 TiB and beyond.

    After the patch we have the right output:
    reg00: base=0x080000000 ( 2048MB), size= 2048MB, count=1: uncachable
    reg01: base=0x380000000000 (58720256MB), size=1048576MB, count=1: uncachable
    reg02: base=0x099000000 ( 2448MB), size= 16MB, count=1: write-through
    reg03: base=0x09a000000 ( 2464MB), size= 16MB, count=1: write-through
    reg04: base=0x381ffa000000 (58851232MB), size= 32MB, count=1: write-through
    reg05: base=0x381ffc000000 (58851264MB), size= 1MB, count=1: write-through
    reg06: base=0x0ad000000 ( 2768MB), size= 16MB, count=1: write-through
    reg07: base=0x0bd000000 ( 3024MB), size= 16MB, count=1: write-through
    reg08: base=0x09b000000 ( 2480MB), size= 16MB, count=1: write-combining

    -v2: simply checking in mtrr_add_page according to hpa.

    [ hpa: This probably wants to go into -stable only after having sat in
    mainline for a bit. It is not a regression. ]

    Signed-off-by: Yinghai Lu
    Link: http://lkml.kernel.org/r/1371162815-29931-1-git-send-email-yinghai@kernel.org
    Signed-off-by: H. Peter Anvin
    Signed-off-by: Greg Kroah-Hartman

    Yinghai Lu
     
  • based on 4df05f361937ee86e5a8c9ead8aeb6a19ea9b7d7 upstream.

    Since the IDT is referenced from a fixmap, make sure it is page aligned.
    This avoids the risk of the IDT ever being moved in the bss and having
    the mapping be offset, resulting in calling incorrect handlers. In the
    current upstream kernel this is not a manifested bug, but heavily patched
    kernels (such as those using the PaX patch series) did encounter this bug.

    Signed-off-by: Kees Cook
    Reported-by: PaX Team
    Cc: Thomas Gleixner
    Cc: Ingo Molnar
    Cc: "H. Peter Anvin"
    Cc: Yinghai Lu
    Cc: Seiji Aguchi
    Cc: Fenghua Yu
    Signed-off-by: Greg Kroah-Hartman

    Kees Cook
     
  • commit 5ff560fd48d5b3d82fa0c3aff625c9da1a301911 upstream.

    There are CPUs which have errata causing RDMSR of a nonexistent MSR to
    not fault. We would then try to WRMSR to restore the value of that
    MSR, causing a crash. Specifically, some Pentium M variants would
    have this problem trying to save and restore the non-existent EFER,
    causing a crash on resume.

    Work around this by making sure we can write back the result at
    suspend time.

    Huge thanks to Christian Sünkenberg for finding the offending erratum
    that finally deciphered the mystery.

    Reported-and-tested-by: Johan Heinrich
    Debugged-by: Christian Sünkenberg
    Acked-by: Rafael J. Wysocki
    Link: http://lkml.kernel.org/r/51DDC972.3010005@student.kit.edu
    Signed-off-by: Greg Kroah-Hartman

    H. Peter Anvin
     

22 Jul, 2013

2 commits

  • commit 0b0c002c340e78173789f8afaa508070d838cf3d upstream.

    ... because the "clock_event_device framework" already accounts for idle
    time through the "event_handler" function pointer in
    xen_timer_interrupt().

    The patch is intended as the completion of [1]. It should fix the double
    idle times seen in PV guests' /proc/stat [2]. It should be orthogonal to
    stolen time accounting (the removed code seems to be isolated).

    The approach may be completely misguided.

    [1] https://lkml.org/lkml/2011/10/6/10
    [2] http://lists.xensource.com/archives/html/xen-devel/2010-08/msg01068.html

    John took the time to retest this patch on top of v3.10 and reported:
    "idle time is correctly incremented for pv and hvm for the normal
    case, nohz=off and nohz=idle", so let's put this patch in.

    Signed-off-by: Laszlo Ersek
    Signed-off-by: John Haxby
    Signed-off-by: Konrad Rzeszutek Wilk
    Signed-off-by: Greg Kroah-Hartman

    Laszlo Ersek
     
  • commit d3768d885c6ccbf8a137276843177d76c49033a7 upstream.

    ExitBootServices is absolutely supposed to return a failure if any
    ExitBootServices event handler changes the memory map. Basically, the
    get_map loop should run again if ExitBootServices returns an error the
    first time. If ExitBootServices fails a second time as well, it is fair
    for Linux to return control back to the firmware.

    The second change is the following line:

    again:
    size += sizeof(*mem_map) * 2;

    Originally you were incrementing it by the size of one memory map entry.
    The issue is all related to the low_alloc routine you are using: it makes
    allocations to obtain the memory map itself, and those allocations can
    change the memory map by more than one record.

    [ mfleming - changelog, code style ]
    Signed-off-by: Zach Bobroff
    Signed-off-by: Matt Fleming
    Signed-off-by: Greg Kroah-Hartman

    Zach Bobroff
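
    The retry protocol described above -- fetch the map, attempt
    ExitBootServices, and on failure fetch the map again with extra slack for
    entries created by the allocation itself -- can be sketched as follows.
    exit_boot_services() here is a hypothetical firmware model (first call
    fails because a handler changed the map, second succeeds), not the real
    EFI stub:

    ```c
    #include <assert.h>
    #include <stdbool.h>
    #include <stddef.h>

    struct mem_map_entry { unsigned long start, len; };

    static int ebs_calls;

    /* Hypothetical firmware: the first call fails because an event
     * handler changed the memory map; the retry succeeds. */
    static bool exit_boot_services(void)
    {
        return ++ebs_calls >= 2;
    }

    static bool exit_boot(void)
    {
        size_t size = 8 * sizeof(struct mem_map_entry);
        int tries = 2;             /* one retry, then give up */

        while (tries--) {
            /* Grow the buffer by two entries, not one: allocating the
             * buffer itself can add more than one record to the map. */
            size += sizeof(struct mem_map_entry) * 2;
            /* ... get_memory_map(size) would go here ... */
            if (exit_boot_services())
                return true;       /* boot services exited, map is final */
        }
        return false;              /* hand control back to the firmware */
    }

    int main(void)
    {
        assert(exit_boot());
        assert(ebs_calls == 2);    /* succeeded on the retry */
        return 0;
    }
    ```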
     

14 Jul, 2013

1 commit

  • commit 03617c188f41eeeb4223c919ee7e66e5a114f2c6 upstream.

    Some userspaces do not preserve the unusable property. Since a usable
    segment has to be present according to the VMX spec, we can use the
    present property to work around the userspace bug by making an unusable
    segment always nonpresent. vmx_segment_access_rights() already marks a
    nonpresent segment as unusable.

    Reported-by: Stefan Pietsch
    Tested-by: Stefan Pietsch
    Signed-off-by: Gleb Natapov
    Signed-off-by: Paolo Bonzini
    Signed-off-by: Greg Kroah-Hartman

    Gleb Natapov
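
    The invariant the fix establishes -- an unusable segment is always
    written back as nonpresent, so nonpresence round-trips the information
    even through a userspace that drops the unusable flag -- can be sketched
    like this (the two-field kvm_segment here is a deliberately minimal
    stand-in for the kernel's struct):

    ```c
    #include <assert.h>
    #include <stdbool.h>

    struct kvm_segment { bool unusable, present; };

    /* The fix: an unusable segment is forced nonpresent on write-back. */
    static void fix_segment(struct kvm_segment *s)
    {
        if (s->unusable)
            s->present = false;
    }

    /* Mirrors what vmx_segment_access_rights() already does: a
     * nonpresent segment is reported as unusable. */
    static bool segment_unusable(const struct kvm_segment *s)
    {
        return !s->present;
    }

    int main(void)
    {
        struct kvm_segment s = { .unusable = true, .present = true };
        fix_segment(&s);
        /* Even if userspace drops .unusable, nonpresence preserves it. */
        s.unusable = false;
        assert(segment_unusable(&s));
        return 0;
    }
    ```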
     

22 Jun, 2013

4 commits

  • dump_seek() does SEEK_CUR, not SEEK_SET; the native binfmt_aout
    handles it correctly (it seeks by PAGE_SIZE - sizeof(struct user),
    bringing the current position to PAGE_SIZE), while the compat one seeks
    by PAGE_SIZE and ends up at PAGE_SIZE + what was already written...

    Signed-off-by: Al Viro

    Al Viro
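
    The arithmetic can be checked in userspace with an actual SEEK_CUR seek;
    4096 stands in for PAGE_SIZE and 296 for sizeof(struct user), both
    illustrative values rather than the real a.out layout:

    ```c
    #include <assert.h>
    #include <stdio.h>
    #include <unistd.h>
    #include <sys/types.h>

    #define PAGE_SIZE 4096L
    #define USER_SIZE 296L   /* stand-in for sizeof(struct user) */

    /* Native binfmt_aout: relative seek sized to land on the page. */
    static off_t after_native_seek(off_t cur)
    {
        return cur + (PAGE_SIZE - USER_SIZE);
    }

    /* Broken compat path: a full PAGE_SIZE relative seek overshoots
     * by however much was already written. */
    static off_t after_compat_seek(off_t cur)
    {
        return cur + PAGE_SIZE;
    }

    int main(void)
    {
        FILE *f = tmpfile();
        int fd = fileno(f);
        char hdr[USER_SIZE] = {0};

        ssize_t n = write(fd, hdr, sizeof(hdr));   /* position: USER_SIZE */
        assert(n == (ssize_t)sizeof(hdr));

        /* SEEK_CUR really is relative: the native sizing hits PAGE_SIZE. */
        off_t pos = lseek(fd, PAGE_SIZE - USER_SIZE, SEEK_CUR);
        assert(pos == PAGE_SIZE);
        assert(pos == after_native_seek(USER_SIZE));

        /* The compat sizing overshoots to PAGE_SIZE + already written. */
        lseek(fd, USER_SIZE, SEEK_SET);
        off_t bad = lseek(fd, PAGE_SIZE, SEEK_CUR);
        assert(bad == after_compat_seek(USER_SIZE));
        assert(bad == PAGE_SIZE + USER_SIZE);

        fclose(f);
        return 0;
    }
    ```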
     
  • Pull x86 fixes from Peter Anvin:
    "This series fixes a couple of build failures, and fixes MTRR cleanup
    and memory setup on very specific memory maps.

    Finally, it fixes triggering backtraces on all CPUs, which was
    inadvertently disabled on x86."

    * 'x86/urgent' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
    x86/efi: Fix dummy variable buffer allocation
    x86: Fix trigger_all_cpu_backtrace() implementation
    x86: Fix section mismatch on load_ucode_ap
    x86: fix build error and kconfig for ia32_emulation and binfmt
    range: Do not add new blank slot with add_range_with_merge
    x86, mtrr: Fix original mtrr range get for mtrr_cleanup

    Linus Torvalds
     
  • Pull KVM fixes from Paolo Bonzini:
    "Three one-line fixes for my first pull request; one for x86 host, one
    for x86 guest, one for PPC"

    * tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm:
    x86: kvmclock: zero initialize pvclock shared memory area
    kvm/ppc/booke: Delay kvmppc_lazy_ee_enable
    KVM: x86: remove vcpu's CPL check in host-invoked XCR set

    Linus Torvalds
     
  • Pull crypto fix from Herbert Xu:
    "This fixes an unaligned crash in XTS mode when using aesni_intel"

    * git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6:
    crypto: aesni_intel - fix accessing of unaligned memory

    Linus Torvalds
     

20 Jun, 2013

2 commits

  • Fix arch_prepare_kprobe() to handle failures in instruction copying
    correctly. This fix is related to the previous fix, commit 8101376,
    which made __copy_instruction() return an error result on failure, but
    the caller site was not updated to handle it. Thus, this is the
    other half of the bugfix.

    This fix is also related to the following bug-report:

    https://bugzilla.redhat.com/show_bug.cgi?id=910649

    Signed-off-by: Masami Hiramatsu
    Acked-by: Steven Rostedt
    Tested-by: Jonathan Lebon
    Cc: Frank Ch. Eigler
    Cc: systemtap@sourceware.org
    Cc: yrl.pp-manager.tt@hitachi.com
    Link: http://lkml.kernel.org/r/20130605031216.15285.2001.stgit@mhiramat-M0-7522
    Signed-off-by: Ingo Molnar

    Masami Hiramatsu
     
  • The following change fixes the x86 implementation of
    trigger_all_cpu_backtrace(), which was previously (accidentally,
    as far as I can tell) disabled so that it always returned false, as on
    architectures that do not implement this function.

    trigger_all_cpu_backtrace(), as defined in include/linux/nmi.h,
    should call arch_trigger_all_cpu_backtrace() if available, or
    return false if the underlying arch doesn't implement this
    function.

    x86 did provide a suitable arch_trigger_all_cpu_backtrace()
    implementation, but it wasn't actually being used because it was
    declared in asm/nmi.h, which linux/nmi.h doesn't include. Also,
    linux/nmi.h couldn't easily be fixed by including asm/nmi.h,
    because that file is not available on all architectures.

    I am proposing to fix this by moving the x86 definition of
    arch_trigger_all_cpu_backtrace() to asm/irq.h.

    Tested via: echo l > /proc/sysrq-trigger

    Before the change, this used a fallback implementation which showed
    backtraces only on active CPUs (using smp_call_function_interrupt()).
    After the change, it shows NMI backtraces on all CPUs.

    Signed-off-by: Michel Lespinasse
    Cc: Linus Torvalds
    Link: http://lkml.kernel.org/r/1370518875-1346-1-git-send-email-walken@google.com
    Signed-off-by: Ingo Molnar

    Michel Lespinasse
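
    The header pattern this fix depends on can be sketched in plain C: the
    generic wrapper tests for a macro, and the arch header must actually be
    included for the macro to be visible -- which is exactly what broke when
    the definition lived in asm/nmi.h. ARCH_HAS_BACKTRACE below is an
    illustrative toggle, not a real kernel config symbol:

    ```c
    #include <assert.h>
    #include <stdbool.h>

    /* Stand-in for the arch header being (or not being) included. */
    #define ARCH_HAS_BACKTRACE 1

    #if ARCH_HAS_BACKTRACE
    static void arch_impl(void) { /* would send NMIs to all CPUs */ }
    #define arch_trigger_all_cpu_backtrace arch_impl
    #endif

    /* Pattern from include/linux/nmi.h: use the arch implementation if
     * its macro is visible here; otherwise report "not supported" so the
     * caller falls back to local backtraces. */
    static bool trigger_all_cpu_backtrace(void)
    {
    #ifdef arch_trigger_all_cpu_backtrace
        arch_trigger_all_cpu_backtrace();
        return true;
    #else
        return false;
    #endif
    }

    int main(void)
    {
        /* With the macro visible, the arch path is taken. */
        assert(trigger_all_cpu_backtrace());
        return 0;
    }
    ```

    If the `#define` is not in scope at the point where
    trigger_all_cpu_backtrace() is compiled -- the situation before this fix,
    with the definition in a header that linux/nmi.h never included -- the
    `#else` branch wins and the function silently returns false.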