01 Mar, 2012

8 commits

  • commit 32c3233885eb10ac9cb9410f2f8cd64b8df2b2a1 upstream.

    For the L1 instruction cache and the L2 cache the shared CPU
    information is wrong: on current AMD family 15h CPUs those caches
    are shared between both cores of a compute unit.

    This fixes https://bugzilla.kernel.org/show_bug.cgi?id=42607
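
    A hedged sketch of the idea, not the upstream diff: inside the
    cacheinfo setup, where c is the current CPU's cpuinfo, index the
    cache being described (1 is L1i, 2 is L2), and mask its shared-CPU
    mask, both cores of a compute unit are marked as sharers:

    if (c->x86 == 0x15 && (index == 1 || index == 2)) {
        for_each_online_cpu(i) {
            if (cpu_data(i).phys_proc_id == c->phys_proc_id &&
                cpu_data(i).compute_unit_id == c->compute_unit_id)
                cpumask_set_cpu(i, mask);   /* sibling core shares it */
        }
    }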

    Signed-off-by: Andreas Herrmann
    Cc: Borislav Petkov
    Cc: Dave Jones
    Link: http://lkml.kernel.org/r/20120208195229.GA17523@alberich.amd.com
    Signed-off-by: Ingo Molnar
    Signed-off-by: Greg Kroah-Hartman

    Andreas Herrmann
     
  • commit d980e0f8d858c6963d676013e976ff00ab7acb2b upstream.

    When the PMIC is not found, voltdm->pmic will be NULL. vp.c's
    initialization function tries to dereference this, which causes an
    oops:

    Unable to handle kernel NULL pointer dereference at virtual address 00000000
    pgd = c0004000
    [00000000] *pgd=00000000
    Internal error: Oops: 5 [#1] PREEMPT
    Modules linked in:
    CPU: 0 Not tainted (3.3.0-rc2+ #204)
    PC is at omap_vp_init+0x5c/0x15c
    LR is at omap_vp_init+0x58/0x15c
    pc : [] lr : [] psr: 60000013
    sp : c181ff30 ip : c181ff68 fp : c181ff64
    r10: c0407808 r9 : c040786c r8 : c0407814
    r7 : c0026868 r6 : c00264fc r5 : c040ad6c r4 : 00000000
    r3 : 00000040 r2 : 000032c8 r1 : 0000fa00 r0 : 000032c8
    Flags: nZCv IRQs on FIQs on Mode SVC_32 ISA ARM Segment kernel
    Control: 10c5387d Table: 80004019 DAC: 00000015
    Process swapper (pid: 1, stack limit = 0xc181e2e8)
    Stack: (0xc181ff30 to 0xc1820000)
    ff20: c0381d00 c02e9c6d c0383582 c040786c
    ff40: c040ad6c c00264fc c0026868 c0407814 00000000 c03d9de4 c181ff8c c181ff68
    ff60: c03db448 c03db830 c02e982c c03fdfb8 c03fe004 c0039988 00000013 00000000
    ff80: c181ff9c c181ff90 c03d9df8 c03db390 c181ffdc c181ffa0 c0008798 c03d9df0
    ffa0: c181ffc4 c181ffb0 c0055a44 c0187050 c0039988 c03fdfb8 c03fe004 c0039988
    ffc0: 00000013 00000000 00000000 00000000 c181fff4 c181ffe0 c03d1284 c0008708
    ffe0: 00000000 c03d1208 00000000 c181fff8 c0039988 c03d1214 1077ce40 01f7ee08
    Backtrace:
    [] (omap_vp_init+0x0/0x15c) from [] (omap_voltage_late_init+0xc4/0xfc)
    [] (omap_voltage_late_init+0x0/0xfc) from [] (omap2_common_pm_late_init+0x14/0x54)
    r8:00000000 r7:00000013 r6:c0039988 r5:c03fe004 r4:c03fdfb8
    [] (omap2_common_pm_late_init+0x0/0x54) from [] (do_one_initcall+0x9c/0x164)
    [] (do_one_initcall+0x0/0x164) from [] (kernel_init+0x7c/0x120)
    [] (kernel_init+0x0/0x120) from [] (do_exit+0x0/0x2cc)
    r5:c03d1208 r4:00000000
    Code: e5ca300b e5900034 ebf69027 e5994024 (e5941000)
    ---[ end trace aed617dddaf32c3d ]---
    Kernel panic - not syncing: Attempted to kill init!
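
    The fix is an early guard; a hedged sketch of its shape in
    omap_vp_init() (the exact message text may differ):

    if (!voltdm->pmic) {
        pr_err("%s: no PMIC found for vdd_%s, skipping VP init\n",
               __func__, voltdm->name);
        return;         /* bail out instead of dereferencing NULL */
    }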

    Signed-off-by: Russell King
    Cc: Igor Grinberg
    Signed-off-by: Greg Kroah-Hartman

    Russell King
     
  • commit 8e43a905dd574f54c5715d978318290ceafbe275 upstream.

    Bootup with lockdep enabled has been broken on v7 since b46c0f74657d
    ("ARM: 7321/1: cache-v7: Disable preemption when reading CCSIDR").

    This is because v7_setup (which is called very early during boot) calls
    v7_flush_dcache_all, and the save_and_disable_irqs added by that patch
    ends up attempting to call into lockdep C code (trace_hardirqs_off())
    when we are in no position to execute it (no stack, MMU off).

    Fix this by using a notrace variant of save_and_disable_irqs. The code
    already uses the notrace variant of restore_irqs.

    Reviewed-by: Nicolas Pitre
    Acked-by: Stephen Boyd
    Cc: Catalin Marinas
    Signed-off-by: Rabin Vincent
    Signed-off-by: Russell King
    Signed-off-by: Greg Kroah-Hartman

    Rabin Vincent
     
  • commit b46c0f74657d1fe1c1b0c1452631cc38a9e6987f upstream.

    armv7's flush_cache_all() flushes caches via set/way. To
    determine the cache attributes (line size, number of sets,
    etc.) the assembly first writes the CSSELR register to select a
    cache level and then reads the CCSIDR register. The CSSELR register
    is banked per-cpu and is used to determine which cache level CCSIDR
    reads. If the task is migrated between when the CSSELR is written and
    the CCSIDR is read the CCSIDR value may be for an unexpected cache
    level (for example L1 instead of L2) and incorrect cache flushing
    could occur.

    Disable interrupts across the write and read so that the correct
    cache attributes are read and used for the cache flushing
    routine. We disable interrupts instead of disabling preemption
    because the critical section is only 3 instructions and we want
    to call v7_flush_dcache_all from __v7_setup, which doesn't have a
    full kernel stack with a struct thread_info.

    This fixes a problem we see in scm_call() when flush_cache_all()
    is called from preemptible context and sometimes the L2 cache is
    not properly flushed out.
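
    The patch itself is assembly in cache-v7.S; a C-level illustration
    of the critical section it protects (the helper name is
    illustrative, CP15 encodings per the ARM ARM):

    static unsigned long read_ccsidr_for(unsigned int level)
    {
        unsigned long flags, ccsidr;

        raw_local_irq_save(flags);  /* no migration between the two ops */
        asm volatile("mcr p15, 2, %0, c0, c0, 0" : : "r" (level << 1));
        isb();                      /* make the CSSELR write visible */
        asm volatile("mrc p15, 1, %0, c0, c0, 0" : "=r" (ccsidr));
        raw_local_irq_restore(flags);
        return ccsidr;
    }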

    Signed-off-by: Stephen Boyd
    Acked-by: Catalin Marinas
    Reviewed-by: Nicolas Pitre
    Signed-off-by: Russell King
    Signed-off-by: Greg Kroah-Hartman

    Stephen Boyd
     
  • commit 46e33c606af8e0caeeca374103189663d877c0d6 upstream.

    This fixes the thrd->req_running field being accessed before thrd
    is checked for null. The error was introduced in

    abb959f: ARM: 7237/1: PL330: Fix driver freeze

    Reference:
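
    An illustrative shape of the reorder (simplified; the real code is
    in pl330.c):

    if (!thrd)                      /* check first...             */
        return -EINVAL;
    idx = thrd->req_running;        /* ...then dereference safely */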

    Signed-off-by: Mans Rullgard
    Acked-by: Javi Merino
    Signed-off-by: Russell King
    Signed-off-by: Greg Kroah-Hartman

    Javi Merino
     
  • commit cf1eb40f8f5ea12c9e569e7282161fc7f194fd62 upstream.

    The conversion of the ktime to a value suitable for the clock comparator
    does not take changes to wall_to_monotonic into account. In fact the
    conversion just needs the boot clock (sched_clock_base_cc) and the
    total_sleep_time.

    This is applicable to 3.2+ kernels.
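
    For reference, the unit arithmetic behind such a conversion (a
    hedged sketch; the helper name is illustrative): one TOD clock unit
    is 2^-12 microseconds, so nanoseconds scale by 4096/1000 = 512/125.

    static u64 ns_to_tod(u64 nsecs)
    {
        nsecs <<= 9;            /* * 512             */
        do_div(nsecs, 125);     /* / 125  => * 4.096 */
        return nsecs;
    }

    /* comparator = sched_clock_base_cc + ns_to_tod(expiry_ns + sleep_ns) */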

    Signed-off-by: Martin Schwidefsky
    Signed-off-by: Greg Kroah-Hartman

    Martin Schwidefsky
     
  • commit e8c9dc93e27d891636defbc269f182a83e6abba8 upstream.

    Registration of at91_udc as a module will enable the SoC-related
    code.

    Fix following an idea from Karel Znamenacek.

    Signed-off-by: Nicolas Ferre
    Acked-by: Karel Znamenacek
    Acked-by: Jean-Christophe PLAGNIOL-VILLARD
    Signed-off-by: Greg Kroah-Hartman

    Nicolas Ferre
     
  • commit 9a45a9407c69d068500923480884661e2b9cc421 upstream.

    perf on POWER stopped working after commit e050e3f0a71b (perf: Fix
    broken interrupt rate throttling). That patch exposed a bug in
    the POWER perf_events code.

    Since the PMCs count upwards and take an exception when the top bit
    is set, we want to write 0x80000000 - left in power_pmu_start. We were
    instead programming in left which effectively disables the counter
    until we eventually hit 0x80000000. This could take seconds or longer.
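
    A hedged sketch of the corrected programming inside
    power_pmu_start(), close to the upstream fix:

    s64 left = local64_read(&event->hw.period_left);
    unsigned long val = 0;

    if (left < 0x80000000L)
        val = 0x80000000L - left;   /* was: val = left */
    write_pmc(event->hw.idx, val);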

    With the patch applied I get the expected number of samples:

    SAMPLE events: 9948

    Signed-off-by: Anton Blanchard
    Acked-by: Paul Mackerras
    Signed-off-by: Benjamin Herrenschmidt
    Signed-off-by: Greg Kroah-Hartman

    Anton Blanchard
     

28 Feb, 2012

10 commits

  • commit 34ddc81a230b15c0e345b6b253049db731499f7e upstream.

    After all the FPU state cleanups and finally finding the problem that
    caused all our FPU save/restore problems, this re-introduces the
    preloading of FPU state that was removed in commit b3b0870ef3ff ("i387:
    do not preload FPU state at task switch time").

    However, instead of simply reverting the removal, this reimplements
    preloading with several fixes, most notably

    - properly abstracted as a true FPU state switch, rather than as
    open-coded save and restore with various hacks.

    In particular, implementing it as a proper FPU state switch allows us
    to optimize the CR0.TS flag accesses: there is no reason to set the
    TS bit only to then almost immediately clear it again. CR0 accesses
    are quite slow and expensive, so don't flip the bit back and forth
    for no good reason.

    - Make sure that the same model works for both x86-32 and x86-64, so
    that there are no gratuitous differences between the two due to the
    way they save and restore segment state differently due to
    architectural differences that really don't matter to the FPU state.

    - Avoid exposing the "preload" state to the context switch routines,
    and in particular allow the concept of lazy state restore: if nothing
    else has used the FPU in the meantime, and the process is still on
    the same CPU, we can avoid restoring state from memory entirely, just
    re-expose the state that is still in the FPU unit.

    That optimized lazy restore isn't actually implemented here, but the
    infrastructure is set up for it. Of course, older CPUs that use
    'fnsave' to save the state cannot take advantage of this, since the
    state saving also trashes the state.

    In other words, there is now an actual _design_ to the FPU state saving,
    rather than just random historical baggage. Hopefully it's easier to
    follow as a result.
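
    A condensed sketch of the switch-time handoff this describes.
    Helper names loosely follow the series, fpu_restore() is a
    stand-in, and the real helpers carry more bookkeeping; the key
    point is that when the next task will preload, the FPU is handed
    over directly instead of setting CR0.TS only to clear it again:

    typedef struct { bool preload; } fpu_switch_t;

    static inline fpu_switch_t
    switch_fpu_prepare(struct task_struct *old, struct task_struct *new)
    {
        fpu_switch_t fpu;

        /* preload only for tasks that use the FPU often enough */
        fpu.preload = tsk_used_math(new) && new->fpu_counter > 5;
        if (__thread_has_fpu(old)) {
            __save_init_fpu(old);
            __thread_clear_has_fpu(old);
            if (fpu.preload)
                __thread_set_has_fpu(new);  /* CR0.TS stays clear */
            else
                stts();                     /* trap on next FPU use */
        }
        return fpu;
    }

    static inline void
    switch_fpu_finish(struct task_struct *new, fpu_switch_t fpu)
    {
        if (fpu.preload)
            fpu_restore(new);   /* stand-in: load new's state from memory */
    }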

    Signed-off-by: Linus Torvalds
    Signed-off-by: Greg Kroah-Hartman

    Linus Torvalds
     
  • commit f94edacf998516ac9d849f7bc6949a703977a7f3 upstream.

    This moves the bit that indicates whether a thread has ownership of the
    FPU from the TS_USEDFPU bit in thread_info->status to a word of its own
    (called 'has_fpu') in task_struct->thread.has_fpu.

    This fixes two independent bugs at the same time:

    - changing 'thread_info->status' from the scheduler causes nasty
    problems for the other users of that variable, since it is defined to
    be thread-synchronous (that's what the "TS_" part of the naming was
    supposed to indicate).

    So perfectly valid code could (and did) do

    ti->status |= TS_RESTORE_SIGMASK;

    and the compiler was free to do that as separate load, or, and store
    instructions (illustrated after this list). That can cause problems
    with preemption, since a task switch could happen in between and
    change the TS_USEDFPU bit. The change to TS_USEDFPU would then be
    overwritten by the final store.

    In practice, this seldom happened, though, because the 'status' field
    was seldom used more than once, so gcc would generally tend to
    generate code that used a read-modify-write instruction and thus
    happened to avoid this problem - RMW instructions are naturally low
    fat and preemption-safe.

    - On x86-32, the current_thread_info() pointer would, during interrupts
    and softirqs, point to a *copy* of the real thread_info, because
    x86-32 uses %esp to calculate the thread_info address, and thus the
    separate irq (and softirq) stacks would cause these kinds of odd
    thread_info copy aliases.

    This is normally not a problem, since interrupts aren't supposed to
    look at thread information anyway (what thread is running at
    interrupt time really isn't very well-defined), but it confused the
    heck out of irq_fpu_usable() and the code that tried to squirrel
    away the FPU state.

    (It also caused untold confusion for us poor kernel developers).
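
    In compiler terms, the first race looks like this (an illustration,
    not kernel code):

    ti->status |= TS_RESTORE_SIGMASK;

    /* ...is free to compile to: */
    tmp = ti->status;               /* load  */
    tmp |= TS_RESTORE_SIGMASK;      /* or    */
                                    /* <-- preempted here: the scheduler
                                           clears TS_USEDFPU in ti->status */
    ti->status = tmp;               /* store: writes back stale TS_USEDFPU */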

    It also turns out that using 'task_struct' is actually much more natural
    for most of the call sites that care about the FPU state, since they
    tend to work with the task struct for other reasons anyway (ie
    scheduling). And the FPU data that we are going to save/restore is
    found there too.

    Thanks to Arjan Van De Ven for pointing us to
    the %esp issue.

    Cc: Arjan van de Ven
    Reported-and-tested-by: Raphael Prevost
    Acked-and-tested-by: Suresh Siddha
    Tested-by: H. Peter Anvin
    Signed-off-by: Linus Torvalds
    Signed-off-by: Greg Kroah-Hartman

    Linus Torvalds
     
  • commit 4903062b5485f0e2c286a23b44c9b59d9b017d53 upstream.

    The AMD K7/K8 CPUs don't save/restore FDP/FIP/FOP unless an exception is
    pending. In order to not leak FIP state from one process to another, we
    need to do a floating point load after the fxsave of the old process,
    and before the fxrstor of the new FPU state. That resets the state to
    the (uninteresting) kernel load, rather than some potentially sensitive
    user information.

    We used to do this directly after the FPU state save, but that is
    actually very inconvenient, since it

    (a) corrupts what is potentially perfectly good FPU state that we might
    want to lazily avoid restoring later and

    (b) on x86-64 it resulted in a very annoying ordering constraint, where
    "__unlazy_fpu()" in the task switch needs to be delayed until after
    the DS segment has been reloaded just to get the new DS value.

    Coupling it to the fxrstor instead of the fxsave automatically avoids
    both of these issues, and also ensures that we only do it when actually
    necessary (the FP state after a save may never actually get used). It's
    simply a much more natural place for the leaked state cleanup.
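
    A hedged sketch of the workaround on the restore side, mirroring
    the in-kernel pattern keyed off an FXSAVE-leak CPU quirk (the quirk
    test here is illustrative):

    static unsigned int safe_word;      /* any always-mapped word */

    if (unlikely(leaky_fpu)) {          /* e.g. X86_FEATURE_FXSAVE_LEAK */
        asm volatile("fnclex\n\t"
                     "emms\n\t"
                     "fildl %[dummy]"   /* dummy FP load resets FIP/FDP/FOP */
                     : : [dummy] "m" (safe_word));
    }
    /* ...then fxrstor the incoming task's state */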

    Signed-off-by: Linus Torvalds
    Signed-off-by: Greg Kroah-Hartman

    Linus Torvalds
     
  • commit b3b0870ef3ffed72b92415423da864f440f57ad6 upstream.

    Yes, taking the trap to re-load the FPU/MMX state is expensive, but so
    is spending several days looking for a bug in the state save/restore
    code. And the preload code has some rather subtle interactions with
    both paravirtualization support and segment state restore, so it's not
    nearly as simple as it should be.

    Also, now that we no longer necessarily depend on a single bit (ie
    TS_USEDFPU) for keeping track of the state of the FPU, we might be able
    to do better. If we are really switching between two processes that
    keep touching the FP state, save/restore is inevitable, but in the case
    of having one process that does most of the FPU usage, we may actually
    be able to do much better than the preloading.

    In particular, we may be able to keep track of which CPU the process ran
    on last, and also per CPU keep track of which process' FP state that CPU
    has. For modern CPUs that don't destroy the FPU contents at save time,
    that would allow us to do a lazy restore by just re-enabling the
    existing FPU state - with no restore cost at all!
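
    A hedged sketch of the bookkeeping such a lazy restore needs (this
    is the direction pointed at, not code the commit adds; names are
    illustrative):

    static DEFINE_PER_CPU(struct task_struct *, fpu_owner);

    static inline bool
    fpu_lazy_restore(struct task_struct *new, unsigned int cpu)
    {
        /* state still live in this CPU's FPU? just re-enable, no load */
        return new == per_cpu(fpu_owner, cpu) &&
               cpu == new->thread.fpu.last_cpu;
    }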

    Signed-off-by: Linus Torvalds
    Signed-off-by: Greg Kroah-Hartman

    Linus Torvalds
     
  • commit 6d59d7a9f5b723a7ac1925c136e93ec83c0c3043 upstream.

    This creates three helper functions that do the TS_USEDFPU accesses, and
    makes everybody that used to do it by hand use those helpers instead.

    In addition, there's a couple of helper functions for the "change both
    CR0.TS and TS_USEDFPU at the same time" case, and the places that do
    that together have been changed to use those. That means that we have
    fewer random places that open-code this situation.

    The intent is partly to clarify the code without actually changing any
    semantics yet (since we clearly still have some hard to reproduce bug in
    this area), but also to make it much easier to use another approach
    entirely to caching the CR0.TS bit for software accesses.

    Right now we use a bit in the thread-info 'status' variable (this patch
    does not change that), but we might want to make it a full field of its
    own or even make it a per-cpu variable.
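
    The helpers are small; a sketch faithful to that description (the
    real ones live in asm/i387.h):

    static inline bool __thread_has_fpu(struct thread_info *ti)
    {
        return ti->status & TS_USEDFPU;
    }

    static inline void __thread_clear_has_fpu(struct thread_info *ti)
    {
        ti->status &= ~TS_USEDFPU;
    }

    static inline void __thread_set_has_fpu(struct thread_info *ti)
    {
        ti->status |= TS_USEDFPU;
    }

    /* plus combined forms that flip CR0.TS at the same time, e.g. */
    static inline void __thread_fpu_end(struct thread_info *ti)
    {
        __thread_clear_has_fpu(ti);
        stts();
    }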

    Signed-off-by: Linus Torvalds
    Signed-off-by: Greg Kroah-Hartman

    Linus Torvalds
     
  • commit b6c66418dcad0fcf83cd1d0a39482db37bf4fc41 upstream.

    Touching TS_USEDFPU without touching CR0.TS is confusing, so don't do
    it. By moving it into the callers, we always do the TS_USEDFPU next to
    the CR0.TS accesses in the source code, and it's much easier to see how
    the two go hand in hand.

    Signed-off-by: Linus Torvalds
    Signed-off-by: Greg Kroah-Hartman

    Linus Torvalds
     
  • commit 15d8791cae75dca27bfda8ecfe87dca9379d6bb0 upstream.

    Commit 5b1cbac37798 ("i387: make irq_fpu_usable() tests more robust")
    added a sanity check to the #NM handler to verify that we never cause
    the "Device Not Available" exception in kernel mode.

    However, that check actually pinpointed a (fundamental) race where we do
    cause that exception as part of the signal stack FPU state save/restore
    code.

    Because we use the floating point instructions themselves to save and
    restore state directly from user mode, we cannot do that atomically with
    testing the TS_USEDFPU bit: the user mode access itself may cause a page
    fault, which causes a task switch, which saves and restores the FP/MMX
    state from the kernel buffers.

    This kind of "recursive" FP state save is fine per se, but it means that
    when the signal stack save/restore gets restarted, it will now take the
    '#NM' exception we originally tried to avoid. With preemption this can
    happen even without the page fault - but because of the user access, we
    cannot just disable preemption around the save/restore instruction.

    There are various ways to solve this, including using the
    "enable/disable_page_fault()" helpers to not allow page faults at all
    during the sequence, and fall back to copying things by hand without the
    use of the native FP state save/restore instructions.

    However, the simplest thing to do is to just allow the #NM from kernel
    space, but fix the race in setting and clearing CR0.TS that this all
    exposed: the TS bit changes and the TS_USEDFPU bit absolutely have to be
    atomic wrt scheduling, so while the actual state save/restore can be
    interrupted and restarted, the act of actually clearing/setting CR0.TS
    and the TS_USEDFPU bit together must not.

    Instead of just adding random "preempt_disable/enable()" calls to what
    is already excessively ugly code, this introduces some helper functions
    that mostly mirror the "kernel_fpu_begin/end()" functionality, just for
    the user state instead.

    Those helper functions should probably eventually replace the other
    ad-hoc CR0.TS and TS_USEDFPU tests too, but I'll need to think about it
    some more: the task switching functionality in particular needs to
    expose the difference between the 'prev' and 'next' threads, while the
    new helper functions intentionally were written to only work with
    'current'.
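
    A hedged sketch of the new helpers (simplified from the patch):

    static inline void user_fpu_begin(void)
    {
        preempt_disable();              /* pair TS and TS_USEDFPU changes */
        if (!user_has_fpu())
            __thread_fpu_begin(current_thread_info());
        preempt_enable();
    }

    static inline void user_fpu_end(void)
    {
        preempt_disable();
        __thread_fpu_end(current_thread_info());
        preempt_enable();
    }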

    Signed-off-by: Linus Torvalds
    Signed-off-by: Greg Kroah-Hartman

    Linus Torvalds
     
  • commit c38e23456278e967f094b08247ffc3711b1029b2 upstream.

    The check for save_init_fpu() (introduced in commit 5b1cbac37798: "i387:
    make irq_fpu_usable() tests more robust") was the wrong way around, but
    I hadn't noticed, because my "tests" were bogus: the FPU exceptions are
    disabled by default, so even doing a divide by zero never actually
    triggers this code at all unless you do extra work to enable them.

    So if anybody did enable them, they'd get one spurious warning.

    Signed-off-by: Linus Torvalds
    Signed-off-by: Greg Kroah-Hartman

    Linus Torvalds
     
  • commit 5b1cbac37798805c1fee18c8cebe5c0a13975b17 upstream.

    Some code - especially the crypto layer - wants to use the x86
    FP/MMX/AVX register set in what may be interrupt (typically softirq)
    context.

    That *can* be ok, but the tests for when it was ok were somewhat
    suspect. We cannot touch the thread-specific status bits either, so
    we'd better check that we're not going to try to save FP state or
    anything like that.

    Now, it may be that the TS bit is always cleared *before* we set the
    USEDFPU bit (and only set when we had already cleared the USEDFPU
    bit before), so the TS bit test may actually have been sufficient, but it
    certainly was not obviously so.

    So this explicitly verifies that we will not touch the TS_USEDFPU bit,
    and adds a few related sanity-checks. Because it seems that somehow
    AES-NI is corrupting user FP state. The cause is not clear, and this
    patch doesn't fix it, but while debugging it I really wanted the code to
    be more obviously correct and robust.
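
    A hedged sketch of the strengthened test (simplified; at this point
    in the series the TS_USEDFPU access was still open-coded):

    static inline bool interrupted_kernel_fpu_idle(void)
    {
        /* safe only if the interrupted kernel code holds no live FPU
         * state and CR0.TS would trap any stray FP use */
        return !(current_thread_info()->status & TS_USEDFPU) &&
               (read_cr0() & X86_CR0_TS);
    }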

    Signed-off-by: Linus Torvalds
    Signed-off-by: Greg Kroah-Hartman

    Linus Torvalds
     
  • commit be98c2cdb15ba26148cd2bd58a857d4f7759ed38 upstream.

    It was marked asmlinkage for some really old and stale legacy reasons.
    Fix that and the equally stale comment.

    Noticed when debugging the irq_fpu_usable() bugs.

    Signed-off-by: Linus Torvalds
    Signed-off-by: Greg Kroah-Hartman

    Linus Torvalds
     

21 Feb, 2012

1 commit


14 Feb, 2012

4 commits

  • commit 8ef5d844cc3a644ea6f7665932a4307e9fad01fa upstream.

    The following statement can only change the device size from
    8-bit (0) to 16-bit (1), but not vice versa:

    regval |= GPMC_CONFIG1_DEVICESIZE(wval);

    Since this field has one reserved bit that could be used in the
    future, just clear both bits and then OR in the desired value.
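
    In code, the shape of the fix (a hedged sketch matching the
    statement quoted above):

    regval &= ~GPMC_CONFIG1_DEVICESIZE(3);      /* clear both bits first */
    regval |= GPMC_CONFIG1_DEVICESIZE(wval);    /* then OR in the size   */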

    Signed-off-by: Yegor Yefremov
    Signed-off-by: Tony Lindgren
    Signed-off-by: Greg Kroah-Hartman

    Yegor Yefremov
     
  • commit 8130b9d7b9d858aa04ce67805e8951e3cb6e9b2f upstream.

    If we are context switched whilst copying into a thread's
    vfp_hard_struct then the partial copy may be corrupted by the VFP
    context switching code (see "ARM: vfp: flush thread hwstate before
    restoring context from sigframe").

    This patch updates the ptrace VFP set code so that the thread state is
    flushed before the copy, therefore disabling VFP and preventing
    corruption from occurring.
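
    A hedged sketch of the resulting ordering in vfp_set()
    (simplified):

    vfp_flush_hwstate(thread);  /* VFP now disabled: no lazy save can race */
    thread->vfp.hard = new_vfp; /* safe to install the new register state  */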

    Signed-off-by: Will Deacon
    Signed-off-by: Russell King
    Signed-off-by: Greg Kroah-Hartman

    Will Deacon
     
  • commit 247f4993a5974e6759606c4d380748eecfd273ff upstream.

    In a preemptible kernel, vfp_set() can be preempted, causing the
    hardware VFP context to be switched while the thread vfp state is
    being read and modified. This leads to a race condition which can
    cause the thread vfp state to become corrupted if lazy VFP context
    save occurs due to preemption in between the time thread->vfpstate
    is read and the time the modified state is written back.

    This may occur if preemption occurs during the execution of a
    ptrace() call which modifies the VFP register state of a thread.
    Such instances should be very rare in most realistic scenarios --
    none has been reported, so far as I am aware. Only uniprocessor
    systems should be affected, since VFP context save is not currently
    lazy in SMP kernels.

    The problem was introduced by my earlier patch migrating to use
    regsets to implement ptrace.

    This patch does a vfp_sync_hwstate() before reading
    thread->vfpstate, to make sure that the thread's VFP state is not
    live in the hardware registers while the registers are modified.

    Thanks to Will Deacon for spotting this.

    Signed-off-by: Dave Martin
    Signed-off-by: Will Deacon
    Signed-off-by: Russell King
    Signed-off-by: Greg Kroah-Hartman

    Dave Martin
     
  • commit 2af276dfb1722e97b190bd2e646b079a2aa674db upstream.

    Following execution of a signal handler, we currently restore the VFP
    context from the ucontext in the signal frame. This involves copying
    from the user stack into the current thread's vfp_hard_struct and then
    flushing the new data out to the hardware registers.

    This is problematic when using a preemptible kernel because we could be
    context switched whilst updating the vfp_hard_struct. If the current
    thread has made use of VFP since the last context switch, the VFP
    notifier will copy from the hardware registers into the vfp_hard_struct,
    overwriting any data that had been partially copied by the signal code.

    Disabling preemption across copy_from_user calls is a terrible idea, so
    instead we move the VFP thread flush *before* we update the
    vfp_hard_struct. Since the flushing is performed lazily, this has the
    effect of disabling VFP and clearing the CPU's VFP state pointer,
    therefore preventing the thread from being updated with stale data on
    the next context switch.

    Tested-by: Peter Maydell
    Signed-off-by: Will Deacon
    Signed-off-by: Russell King
    Signed-off-by: Greg Kroah-Hartman

    Will Deacon
     

04 Feb, 2012

10 commits

  • commit 2ab1159e80e8f416071e9f51e4f77b9173948296 upstream.

    MMC_CAP_SD_HIGHSPEED is not supported on the Snowball board,
    resulting in initialization errors.

    Signed-off-by: Mathieu Poirier
    Signed-off-by: Fredrik Soderstedt
    Signed-off-by: Philippe Langlais
    Signed-off-by: Linus Walleij

    Philippe Langlais
     
  • [ Upstream commit d00a9dd21bdf7908b70866794c8313ee8a5abd5c ]

    Several problems fixed in this patch:

    1) The target of the conditional jump taken when a BPF program
    performs a divide by 0 is wrong.

    2) Must 'generate' the full function prologue/epilogue at pass=0,
    or else we can stop too early in pass=1 if the proglen doesn't change
    (if the increase of the prologue/epilogue equals the decrease of all
    instruction lengths because some jumps are converted to near jumps).

    3) Change the wrong-length detection at the end of code generation
    to issue a more explicit message; no need for a full stack trace.

    Reported-by: Phil Oester
    Signed-off-by: Eric Dumazet
    Signed-off-by: David S. Miller
    Signed-off-by: Greg Kroah-Hartman

    Eric Dumazet
     
  • commit 7a7546b377bdaa25ac77f33d9433c59f259b9688 upstream.

    If NR_CPUS < 256 then arch_spinlock_t is only 16 bits wide but struct
    xen_spinlock is 32 bits. When a spin lock is contended and
    xl->spinners is modified the two bytes immediately after the spin lock
    would be corrupted.

    This is a regression caused by 84eb950db13ca40a0572ce9957e14723500943d6
    (x86, ticketlock: Clean up types and accessors) which reduced the size
    of arch_spinlock_t.

    Fix this by making xl->spinners a u8 if NR_CPUS < 256. A
    BUILD_BUG_ON() is also added to check the sizes of the two structures
    are compatible.
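
    A hedged sketch of the fixed layout plus the build-time check
    (simplified from the patch):

    struct xen_spinlock {
        unsigned char lock;     /* 0 -> free; 1 -> locked */
    #if CONFIG_NR_CPUS < 256
        u8 spinners;            /* count of waiting cpus */
    #else
        u16 spinners;
    #endif
    };

    /* in the init path: must fit inside the arch_spinlock_t it overlays */
    BUILD_BUG_ON(sizeof(struct xen_spinlock) > sizeof(arch_spinlock_t));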

    In many cases this was not noticeable as there would often be padding
    bytes after the lock (e.g., if any of CONFIG_GENERIC_LOCKBREAK,
    CONFIG_DEBUG_SPINLOCK, or CONFIG_DEBUG_LOCK_ALLOC were enabled).

    The bnx2 driver is affected. In struct bnx2, phy_lock and
    indirect_lock may have no padding after them. Contention on phy_lock
    would corrupt indirect_lock making it appear locked and the driver
    would deadlock.

    Signed-off-by: David Vrabel
    Signed-off-by: Jeremy Fitzhardinge
    Acked-by: Ian Campbell
    Signed-off-by: Konrad Rzeszutek Wilk
    Signed-off-by: Greg Kroah-Hartman

    David Vrabel
     
  • commit 612539e81f655f6ac73c7af1da8701c1ee618aee upstream.

    On v7, we use the same cache maintenance instructions for data lines
    as for unified lines. This was not the case for v6, where HARVARD_CACHE
    was defined to indicate the L1 cache topology.

    This patch removes the erroneous compile-time check for HARVARD_CACHE in
    proc-v7.S, ensuring that we perform I-side invalidation at boot.

    Reported-and-Acked-by: Shawn Guo

    Acked-by: Catalin Marinas
    Signed-off-by: Will Deacon
    Signed-off-by: Russell King
    Signed-off-by: Greg Kroah-Hartman

    Will Deacon
     
  • commit d65015f7c5c5be9fd3f5e567889c844ba81bdc9c upstream.

    This applies ARM errata 764369 for all ux500 platforms.

    Signed-off-by: Srinidhi Kasagar
    Signed-off-by: Linus Walleij
    Signed-off-by: Greg Kroah-Hartman

    Srinidhi KASAGAR
     
  • commit 3e90772f76010c315474bde59eaca7cc4c94d645 upstream.

    Currently setting it to PQFP changes the subtype to BGA, as subtypes
    are swapped in at91rm9200_set_type().

    The wrong subtype causes GPIO bank D not to work at all.

    After this fix, the subtype is still set as unknown, but board code
    should fill it in with the proper value. A message is now printed to
    point this out.

    Bug discovery and first implementation made by Veli-Pekka Peltola.

    Signed-off-by: Nicolas Ferre
    Acked-by: Jean-Christophe PLAGNIOL-VILLARD
    Signed-off-by: Greg Kroah-Hartman

    Nicolas Ferre
     
  • commit 2a3535069e33d8b416f406c159ce924427315303 upstream.

    Passing the address of a variable as an operand to an asm statement
    doesn't mark the value of this variable as used, so gcc may optimize its
    initialisation away. Fix this by using the "m" constraint instead.
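
    An illustration of the problem and the fix (not the exact m68k
    code; compute_value() is a stand-in):

    unsigned long val = compute_value();

    /* broken: only &val is an input, so gcc may treat the store to val
     * as dead and drop the initialisation */
    asm volatile("..." : : "a" (&val));

    /* fixed: an "m" operand names val's memory itself, forcing gcc to
     * keep the store that initialises it */
    asm volatile("..." : : "m" (val));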

    Signed-off-by: Andreas Schwab
    Signed-off-by: Geert Uytterhoeven
    Signed-off-by: Greg Kroah-Hartman

    Andreas Schwab
     
  • commit 5b68edc91cdc972c46f76f85eded7ffddc3ff5c2 upstream.

    We've decided to provide CPU family specific container files
    (starting with CPU family 15h). E.g. for family 15h we have to
    load microcode_amd_fam15h.bin instead of microcode_amd.bin.

    The rationale is that, starting with family 15h, the patch size is
    larger than 2KB, which was hard-coded as the maximum patch size in
    various microcode loaders (not just Linux).

    Container files which include patches larger than 2KB cause
    different kinds of trouble with such old patch loaders. Thus we
    have to ensure that the default container file provides only
    patches with size less than 2KB.

    Signed-off-by: Andreas Herrmann
    Cc: Borislav Petkov
    Cc:
    Link: http://lkml.kernel.org/r/20120120164412.GD24508@alberich.amd.com
    [ documented the naming convention and tidied the code a bit. ]
    Signed-off-by: Ingo Molnar
    Signed-off-by: Greg Kroah-Hartman

    Andreas Herrmann
     
  • commit 5a51467b146ab7948d2f6812892eac120a30529c upstream.

    uv_gpa_to_soc_phys_ram() was inadvertently ignoring the
    shift values. This fix takes the shift into account.

    Signed-off-by: Russ Anderson
    Link: http://lkml.kernel.org/r/20120119020753.GA7228@sgi.com
    Signed-off-by: Ingo Molnar
    Signed-off-by: Greg Kroah-Hartman

    Russ Anderson
     
  • commit d2ebc71d472020bc30e29afe8c4d2a85a5b41f56 upstream.

    Initialize two spinlocks in tlb_uv.c and also properly define/initialize
    the uv_irq_lock.

    The lack of explicit initialization seems to be functionally
    harmless, but it is diagnosed when these are turned on:

    CONFIG_DEBUG_SPINLOCK=y
    CONFIG_DEBUG_MUTEXES=y
    CONFIG_DEBUG_LOCK_ALLOC=y
    CONFIG_LOCKDEP=y
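
    A hedged sketch of the kind of change involved (the exact lock
    names in tlb_uv.c may differ):

    /* static lock: define with an initializer so lockdep sees a class */
    static DEFINE_SPINLOCK(uv_irq_lock);

    /* runtime-allocated locks: initialize explicitly in the init path */
    spin_lock_init(&bcp->queue_lock);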

    Signed-off-by: Cliff Wickman
    Cc: Dimitri Sivanich
    Link: http://lkml.kernel.org/r/E1RnXd1-0003wU-PM@eag09.americas.sgi.com
    [ Added the uv_irq_lock initialization fix by Dimitri Sivanich ]
    Signed-off-by: Ingo Molnar
    Signed-off-by: Greg Kroah-Hartman

    Cliff Wickman
     

26 Jan, 2012

7 commits

  • commit c25a785d6647984505fa165b5cd84cfc9a95970b upstream.

    If the provided system call number is equal to __NR_syscalls, the
    current check will pass and a function pointer just after the system
    call table may be called, since sys_call_table is an array with total
    size __NR_syscalls.

    Whether or not this is a security bug depends on what the compiler puts
    immediately after the system call table. It's likely that this won't do
    anything bad because there is an additional NULL check on the syscall
    entry, but if there happens to be a non-NULL value immediately after the
    system call table, this may result in local privilege escalation.
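
    In C terms the off-by-one looks like this (the actual check lives
    in the score entry code; names are illustrative):

    /* valid indices are 0 .. __NR_syscalls - 1 */
    if (syscall_nr >= __NR_syscalls)    /* was effectively '>',          */
        return -ENOSYS;                 /* letting __NR_syscalls through */
    call = sys_call_table[syscall_nr];  /* now provably in bounds */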

    Signed-off-by: Dan Rosenberg
    Cc: Chen Liqin
    Cc: Lennox Wu
    Cc: Eugene Teo
    Cc: Arnd Bergmann
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds
    Signed-off-by: Greg Kroah-Hartman

    Dan Rosenberg
     
  • commit c5d35d399e685acccc85a675e8765c26b2a9813a upstream.

    This patch implements a workaround for a UV2 hardware bug.
    The bug is a non-atomic update of a memory-mapped register. When
    hardware message delivery and software message acknowledge occur
    simultaneously the pending message acknowledge for the arriving
    message may be lost. This causes the sender's message status to
    stay busy.

    Part of the workaround is to not acknowledge a completed message
    until it is verified that no other message is actually using the
    resource that is mistakenly recorded in the completed message.

    Part of the workaround is to test for long elapsed time in such
    a busy condition, then handle it by using a spare sending
    descriptor. The stay-busy condition is eventually timed out by
    hardware, and then the original sending descriptor can be
    re-used. Most of that logic change is in keeping track of the
    current descriptor and the state of the spares.

    The occurrences of the workaround are added to the BAU
    statistics.

    Signed-off-by: Cliff Wickman
    Link: http://lkml.kernel.org/r/20120116211947.GC5767@sgi.com
    Signed-off-by: Ingo Molnar
    Signed-off-by: Greg Kroah-Hartman

    Cliff Wickman
     
  • commit d059f9fa84a30e04279c6ff615e9e2cf3b260191 upstream.

    Move the call to enable_timeouts() forward so that
    BAU_MISC_CONTROL is initialized before using it in
    calculate_destination_timeout().

    Fix the calculation of a BAU destination timeout
    for UV2 (in calculate_destination_timeout()).

    Signed-off-by: Cliff Wickman
    Link: http://lkml.kernel.org/r/20120116211848.GB5767@sgi.com
    Signed-off-by: Ingo Molnar
    Signed-off-by: Greg Kroah-Hartman

    Cliff Wickman
     
  • commit da87c937e5a2374686edd58df06cfd5050b125fa upstream.

    Update the use of the Broadcast Assist Unit on SGI Altix UV2 to
    native UV2 mode on new hardware (not the legacy mode).

    UV2 native mode has a different format for a broadcast message.
    We also need quick differentiation between UV1 and UV2.

    Signed-off-by: Cliff Wickman
    Link: http://lkml.kernel.org/r/20120116211750.GA5767@sgi.com
    Signed-off-by: Ingo Molnar
    Signed-off-by: Greg Kroah-Hartman

    Cliff Wickman
     
  • commit 9f10f6a520deb3639fac78d81151a3ade88b4e7f upstream.

    In SRAT v1, we had 8-bit proximity domain (PXM) fields; SRAT v2
    provides 32 bits for these. The new fields were reserved before.
    According to the ACPI spec, the OS must disregard reserved fields.

    ia64 handled the PXM fields almost consistently, but with a
    dependency on SGI's sn2 platform. This patch leaves the sn2 logic
    in, but also uses 16/32 bits for PXM if the SRAT has rev 2 or higher.

    The patch also adds __init to the two pxm accessor functions, as they
    access __initdata now and are called from an __init function only anyway.

    Note that the code only uses 16 bits for the PXM field in the processor
    proximity field; the patch does not address this as 16 bits are more than
    enough.

    Signed-off-by: Kurt Garloff
    Signed-off-by: Len Brown
    Signed-off-by: Greg Kroah-Hartman

    Kurt Garloff
     
  • commit cd298f60a2451a16e0f077404bf69b62ec868733 upstream.

    In SRAT v1, we had 8-bit proximity domain (PXM) fields; SRAT v2
    provides 32 bits for these. The new fields were reserved before.
    According to the ACPI spec, the OS must disregard reserved fields.

    x86/x86-64 was rather inconsistent prior to this patch; it used
    8 bits for the pxm field in cpu_affinity, but 32 bits in
    mem_affinity. This patch makes it consistent: either use 8 bits
    consistently (SRAT rev 1 or lower) or 32 bits (SRAT rev 2 or higher).
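
    A hedged sketch of the rev-dependent decode (field names follow
    ACPICA's struct acpi_srat_cpu_affinity; the helper itself is
    illustrative):

    static int srat_cpu_pxm(struct acpi_srat_cpu_affinity *pa, int srat_rev)
    {
        int pxm = pa->proximity_domain_lo;

        if (srat_rev >= 2)      /* upper 24 bits are valid from rev 2 on */
            pxm |= (pa->proximity_domain_hi[0] << 8)  |
                   (pa->proximity_domain_hi[1] << 16) |
                   (pa->proximity_domain_hi[2] << 24);
        return pxm;
    }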

    cc: x86@kernel.org
    Signed-off-by: Kurt Garloff
    Signed-off-by: Len Brown
    Signed-off-by: Greg Kroah-Hartman

    Kurt Garloff
     
  • commit da517a08ac5913cd80ce3507cddd00f2a091b13c upstream.

    SGI UV systems print a message during boot:

    UV: Found blades

    Due to packaging changes, the blade count is not accurate on the
    next generation of the platform. This patch corrects the count.

    Signed-off-by: Jack Steiner
    Link: http://lkml.kernel.org/r/20120106191900.GA19772@sgi.com
    Signed-off-by: Ingo Molnar
    Signed-off-by: Greg Kroah-Hartman

    Jack Steiner