Eric Lee / smarc-ti-linux-kernel | Embedian Git Server

31 Oct, 2014

40 commits

d0335e4fe Linux 3.16.7 Browse Code »

Greg Kroah-Hartman
2014-10-31 00:41:01 +0800
2a545829b sparc64: Implement __get_user_pages_fast(). ... Browse Code »

[ Upstream commit 06090e8ed89ea2113a236befb41f71d51f100e60 ]

It is not sufficient to only implement get_user_pages_fast(), you
must also implement the atomic version __get_user_pages_fast()
otherwise you end up using the weak symbol fallback implementation
which simply returns zero.

This is dangerous, because it causes the futex code to loop forever
if transparent hugepages are supported (see get_futex_key()).

Signed-off-by: David S. Miller
Signed-off-by: Greg Kroah-Hartman

David S. Miller
2014-10-31 00:40:21 +0800
e81ef812f sparc64: Fix register corruption in top-most kernel stack frame during boot. ... Browse Code »

[ Upstream commit ef3e035c3a9b81da8a778bc333d10637acf6c199 ]

Meelis Roos reported that kernels built with gcc-4.9 do not boot, we
eventually narrowed this down to only impacting machines using
UltraSPARC-III and derivitive cpus.

The crash happens right when the first user process is spawned:

[ 54.451346] Kernel panic - not syncing: Attempted to kill init! exitcode=0x00000004
[ 54.451346]
[ 54.571516] CPU: 1 PID: 1 Comm: init Not tainted 3.16.0-rc2-00211-gd7933ab #96
[ 54.666431] Call Trace:
[ 54.698453] [0000000000762f8c] panic+0xb0/0x224
[ 54.759071] [000000000045cf68] do_exit+0x948/0x960
[ 54.823123] [000000000042cbc0] fault_in_user_windows+0xe0/0x100
[ 54.902036] [0000000000404ad0] __handle_user_windows+0x0/0x10
[ 54.978662] Press Stop-A (L1-A) to return to the boot prom
[ 55.050713] ---[ end Kernel panic - not syncing: Attempted to kill init! exitcode=0x00000004

Further investigation showed that compiling only per_cpu_patch() with
an older compiler fixes the boot.

Detailed analysis showed that the function is not being miscompiled by
gcc-4.9, but it is using a different register allocation ordering.

With the gcc-4.9 compiled function, something during the code patching
causes some of the %i* input registers to get corrupted. Perhaps
we have a TLB miss path into the firmware that is deep enough to
cause a register window spill and subsequent restore when we get
back from the TLB miss trap.

Let's plug this up by doing two things:

1) Stop using the firmware stack for client interface calls into
the firmware. Just use the kernel's stack.

2) As soon as we can, call into a new function "start_early_boot()"
to put a one-register-window buffer between the firmware's
deepest stack frame and the top-most initial kernel one.

Reported-by: Meelis Roos
Tested-by: Meelis Roos
Signed-off-by: David S. Miller
Signed-off-by: Greg Kroah-Hartman

David S. Miller
2014-10-31 00:40:21 +0800
5955d6d11 sparc64: Increase size of boot string to 1024 bytes ... Browse Code »

[ Upstream commit 1cef94c36bd4d79b5ae3a3df99ee0d76d6a4a6dc ]

This is the longest boot string that silo supports.

Signed-off-by: Dave Kleikamp
Cc: Bob Picco
Cc: David S. Miller
Cc: sparclinux@vger.kernel.org
Signed-off-by: David S. Miller
Signed-off-by: Greg Kroah-Hartman

Dave Kleikamp
2014-10-31 00:40:21 +0800
4fe9ef523 sparc64: Kill unnecessary tables and increase MAX_BANKS. ... Browse Code »

[ Upstream commit d195b71bad4347d2df51072a537f922546a904f1 ]

swapper_low_pmd_dir and swapper_pud_dir are actually completely
useless and unnecessary.

We just need swapper_pg_dir[]. Naturally the other page table chunks
will be allocated on an as-needed basis. Since the kernel actually
accesses these tables in the PAGE_OFFSET view, there is not even a TLB
locality advantage of placing them in the kernel image.

Use the hard coded vmlinux.ld.S slot for swapper_pg_dir which is
naturally page aligned.

Increase MAX_BANKS to 1024 in order to handle heavily fragmented
virtual guests.

Even with this MAX_BANKS increase, the kernel is 20K+ smaller.

Signed-off-by: David S. Miller
Acked-by: Bob Picco
Signed-off-by: Greg Kroah-Hartman

David S. Miller
2014-10-31 00:40:21 +0800
6720e85bb sparc64: sparse irq ... Browse Code »

[ Upstream commit ee6a9333fa58e11577c1b531b8e0f5ffc0fd6f50 ]

This patch attempts to do a few things. The highlights are: 1) enable
SPARSE_IRQ unconditionally, 2) kills off !SPARSE_IRQ code 3) allocates
ivector_table at boot time and 4) default to cookie only VIRQ mechanism
for supported firmware. The first firmware with cookie only support for
me appears on T5. You can optionally force the HV firmware to not cookie
only mode which is the sysino support.

The sysino is a deprecated HV mechanism according to the most recent
SPARC Virtual Machine Specification. HV_GRP_INTR is what controls the
cookie/sysino firmware versioning.

The history of this interface is:

1) Major version 1.0 only supported sysino based interrupt interfaces.

2) Major version 2.0 added cookie based VIRQs, however due to the fact
that OSs were using the VIRQs without negoatiating major version
2.0 (Linux and Solaris are both guilty), the VIRQs calls were
allowed even with major version 1.0

To complicate things even further, the VIRQ interfaces were only
actually hooked up in the hypervisor for LDC interrupt sources.
VIRQ calls on other device types would result in HV_EINVAL errors.

So effectively, major version 2.0 is unusable.

3) Major version 3.0 was created to signal use of VIRQs and the fact
that the hypervisor has these calls hooked up for all interrupt
sources, not just those for LDC devices.

A new boot option is provided should cookie only HV support have issues.
hvirq - this is the version for HV_GRP_INTR. This is related to HV API
versioning. The code attempts major=3 first by default. The option can
be used to override this default.

I've tested with SPARSE_IRQ on T5-8, M7-4 and T4-X and Jalap?no.

Signed-off-by: Bob Picco
Signed-off-by: David S. Miller
Signed-off-by: Greg Kroah-Hartman

bob picco
2014-10-31 00:40:21 +0800
4e7657515 sparc64: Adjust vmalloc region size based upon available virtual address bits. ... Browse Code »

[ Upstream commit bb4e6e85daa52a9f6210fa06a5ec6269598a202b ]

In order to accomodate embedded per-cpu allocation with large numbers
of cpus and numa nodes, we have to use as much virtual address space
as possible for the vmalloc region. Otherwise we can get things like:

PERCPU: max_distance=0x380001c10000 too large for vmalloc space 0xff00000000

So, once we select a value for PAGE_OFFSET, derive the size of the
vmalloc region based upon that.

Signed-off-by: David S. Miller
Acked-by: Bob Picco
Signed-off-by: Greg Kroah-Hartman

David S. Miller
2014-10-31 00:40:21 +0800
539fe5fa0 sparc64: Increase MAX_PHYS_ADDRESS_BITS to 53. ... Browse Code »

Make sure, at compile time, that the kernel can properly support
whatever MAX_PHYS_ADDRESS_BITS is defined to.

On M7 chips, use a max_phys_bits value of 49.

Based upon a patch by Bob Picco.

Signed-off-by: David S. Miller
Acked-by: Bob Picco
Signed-off-by: Greg Kroah-Hartman

David S. Miller
2014-10-31 00:40:21 +0800
c4bcde7ec sparc64: Use kernel page tables for vmemmap. ... Browse Code »

[ Upstream commit c06240c7f5c39c83dfd7849c0770775562441b96 ]

For sparse memory configurations, the vmemmap array behaves terribly
and it takes up an inordinate amount of space in the BSS section of
the kernel image unconditionally.

Just build huge PMDs and look them up just like we do for TLB misses
in the vmalloc area.

Kernel BSS shrinks by about 2MB.

Signed-off-by: David S. Miller
Acked-by: Bob Picco
Signed-off-by: Greg Kroah-Hartman

David S. Miller
2014-10-31 00:40:21 +0800
86f7cda18 sparc64: Fix physical memory management regressions with large max_phys_bits. ... Browse Code »

[ Upstream commit 0dd5b7b09e13dae32869371e08e1048349fd040c ]

If max_phys_bits needs to be > 43 (f.e. for T4 chips), things like
DEBUG_PAGEALLOC stop working because the 3-level page tables only
can cover up to 43 bits.

Another problem is that when we increased MAX_PHYS_ADDRESS_BITS up to
47, several statically allocated tables became enormous.

Compounding this is that we will need to support up to 49 bits of
physical addressing for M7 chips.

The two tables in question are sparc64_valid_addr_bitmap and
kpte_linear_bitmap.

The first holds a bitmap, with 1 bit for each 4MB chunk of physical
memory, indicating whether that chunk actually exists in the machine
and is valid.

The second table is a set of 2-bit values which tell how large of a
mapping (4MB, 256MB, 2GB, 16GB, respectively) we can use at each 256MB
chunk of ram in the system.

These tables are huge and take up an enormous amount of the BSS
section of the sparc64 kernel image. Specifically, the
sparc64_valid_addr_bitmap is 4MB, and the kpte_linear_bitmap is 128K.

So let's solve the space wastage and the DEBUG_PAGEALLOC problem
at the same time, by using the kernel page tables (as designed) to
manage this information.

We have to keep using large mappings when DEBUG_PAGEALLOC is disabled,
and we do this by encoding huge PMDs and PUDs.

On a T4-2 with 256GB of ram the kernel page table takes up 16K with
DEBUG_PAGEALLOC disabled and 256MB with it enabled. Furthermore, this
memory is dynamically allocated at run time rather than coded
statically into the kernel image.

Signed-off-by: David S. Miller
Acked-by: Bob Picco
Signed-off-by: Greg Kroah-Hartman

David S. Miller
2014-10-31 00:40:21 +0800
ff5b56f81 sparc64: Adjust KTSB assembler to support larger physical addresses. ... Browse Code »

[ Upstream commit 8c82dc0e883821c098c8b0b130ffebabf9aab5df ]

As currently coded the KTSB accesses in the kernel only support up to
47 bits of physical addressing.

Adjust the instruction and patching sequence in order to support
arbitrary 64 bits addresses.

Signed-off-by: David S. Miller
Acked-by: Bob Picco
Signed-off-by: Greg Kroah-Hartman

David S. Miller
2014-10-31 00:40:21 +0800
e4d4fab37 sparc64: Define VA hole at run time, rather than at compile time. ... Browse Code »

[ Upstream commit 4397bed080598001e88f612deb8b080bb1cc2322 ]

Now that we use 4-level page tables, we can provide up to 53-bits of
virtual address space to the user.

Adjust the VA hole based upon the capabilities of the cpu type probed.

Signed-off-by: David S. Miller
Acked-by: Bob Picco
Signed-off-by: Greg Kroah-Hartman

David S. Miller
2014-10-31 00:40:20 +0800
6aac5338c sparc64: Switch to 4-level page tables. ... Browse Code »

[ Upstream commit ac55c768143aa34cc3789c4820cbb0809a76fd9c ]

This has become necessary with chips that support more than 43-bits
of physical addressing.

Based almost entirely upon a patch by Bob Picco.

Signed-off-by: David S. Miller
Acked-by: Bob Picco
Signed-off-by: Greg Kroah-Hartman

David S. Miller
2014-10-31 00:40:20 +0800
a06148024 sparc64: T5 PMU ... Browse Code »

The T5 (niagara5) has different PCR related HV fast trap values and a new
HV API Group. This patch utilizes these and shares when possible with niagara4.

We use the same sparc_pmu niagara4_pmu. Should there be new effort to
obtain the MCU perf statistics then this would have to be changed.

Cc: sparclinux@vger.kernel.org
Signed-off-by: Bob Picco
Signed-off-by: David S. Miller
Signed-off-by: Greg Kroah-Hartman

bob picco
2014-10-31 00:40:20 +0800
14f0211d3 sparc64: cpu hardware caps support for sparc M6 and M7 ... Browse Code »
182

Signed-off-by: Allen Pais
Signed-off-by: David S. Miller
Signed-off-by: Greg Kroah-Hartman

Allen Pais
2014-10-31 00:40:20 +0800
6a610e722 sparc64: support M6 and M7 for building CPU distribution map ... Browse Code »

Add M6 and M7 chip type in cpumap.c to correctly build CPU distribution map that spans all online CPUs.

Signed-off-by: Allen Pais
Signed-off-by: David S. Miller
Signed-off-by: Greg Kroah-Hartman

Allen Pais
2014-10-31 00:40:20 +0800
0e77996b8 sparc64: correctly recognise M6 and M7 cpu type ... Browse Code »

The following patch adds support for correctly
recognising M6 and M7 cpu type.

Signed-off-by: Allen Pais
Signed-off-by: David S. Miller
Signed-off-by: Greg Kroah-Hartman

Allen Pais
2014-10-31 00:40:20 +0800
0929aa348 sparc64: Fix hibernation code refrence to PAGE_OFFSET. ... Browse Code »

We changed PAGE_OFFSET to be a variable rather than a constant,
but this reference here in the hibernate assembler got missed.

Signed-off-by: David S. Miller
Signed-off-by: Greg Kroah-Hartman

David S. Miller
2014-10-31 00:40:20 +0800
edaad4aaa sparc64: Do not define thread fpregs save area as zero-length array. ... Browse Code »

[ Upstream commit e2653143d7d79a49f1a961aeae1d82612838b12c ]

This breaks the stack end corruption detection facility.

What that facility does it write a magic value to "end_of_stack()"
and checking to see if it gets overwritten.

"end_of_stack()" is "task_thread_info(p) + 1", which for sparc64 is
the beginning of the FPU register save area.

So once the user uses the FPU, the magic value is overwritten and the
debug checks trigger.

Fix this by making the size explicit.

Due to the size we use for the fpsaved[], gsr[], and xfsr[] arrays we
are limited to 7 levels of FPU state saves. So each FPU register set
is 256 bytes, allocate 256 * 7 for the fpregs area.

Reported-by: Meelis Roos
Signed-off-by: David S. Miller
Signed-off-by: Greg Kroah-Hartman

David S. Miller
2014-10-31 00:40:20 +0800
b22e08573 sparc64: Fix FPU register corruption with AES crypto offload. ... Browse Code »

[ Upstream commit f4da3628dc7c32a59d1fb7116bb042e6f436d611 ]

The AES loops in arch/sparc/crypto/aes_glue.c use a scheme where the
key material is preloaded into the FPU registers, and then we loop
over and over doing the crypt operation, reusing those pre-cooked key
registers.

There are intervening blkcipher*() calls between the crypt operation
calls. And those might perform memcpy() and thus also try to use the
FPU.

The sparc64 kernel FPU usage mechanism is designed to allow such
recursive uses, but with a catch.

There has to be a trap between the two FPU using threads of control.

The mechanism works by, when the FPU is already in use by the kernel,
allocating a slot for FPU saving at trap time. Then if, within the
trap handler, we try to use the FPU registers, the pre-trap FPU
register state is saved into the slot. Then at trap return time we
notice this and restore the pre-trap FPU state.

Over the long term there are various more involved ways we can make
this work, but for a quick fix let's take advantage of the fact that
the situation where this happens is very limited.

All sparc64 chips that support the crypto instructiosn also are using
the Niagara4 memcpy routine, and that routine only uses the FPU for
large copies where we can't get the source aligned properly to a
multiple of 8 bytes.

We look to see if the FPU is already in use in this context, and if so
we use the non-large copy path which only uses integer registers.

Furthermore, we also limit this special logic to when we are doing
kernel copy, rather than a user copy.

Signed-off-by: David S. Miller
Signed-off-by: Greg Kroah-Hartman

David S. Miller
2014-10-31 00:40:20 +0800
67d9e5d4b sparc64: Fix lockdep warnings on reboot on Ultra-5 ... Browse Code »

[ Upstream commit bdcf81b658ebc4c2640c3c2c55c8b31c601b6996 ]

Inconsistently, the raw_* IRQ routines do not interact with and update
the irqflags tracing and lockdep state, whereas the raw_* spinlock
interfaces do.

This causes problems in p1275_cmd_direct() because we disable hardirqs
by hand using raw_local_irq_restore() and then do a raw_spin_lock()
which triggers a lockdep trace because the CPU's hw IRQ state doesn't
match IRQ tracing's internal software copy of that state.

The CPU's irqs are disabled, yet current->hardirqs_enabled is true.

====================
reboot: Restarting system
------------[ cut here ]------------
WARNING: CPU: 0 PID: 1 at kernel/locking/lockdep.c:3536 check_flags+0x7c/0x240()
DEBUG_LOCKS_WARN_ON(current->hardirqs_enabled)
Modules linked in: openpromfs
CPU: 0 PID: 1 Comm: systemd-shutdow Tainted: G W 3.17.0-dirty #145
Call Trace:
[000000000045919c] warn_slowpath_common+0x5c/0xa0
[0000000000459210] warn_slowpath_fmt+0x30/0x40
[000000000048f41c] check_flags+0x7c/0x240
[0000000000493280] lock_acquire+0x20/0x1c0
[0000000000832b70] _raw_spin_lock+0x30/0x60
[000000000068f2fc] p1275_cmd_direct+0x1c/0x60
[000000000068ed28] prom_reboot+0x28/0x40
[000000000043610c] machine_restart+0x4c/0x80
[000000000047d2d4] kernel_restart+0x54/0x80
[000000000047d618] SyS_reboot+0x138/0x200
[00000000004060b4] linux_sparc_syscall32+0x34/0x60
---[ end trace 5c439fe81c05a100 ]---
possible reason: unannotated irqs-off.
irq event stamp: 2010267
hardirqs last enabled at (2010267): [] vprintk_emit+0x4b8/0x580
hardirqs last disabled at (2010266): [] vprintk_emit+0x68/0x580
softirqs last enabled at (2010046): [] __do_softirq+0x378/0x4a0
softirqs last disabled at (2010039): [] do_softirq_own_stack+0x28/0x40
Resetting ...
====================

Use local_* variables of the hw IRQ interfaces so that IRQ tracing sees
all of our changes.

Reported-by: Meelis Roos
Tested-by: Meelis Roos
Signed-off-by: David S. Miller
Signed-off-by: Greg Kroah-Hartman

David S. Miller
2014-10-31 00:40:20 +0800
445fd8f9d sparc64: Fix reversed start/end in flush_tlb_kernel_range() ... Browse Code »

[ Upstream commit 473ad7f4fb005d1bb727e4ef27d370d28703a062 ]

When we have to split up a flush request into multiple pieces
(in order to avoid the firmware range) we don't specify the
arguments in the right order for the second piece.

Fix the order, or else we get hangs as the code tries to
flush "a lot" of entries and we get lockups like this:

[ 4422.981276] NMI watchdog: BUG: soft lockup - CPU#12 stuck for 23s! [expect:117032]
[ 4422.996130] Modules linked in: ipv6 loop usb_storage igb ptp sg sr_mod ehci_pci ehci_hcd pps_core n2_rng rng_core
[ 4423.016617] CPU: 12 PID: 117032 Comm: expect Not tainted 3.17.0-rc4+ #1608
[ 4423.030331] task: fff8003cc730e220 ti: fff8003d99d54000 task.ti: fff8003d99d54000
[ 4423.045282] TSTATE: 0000000011001602 TPC: 00000000004521e8 TNPC: 00000000004521ec Y: 00000000 Not tainted
[ 4423.064905] TPC:
[ 4423.074964] g0: 000000000052fd10 g1: 00000001295a8000 g2: ffffff7176ffc000 g3: 0000000000002000
[ 4423.092324] g4: fff8003cc730e220 g5: fff8003dfedcc000 g6: fff8003d99d54000 g7: 0000000000000006
[ 4423.109687] o0: 0000000000000000 o1: 0000000000000000 o2: 0000000000000003 o3: 00000000f0000000
[ 4423.127058] o4: 0000000000000080 o5: 00000001295a8000 sp: fff8003d99d56d01 ret_pc: 000000000052ff54
[ 4423.145121] RPC:
[ 4423.155185] l0: 0000000000000000 l1: 0000000000000000 l2: 0000000000a38040 l3: 0000000000000000
[ 4423.172559] l4: fff8003dae8965e0 l5: ffffffffffffffff l6: 0000000000000000 l7: 00000000f7e2b138
[ 4423.189913] i0: fff8003d99d576a0 i1: fff8003d99d576a8 i2: fff8003d99d575e8 i3: 0000000000000000
[ 4423.207284] i4: 0000000000008008 i5: fff8003d99d575c8 i6: fff8003d99d56df1 i7: 0000000000530c24
[ 4423.224640] I7:
[ 4423.234193] Call Trace:
[ 4423.239051] [0000000000530c24] free_vmap_area_noflush+0x64/0x80
[ 4423.251029] [0000000000531a7c] remove_vm_area+0x5c/0x80
[ 4423.261628] [0000000000531b80] __vunmap+0x20/0x120
[ 4423.271352] [000000000071cf18] n_tty_close+0x18/0x40
[ 4423.281423] [00000000007222b0] tty_ldisc_close+0x30/0x60
[ 4423.292183] [00000000007225a4] tty_ldisc_reinit+0x24/0xa0
[ 4423.303120] [0000000000722ab4] tty_ldisc_hangup+0xd4/0x1e0
[ 4423.314232] [0000000000719aa0] __tty_hangup+0x280/0x3c0
[ 4423.324835] [0000000000724cb4] pty_close+0x134/0x1a0
[ 4423.334905] [000000000071aa24] tty_release+0x104/0x500
[ 4423.345316] [00000000005511d0] __fput+0x90/0x1e0
[ 4423.354701] [000000000047fa54] task_work_run+0x94/0xe0
[ 4423.365126] [0000000000404b44] __handle_signal+0xc/0x2c

Fixes: 4ca9a23765da ("sparc64: Guard against flushing openfirmware mappings.")
Signed-off-by: David S. Miller
Signed-off-by: Greg Kroah-Hartman

David S. Miller
2014-10-31 00:40:19 +0800
9cb7f1e41 sparc: bpf_jit: fix loads from negative offsets ... Browse Code »

[ Upstream commit 35607b02dbef304fa5037236a3b43c1d8ab2aa52 ]

- fix BPF_LD|ABS|IND from negative offsets:
make sure to sign extend lower 32 bits in 64-bit register
before calling C helpers from JITed code, otherwise 'int k'
argument of bpf_internal_load_pointer_neg_helper() function
will be added as large unsigned integer, causing packet size
check to trigger and abort the program.

It's worth noting that JITed code for 'A = A op K' will affect
upper 32 bits differently depending whether K is simm13 or not.
Since small constants are sign extended, whereas large constants
are stored in temp register and zero extended.
That is ok and we don't have to pay a penalty of sign extension
for every sethi, since all classic BPF instructions have 32-bit
semantics and we only need to set correct upper bits when
transitioning from JITed code into C.

- though instructions 'A &= 0' and 'A *= 0' are odd, JIT compiler
should not optimize them out

Signed-off-by: Alexei Starovoitov
Signed-off-by: David S. Miller
Signed-off-by: Greg Kroah-Hartman

Alexei Starovoitov
2014-10-31 00:40:19 +0800
6e2d91c63 sparc: bpf_jit: fix support for ldx/stx mem and SKF_AD_VLAN_TAG ... Browse Code »

[ Upstream commit f6f2332dce0efeea8c5653b6e9d1e8c379ace65c ]

fix several issues in sparc BPF JIT compiler.

ldx/stx related:
. classic BPF instructions that access mem[] slots were not setting
SEEN_MEM flag, so stack wasn't allocated. Fix that by advertising
correct flags

. LDX/STX instructions were missing SEEN_XREG, so register value
could have leaked to user space. Fix it.

. since stack for mem[] slots is allocated with 'sub %sp' instead
of 'save %sp', use %sp as base register instead of %fp.

. ldx mem[0] means first slot in classic BPF which should have
-4 offset instead of 0.

. sparc64 needs 2047 stack bias as per ABI to access stack

. emit_stmem() was using LD32I macro instead of ST32I

SKF_AD_VLAN_TAG* related:
. SKF_AD_VLAN_TAG_PRESENT must return 1 or 0 instead of '> 0' or 0
as per classic BPF de facto standard

. SKF_AD_VLAN_TAG needs to mask the field correctly

Fixes: 2809a2087cc4 ("net: filter: Just In Time compiler for sparc")
Signed-off-by: Alexei Starovoitov
Signed-off-by: David S. Miller
Signed-off-by: Greg Kroah-Hartman

Alexei Starovoitov
2014-10-31 00:40:19 +0800
a068a292f sparc: Let memset return the address argument ... Browse Code »

[ Upstream commit 74cad25c076a2f5253312c2fe82d1a4daecc1323 ]

This makes memset follow the standard (instead of returning 0 on success). This
is needed when certain versions of gcc optimizes around memset calls and assume
that the address argument is preserved in %o0.

Signed-off-by: Andreas Larsson
Signed-off-by: David S. Miller
Signed-off-by: Greg Kroah-Hartman

Andreas Larsson
2014-10-31 00:40:19 +0800
200fe7a70 sparc64: Move request_irq() from ldc_bind() to ldc_alloc() ... Browse Code »

[ Upstream commit c21c4ab0d6921f7160a43216fa6973b5924de561 ]

The request_irq() needs to be done from ldc_alloc()
to avoid the following (caught by lockdep)

[00000000004a0738] __might_sleep+0xf8/0x120
[000000000058bea4] kmem_cache_alloc_trace+0x184/0x2c0
[00000000004faf80] request_threaded_irq+0x80/0x160
[000000000044f71c] ldc_bind+0x7c/0x220
[0000000000452454] vio_port_up+0x54/0xe0
[00000000101f6778] probe_disk+0x38/0x220 [sunvdc]
[00000000101f6b8c] vdc_port_probe+0x22c/0x300 [sunvdc]
[0000000000451a88] vio_device_probe+0x48/0x60
[000000000074c56c] really_probe+0x6c/0x300
[000000000074c83c] driver_probe_device+0x3c/0xa0
[000000000074c92c] __driver_attach+0x8c/0xa0
[000000000074a6ec] bus_for_each_dev+0x6c/0xa0
[000000000074c1dc] driver_attach+0x1c/0x40
[000000000074b0fc] bus_add_driver+0xbc/0x280

Signed-off-by: Sowmini Varadhan
Acked-by: Dwight Engen
Signed-off-by: David S. Miller
Signed-off-by: Greg Kroah-Hartman

Sowmini Varadhan
2014-10-31 00:40:19 +0800
cbc578cfb sparc64: find_node adjustment ... Browse Code »

[ Upstream commit 3dee9df54836d5f844f3d58281d3f3e6331b467f ]

We have seen an issue with guest boot into LDOM that causes early boot failures
because of no matching rules for node identitity of the memory. I analyzed this
on my T4 and concluded there might not be a solution. I saw the issue in
mainline too when booting into the control/primary domain - with guests
configured. Note, this could be a firmware bug on some older machines.

I'll provide a full explanation of the issues below. Should we not find a
matching BEST latency group for a real address (RA) then we will assume node 0.
On the T4-2 here with the information provided I can't see an alternative.

Technically the LDOM shown below should match the MBLOCK to the
favorable latency group. However other factors must be considered too. Were
the memory controllers configured "fine" grained interleave or "coarse"
grain interleaved - T4. Also should a "group" MD node be considered a NUMA
node?

There has to be at least one Machine Description (MD) "group" and hence one
NUMA node. The group can have one or more latency groups (lg) - more than one
memory controller. The current code chooses the smallest latency as the most
favorable per group. The latency and lg information is in MLGROUP below.
MBLOCK is the base and size of the RAs for the machine as fetched from OBP
/memory "available" property. My machine has one MBLOCK but more would be
possible - with holes?

For a T4-2 the following information has been gathered:
with LDOM guest
MEMBLOCK configuration:
memory size = 0x27f870000
memory.cnt = 0x3
memory[0x0] [0x00000020400000-0x0000029fc67fff], 0x27f868000 bytes
memory[0x1] [0x0000029fd8a000-0x0000029fd8bfff], 0x2000 bytes
memory[0x2] [0x0000029fd92000-0x0000029fd97fff], 0x6000 bytes
reserved.cnt = 0x2
reserved[0x0] [0x00000020800000-0x000000216c15c0], 0xec15c1 bytes
reserved[0x1] [0x00000024800000-0x0000002c180c1e], 0x7980c1f bytes
MBLOCK[0]: base[20000000] size[280000000] offset[0]
(note: "base" and "size" reported in "MBLOCK" encompass the "memory[X]" values)
(note: (RA + offset) & mask = val is the formula to detect a match for the
memory controller. should there be no match for find_node node, a return
value of -1 resulted for the node - BAD)

There is one group. It has these forward links
MLGROUP[1]: node[545] latency[1f7e8] match[200000000] mask[200000000]
MLGROUP[2]: node[54d] latency[2de60] match[0] mask[200000000]
NUMA NODE[0]: node[545] mask[200000000] val[200000000] (latency[1f7e8])
(note: "val" is the best lg's (smallest latency) "match")

no LDOM guest - bare metal
MEMBLOCK configuration:
memory size = 0xfdf2d0000
memory.cnt = 0x3
memory[0x0] [0x00000020400000-0x00000fff6adfff], 0xfdf2ae000 bytes
memory[0x1] [0x00000fff6d2000-0x00000fff6e7fff], 0x16000 bytes
memory[0x2] [0x00000fff766000-0x00000fff771fff], 0xc000 bytes
reserved.cnt = 0x2
reserved[0x0] [0x00000020800000-0x00000021a04580], 0x1204581 bytes
reserved[0x1] [0x00000024800000-0x0000002c7d29fc], 0x7fd29fd bytes
MBLOCK[0]: base[20000000] size[fe0000000] offset[0]

there are two groups
group node[16d5]
MLGROUP[0]: node[1765] latency[1f7e8] match[0] mask[200000000]
MLGROUP[3]: node[177d] latency[2de60] match[200000000] mask[200000000]
NUMA NODE[0]: node[1765] mask[200000000] val[0] (latency[1f7e8])
group node[171d]
MLGROUP[2]: node[1775] latency[2de60] match[0] mask[200000000]
MLGROUP[1]: node[176d] latency[1f7e8] match[200000000] mask[200000000]
NUMA NODE[1]: node[176d] mask[200000000] val[200000000] (latency[1f7e8])
(note: for this two "group" bare metal machine, 1/2 memory is in group one's
lg and 1/2 memory is in group two's lg).

Cc: sparclinux@vger.kernel.org
Signed-off-by: Bob Picco
Signed-off-by: David S. Miller
Signed-off-by: Greg Kroah-Hartman

bob picco
2014-10-31 00:40:19 +0800
a5fb60021 sparc64: Fix corrupted thread fault code. ... Browse Code »

[ Upstream commit 84bd6d8b9c0f06b3f188efb479c77e20f05e9a8a ]

Every path that ends up at do_sparc64_fault() must install a valid
FAULT_CODE_* bitmask in the per-thread fault code byte.

Two paths leading to the label winfix_trampoline (which expects the
FAULT_CODE_* mask in register %g4) were not doing so:

1) For pre-hypervisor TLB protection violation traps, if we took
the 'winfix_trampoline' path we wouldn't have %g4 initialized
with the FAULT_CODE_* value yet. Resulting in using the
TLB_TAG_ACCESS register address value instead.

2) In the TSB miss path, when we notice that we are going to use a
hugepage mapping, but we haven't allocated the hugepage TSB yet, we
still have to take the window fixup case into consideration and
in that particular path we leave %g4 not setup properly.

Errors on this sort were largely invisible previously, but after
commit 4ccb9272892c33ef1c19a783cfa87103b30c2784 ("sparc64: sun4v TLB
error power off events") we now have a fault_code mask bit
(FAULT_CODE_BAD_RA) that triggers due to this bug.

FAULT_CODE_BAD_RA triggers because this bit is set in TLB_TAG_ACCESS
(see #1 above) and thus we get seemingly random bus errors triggered
for user processes.

Fixes: 4ccb9272892c ("sparc64: sun4v TLB error power off events")
Reported-by: Meelis Roos
Signed-off-by: David S. Miller
Signed-off-by: Greg Kroah-Hartman

David S. Miller
2014-10-31 00:40:19 +0800
ac1addf5a sparc64: sun4v TLB error power off events ... Browse Code »

[ Upstream commit 4ccb9272892c33ef1c19a783cfa87103b30c2784 ]

We've witnessed a few TLB events causing the machine to power off because
of prom_halt. In one case it was some nfs related area during rmmod. Another
was an mmapper of /dev/mem. A more recent one is an ITLB issue with
a bad pagesize which could be a hardware bug. Bugs happen but we should
attempt to not power off the machine and/or hang it when possible.

This is a DTLB error from an mmapper of /dev/mem:
[root@sparcie ~]# SUN4V-DTLB: Error at TPC[fffff80100903e6c], tl 1
SUN4V-DTLB: TPC
SUN4V-DTLB: O7[fffff801081979d0]
SUN4V-DTLB: O7
SUN4V-DTLB: vaddr[fffff80100000000] ctx[1250] pte[98000000000f0610] error[2]
.

This is recent mainline for ITLB:
[ 3708.179864] SUN4V-ITLB: TPC
[ 3708.188866] SUN4V-ITLB: O7[fffffc010071cee8]
[ 3708.197377] SUN4V-ITLB: O7
[ 3708.206539] SUN4V-ITLB: vaddr[e0003] ctx[1a3c] pte[2900000dcc800eeb] error[4]
.

Normally sun4v_itlb_error_report() and sun4v_dtlb_error_report() would call
prom_halt() and drop us to OF command prompt "ok". This isn't the case for
LDOMs and the machine powers off.

For the HV reported error of HV_ENORADDR for HV HV_MMU_MAP_ADDR_TRAP we cause
a SIGBUS error by qualifying it within do_sparc64_fault() for fault code mask
of FAULT_CODE_BAD_RA. This is done when trap level (%tl) is less or equal
one("1"). Otherwise, for %tl > 1, we proceed eventually to die_if_kernel().

The logic of this patch was partially inspired by David Miller's feedback.

Power off of large sparc64 machines is painful. Plus die_if_kernel provides
more context. A reset sequence isn't a brief period on large sparc64 but
better than power-off/power-on sequence.

Cc: sparclinux@vger.kernel.org
Signed-off-by: Bob Picco
Signed-off-by: David S. Miller
Signed-off-by: Greg Kroah-Hartman

bob picco
2014-10-31 00:40:19 +0800
7907ea428 sparc32: dma_alloc_coherent must honour gfp flags ... Browse Code »

[ Upstream commit d1105287aabe88dbb3af825140badaa05cf0442c ]

dma_zalloc_coherent() calls dma_alloc_coherent(__GFP_ZERO)
but the sparc32 implementations sbus_alloc_coherent() and
pci32_alloc_coherent() doesn't take the gfp flags into
account.

Tested on the SPARC32/LEON GRETH Ethernet driver which fails
due to dma_alloc_coherent(__GFP_ZERO) returns non zeroed
pages.

Signed-off-by: Daniel Hellstrom
Signed-off-by: David S. Miller
Signed-off-by: Greg Kroah-Hartman

Daniel Hellstrom
2014-10-31 00:40:19 +0800
e7f7dcadf sparc64: Fix pcr_ops initialization and usage bugs. ... Browse Code »

[ Upstream commit 8bccf5b313180faefce38e0d1140f76e0f327d28 ]

Christopher reports that perf_event_print_debug() can crash in uniprocessor
builds. The crash is due to pcr_ops being NULL.

This happens because pcr_arch_init() is only invoked by smp_cpus_done() which
only executes in SMP builds.

init_hw_perf_events() is closely intertwined with pcr_ops being setup properly,
therefore:

1) Call pcr_arch_init() early on from init_hw_perf_events(), instead of
from smp_cpus_done().

2) Do not hook up a PMU type if pcr_ops is NULL after pcr_arch_init().

3) Move init_hw_perf_events to a later initcall so that it we will be
sure to invoke pcr_arch_init() after all cpus are brought up.

Finally, guard the one naked sequence of pcr_ops dereferences in
__global_pmu_self() with an appropriate NULL check.

Reported-by: Christopher Alexander Tobias Schulze
Signed-off-by: David S. Miller
Signed-off-by: Greg Kroah-Hartman

David S. Miller
2014-10-31 00:40:19 +0800
4eed408a0 sparc64: Do not disable interrupts in nmi_cpu_busy() ... Browse Code »

[ Upstream commit 58556104e9cd0107a7a8d2692cf04ef31669f6e4 ]

nmi_cpu_busy() is a SMP function call that just makes sure that all of the
cpus are spinning using cpu cycles while the NMI test runs.

It does not need to disable IRQs because we just care about NMIs executing
which will even with 'normal' IRQs disabled.

It is not legal to enable hard IRQs in a SMP cross call, in fact this bug
triggers the BUG check in irq_work_run_list():

BUG_ON(!irqs_disabled());

Because now irq_work_run() is invoked from the tail of
generic_smp_call_function_single_interrupt().

Signed-off-by: David S. Miller
Signed-off-by: Greg Kroah-Hartman

David S. Miller
2014-10-31 00:40:18 +0800
e81cffc4e xfs: ensure WB_SYNC_ALL writeback handles partial pages correctly ... Browse Code »

commit 0d085a529b427d97710e6a41f8a4f23e1757cd12 upstream.

XFS has been having trouble with stray delayed allocation extents
beyond EOF for a long time. Recent changes to the collapse range
code has triggered erroneous EBUSY errors on page invalidtion for
block size smaller than page size filesystems. These
have been caused by dirty buffers beyond EOF on a partial page which
do not get written to disk during a sync.

The issue is that write-ahead in xfs_cluster_write() finds such a
partial page and handles it by leaving the page dirty but pushing it
into a writeback state. This used to work just fine, as the
write_cache_pages() code would then find the dirty partial page in
the next mapping tree lookup as the dirty tag is still set.

Unfortunately, when we moved to a mark and sweep approach to
writeback to fix other writeback sync issues, we broken this. THe
act of marking the page as under writeback now clears the TOWRITE
tag in the radix tree, even though the page is still dirty. This
causes the TOWRITE tag to be cleared, and hence the next lookup on
the mapping tree does not find the dirty partial page and so doesn't
try to write it again.

This same writeback bug was found recently in ext4 and fixed in
commit 1c8349a ("ext4: fix data integrity sync in ordered mode")
without communication to the wider filesystem community. We can use
exactly the same fix here so the TOWRITE flag is not cleared on
partial page writes.

cc: stable@vger.kernel.org # dependent on 1c8349a17137b93f0a83f276c764a6df1b9a116e
Root-cause-found-by: Brian Foster
Signed-off-by: Dave Chinner
Reviewed-by: Brian Foster
Signed-off-by: Dave Chinner
Signed-off-by: Greg Kroah-Hartman

Dave Chinner
2014-10-31 00:40:18 +0800
0419937b5 ecryptfs: avoid to access NULL pointer when write metadata in xattr ... Browse Code »
2

commit 35425ea2492175fd39f6116481fe98b2b3ddd4ca upstream.

Christopher Head 2014-06-28 05:26:20 UTC described:
"I tried to reproduce this on 3.12.21. Instead, when I do "echo hello > foo"
in an ecryptfs mount with ecryptfs_xattr specified, I get a kernel crash:

BUG: unable to handle kernel NULL pointer dereference at (null)
IP: [] fsstack_copy_attr_all+0x2/0x61
PGD d7840067 PUD b2c3c067 PMD 0
Oops: 0002 [#1] SMP
Modules linked in: nvidia(PO)
CPU: 3 PID: 3566 Comm: bash Tainted: P O 3.12.21-gentoo-r1 #2
Hardware name: ASUSTek Computer Inc. G60JX/G60JX, BIOS 206 03/15/2010
task: ffff8801948944c0 ti: ffff8800bad70000 task.ti: ffff8800bad70000
RIP: 0010:[] [] fsstack_copy_attr_all+0x2/0x61
RSP: 0018:ffff8800bad71c10 EFLAGS: 00010246
RAX: 00000000000181a4 RBX: ffff880198648480 RCX: 0000000000000000
RDX: 0000000000000004 RSI: ffff880172010450 RDI: 0000000000000000
RBP: ffff880198490e40 R08: 0000000000000000 R09: 0000000000000000
R10: ffff880172010450 R11: ffffea0002c51e80 R12: 0000000000002000
R13: 000000000000001a R14: 0000000000000000 R15: ffff880198490e40
FS: 00007ff224caa700(0000) GS:ffff88019fcc0000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000000000000000 CR3: 00000000bb07f000 CR4: 00000000000007e0
Stack:
ffffffff811826e8 ffff8800a39d8000 0000000000000000 000000000000001a
ffff8800a01d0000 ffff8800a39d8000 ffffffff81185fd5 ffffffff81082c2c
00000001a39d8000 53d0abbc98490e40 0000000000000037 ffff8800a39d8220
Call Trace:
[] ? ecryptfs_setxattr+0x40/0x52
[] ? ecryptfs_write_metadata+0x1b3/0x223
[] ? should_resched+0x5/0x23
[] ? ecryptfs_initialize_file+0xaf/0xd4
[] ? ecryptfs_create+0xf4/0x142
[] ? vfs_create+0x48/0x71
[] ? do_last.isra.68+0x559/0x952
[] ? link_path_walk+0xbd/0x458
[] ? path_openat+0x224/0x472
[] ? do_filp_open+0x2b/0x6f
[] ? __alloc_fd+0xd6/0xe7
[] ? do_sys_open+0x65/0xe9
[] ? system_call_fastpath+0x16/0x1b
RIP [] fsstack_copy_attr_all+0x2/0x61
RSP
CR2: 0000000000000000
---[ end trace df9dba5f1ddb8565 ]---"

If we create a file when we mount with ecryptfs_xattr_metadata option, we will
encounter a crash in this path:
->ecryptfs_create
->ecryptfs_initialize_file
->ecryptfs_write_metadata
->ecryptfs_write_metadata_to_xattr
->ecryptfs_setxattr
->fsstack_copy_attr_all
It's because our dentry->d_inode used in fsstack_copy_attr_all is NULL, and it
will be initialized when ecryptfs_initialize_file finish.

So we should skip copying attr from lower inode when the value of ->d_inode is
invalid.

Signed-off-by: Chao Yu
Signed-off-by: Tyler Hicks
Signed-off-by: Greg Kroah-Hartman

Chao Yu
2014-10-31 00:40:18 +0800
d18668009 ARM: dts: imx28-evk: Let i2c0 run at 100kHz ... Browse Code »

commit d1e61eb443dc7512885dfe89ee2f2a1c29fcb1da upstream.

Commit 78b81f4666fb ("ARM: dts: imx28-evk: Run I2C0 at 400kHz") caused issues
when doing the following sequence in loop:

- Boot the kernel
- Perform audio playback
- Reboot the system via 'reboot' command

In many times the audio card cannot be probed, which causes playback to fail.

After restoring to the original i2c0 frequency of 100kHz there is no such
problem anymore.

This reverts commit 78b81f4666fbb22a20b1e63e5baf197ad2e90e88.

Signed-off-by: Fabio Estevam
Signed-off-by: Shawn Guo
Signed-off-by: Greg Kroah-Hartman

Fabio Estevam
2014-10-31 00:40:18 +0800
8fd173657 ARM: mvebu: Netgear RN102: Use Hardware BCH ECC ... Browse Code »

commit ace8578182dc347b043c0825b9873f62fdaa5b77 upstream.

The bootloader on the Netgear ReadyNAS RN102 uses Hardware BCH ECC
(strength = 4), while the pxa3xx NAND driver by default uses
Hamming ECC (strength = 1).

This patch changes the ECC mode on these machines to match that
of the bootloader and of the stock firmware. That way, it is
now possible to update the kernel from userland (e.g. using
standard tools from mtd-utils package); u-boot will happily
load and boot it.

Fixes: 92beaccd8b49 ("ARM: mvebu: Enable NAND controller in ReadyNAS 102 .dts file")
Signed-off-by: Ben Peddell
Acked-by: Ezequiel Garcia
Tested-by: Arnaud Ebalard
Link: https://lkml.kernel.org/r/1410339341-3372-1-git-send-email-klightspeed@killerwolves.net
Signed-off-by: Jason Cooper
Signed-off-by: Greg Kroah-Hartman

klightspeed@killerwolves.net
2014-10-31 00:40:18 +0800
fac803d6b ARM: mvebu: Netgear RN2120: Use Hardware BCH ECC ... Browse Code »

commit 500abb6ccb9e3f8d638a7f422443a8549245ef90 upstream.

The bootloader on the Netgear ReadyNAS RN2120 uses Hardware BCH
ECC (strength = 4), while the pxa3xx NAND driver by default uses
Hamming ECC (strength = 1).

This patch changes the ECC mode on these machines to match that
of the bootloader and of the stock firmware. That way, it is
now possible to update the kernel from userland (e.g. using
standard tools from mtd-utils package); u-boot will happily
load and boot it.

The issue was initially reported and fixed by Ben Pedell for
RN102. The RN2120 shares the same Hynix H27U1G8F2BTR NAND
flash and setup. This patch is based on Ben's fix for RN102.

Fixes: ad51eddd95ad ("ARM: mvebu: Enable NAND controller in ReadyNAS 2120 .dts file")
Signed-off-by: Arnaud Ebalard
Link: https://lkml.kernel.org/r/61f6a1b7ad0adc57a0e201b9680bc2e5f214a317.1410035142.git.arno@natisbad.org
Signed-off-by: Jason Cooper
Signed-off-by: Greg Kroah-Hartman

Arnaud Ebalard
2014-10-31 00:40:18 +0800
98080726a ARM: mvebu: Netgear RN104: Use Hardware BCH ECC ... Browse Code »

commit 225b94cdf719d0bc522a354bdafc18e5da5ff83b upstream.

The bootloader on the Netgear ReadyNAS RN104 uses Hardware BCH
ECC (strength = 4), while the pxa3xx NAND driver by default uses
Hamming ECC (strength = 1).

This patch changes the ECC mode on these machines to match that
of the bootloader and of the stock firmware. That way, it is
now possible to update the kernel from userland (e.g. using
standard tools from mtd-utils package); u-boot will happily
load and boot it.

The issue was initially reported and fixed by Ben Pedell for
RN102. The RN104 shares the same Hynix H27U1G8F2BTR NAND
flash and setup. This patch is based on Ben's fix for RN102.

Fixes: 0373a558bd79 ("ARM: mvebu: Enable NAND controller in ReadyNAS 104 .dts file")
Signed-off-by: Arnaud Ebalard
Link: https://lkml.kernel.org/r/920c7e7169dc6aaaa3eb4bced2336d38e77b8864.1410035142.git.arno@natisbad.org
Signed-off-by: Jason Cooper
Signed-off-by: Greg Kroah-Hartman

Arnaud Ebalard
2014-10-31 00:40:18 +0800
7f688ac44 ARM: Kirkwood: Fix DT based DSA. ... Browse Code »

commit 4f5e01e96d424b54f5f0e89ee1ba9ccca03a3941 upstream.

During the conversion of boards to use DT to instantiate Distributed
Switch Architecture, nobody volunteered to test. As to be expected,
the conversion was flawed. Testers and access to hardware has now
become available, and this patch hopefully fixes the problems.

dsa,mii-bus must be a phandle to the top level mdio node, not the port
specific subnode of the mdio device.

dsa,ethernet must be a phandle to the port subnode within the ethernet
DT node, not the ethernet node.

Don't pinctrl hog the card detect gpio for mvsdio.

Rename the .dts files to make it clearer which file is for the Z0
stepping and which for the A0 or later stepping.

Signed-off-by: Andrew Lunn
Cc: seugene@marvell.com
Tested-by: Eugene Sanivsky
Fixes: e2eaa339af44: ("ARM: Kirkwood: convert rd88f6281-setup.c to DT.")
Fixes: e7c8f3808be8: ("ARM: kirkwood: Convert mv88f6281gtw_ge switch setup to DT")
Link: https://lkml.kernel.org/r/1409592941-22244-1-git-send-email-andrew@lunn.ch
Signed-off-by: Jason Cooper
Signed-off-by: Greg Kroah-Hartman

Andrew Lunn
2014-10-31 00:40:18 +0800
0aeee1b45 ARM: at91/PMC: don't forget to write PMC_PCDR register to disable clocks ... Browse Code »

commit cfa1950e6c6b72251e80adc736af3c3d2907ab0e upstream.

When introducing support for sama5d3, the write to PMC_PCDR register has
been accidentally removed.

Reported-by: Nathalie Cyrille
Signed-off-by: Ludovic Desroches
Signed-off-by: Nicolas Ferre
Signed-off-by: Greg Kroah-Hartman

Ludovic Desroches
2014-10-31 00:40:18 +0800