Eric Lee / smarc-fsl-linux-kernel

15 Apr, 2015

1 commit

57ca654be ARM: ensure delay timer has sufficient accuracy for delays

We have recently had an example of someone wanting to use a 90kHz timer
for the software delay loop.

udelay() needs to have at least microsecond resolution to give drivers
access to a delay mechanism with a reasonable chance of delaying for
the period they requested within a 50% margin of error, especially
for small delays.

Discussion about the udelay() accuracy can be found at:
https://lkml.org/lkml/2011/1/9/37

Reject timers which are unable to supply this level of resolution.
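A minimal sketch of the resolution check, assuming a delay timer is described only by its frequency in Hz. The helper names and the 1000 ns threshold written this way are illustrative, not the kernel's actual code: a timer is accepted only if one tick lasts no longer than one microsecond.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper: nanoseconds per tick for a timer at freq_hz,
 * rounded up so a slow timer is never reported as finer than it is. */
static uint64_t tick_period_ns(uint64_t freq_hz)
{
	assert(freq_hz > 0);
	return (1000000000ULL + freq_hz - 1) / freq_hz;
}

/* Accept a timer only if one tick is at most 1 us, i.e. it can
 * resolve the microsecond delays that udelay() is asked for. */
static bool delay_timer_acceptable(uint64_t freq_hz)
{
	return tick_period_ns(freq_hz) <= 1000;
}
```

With the 90kHz timer mentioned above, one tick is roughly 11.1 us, so a udelay(2) could only be rounded to zero or one whole tick, far outside a 50% error margin.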

Acked-by: Nicolas Pitre
Signed-off-by: Russell King

Russell King
2015-04-15 05:28:07 +0800

30 Mar, 2015

1 commit

c4a84ae39 ARM: 8322/1: keep .text and .fixup regions closer together

This moves all fixup snippets to the .text.fixup section, which is
a special section that gets emitted along with the .text section
for each input object file, i.e., the snippets are kept much closer
to the code they refer to, which helps prevent linker failure on
large kernels.

Acked-by: Nicolas Pitre
Signed-off-by: Ard Biesheuvel
Signed-off-by: Russell King

Ard Biesheuvel
2015-03-30 06:11:56 +0800

16 Jan, 2015

1 commit

c25630381 ARM: 8285/1: remove ARMv3 user access code again

This code was restored with commit 080fc66fb5 ("ARM: Bring back ARMv3 IO
and user access code") because the RiscPC memory bus does not understand
half-word load/stores. However only the IO code needed restoring since
the alternative user access code contains no half-word accesses, is
already used when CONFIG_PREEMPT is set and runs faster on a StrongARM.

Signed-off-by: Nicolas Pitre
Signed-off-by: Russell King

Nicolas Pitre
2015-01-16 22:49:08 +0800

28 Nov, 2014

3 commits

279f487e0 ARM: 8225/1: Add unwinding support for memory copy functions

The memory copy functions (memcpy, __copy_from_user, __copy_to_user)
never had unwinding annotations added. Currently, when one of these
functions accesses an invalid pointer, the backtrace shown stops at
that function or at some completely unrelated function. Add unwinding
annotations in the hope of getting a more useful backtrace in the
following cases:
1. die on accessing invalid pointer by these functions
2. kprobe trapped at any instruction within these functions
3. interrupted at any instruction within these functions

Signed-off-by: Lin Yongting
Signed-off-by: Russell King

Lin Yongting
2014-11-28 00:00:25 +0800
207a6cb06 ARM: 8224/1: Add unwinding support for memmove function

The memmove function never had unwinding annotations added.
Currently, when memmove accesses an invalid pointer, the backtrace
shown stops at memmove or at some completely unrelated function.
Add unwinding annotations in the hope of getting a more useful
backtrace in the following cases:
1. die on accessing invalid pointer by memmove
2. kprobe trapped at any instruction within memmove
3. interrupted at any instruction within memmove

Signed-off-by: Lin Yongting
Signed-off-by: Russell King

Lin Yongting
2014-11-28 00:00:24 +0800
20cb6abfe ARM: 8223/1: Add unwinding support for __memzero function

The __memzero function never had unwinding annotations added.
Currently, when __memzero accesses an invalid pointer, the backtrace
shown stops at __memzero or at some completely unrelated function.
Add unwinding annotations in the hope of getting a more useful
backtrace in the following cases:
1. die on accessing invalid pointer by __memzero
2. kprobe trapped at any instruction within __memzero
3. interrupted at any instruction within __memzero

Signed-off-by: Lin Yongting
Signed-off-by: Russell King

Lin Yongting
2014-11-28 00:00:23 +0800

21 Nov, 2014

1 commit

c2459d35f ARM: 8204/1: Add unwinding support for memset function

The memset function never had unwinding annotations added.
Currently, when memset accesses a NULL pointer, the backtrace
shown stops at memset or at some completely unrelated function.
Add unwinding annotations in the hope of getting a more useful
backtrace when memset accesses a NULL pointer, hits a kprobe or
is interrupted.

Signed-off-by: Lin Yongting
Signed-off-by: Russell King

Lin Yongting
2014-11-21 23:24:49 +0800

13 Sep, 2014

1 commit

d9981380b ARM: 8137/1: fix get_user BE behavior for target variable with size of 8 bytes

Commit e38361d ('ARM: 8091/2: add get_user() support for 8 byte types')
broke V7 BE get_user calls when the target variable is 64 bits wide but
'*ptr' is 32 bits or smaller. e38361d changed the type of __r2 from
'register unsigned long' to 'register typeof(x) __r2 asm("r2")', i.e.
before the change __r2 was still 32 bits even when the target variable
was 64 bits. After e38361d, for a 64-bit target variable __r2 became 64
bits and now occupies two registers, r2 and r3. The issue in the BE case
is that r3 holds the least significant word of __r2 and r2 holds the most
significant word, but __get_user_4 still copies the result into r2 (the
most significant word of __r2). The subsequent code copies from __r2 into
x, and in the situation described it picks up only garbage from r3.

Special __get_user_64t_(124) functions are introduced. They are similar to
the corresponding __get_user_(124) functions, but the result is stored in
r3 (the lsw of a 64-bit __r2 in a BE image). These functions are used by
the get_user macro in the BE case when the target variable is 64 bits.

Also renamed __get_user_lo8 to __get_user_32t_8 to get consistent
naming across all cases.
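A simplified C model of the register-pair problem, assuming only that a 64-bit value lives in r2:r3 with r2 holding the most significant word on big-endian and the least significant word on little-endian. The struct and function names are hypothetical stand-ins for the assembly helpers, not kernel code; the final narrowing mimics get_user() casting the 64-bit __r2 back to the pointee type.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model of a 64-bit variable held in the pair r2:r3. */
struct reg_pair {
	uint32_t r2;
	uint32_t r3;
};

/* BE: r2 is the most significant word; LE: r2 is the least. */
static uint64_t pair_value(struct reg_pair p, bool big_endian)
{
	return big_endian ? ((uint64_t)p.r2 << 32) | p.r3
			  : ((uint64_t)p.r3 << 32) | p.r2;
}

/* Old behaviour: a 4-byte load always lands in r2. */
static void get_user_4_model(struct reg_pair *p, uint32_t loaded)
{
	p->r2 = loaded;
}

/* Fixed behaviour (cf. __get_user_64t_4): on BE the result goes to r3,
 * the least significant word of the 64-bit pair. */
static void get_user_64t_4_model(struct reg_pair *p, uint32_t loaded)
{
	p->r3 = loaded;
}

/* get_user() finally narrows the 64-bit value to the 32-bit pointee type,
 * which keeps only the least significant word. */
static uint32_t narrow_to_u32(struct reg_pair p, bool big_endian)
{
	return (uint32_t)pair_value(p, big_endian);
}
```

With stale contents in both registers, the old helper works on LE but yields the stale r3 word on BE, while the 64t variant yields the loaded value.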

Signed-off-by: Victor Kamensky
Suggested-by: Daniel Thompson
Reviewed-by: Daniel Thompson
Signed-off-by: Russell King

Victor Kamensky
2014-09-13 00:38:59 +0800

09 Aug, 2014

1 commit

b3345d7c5 Merge tag 'soc-for-3.17' of git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc

Pull ARM SoC platform changes from Olof Johansson:
"This is the bulk of new SoC enablement and other platform changes for
3.17:

- Samsung S5PV210 has been converted to DT and multiplatform
- Clock drivers and bindings for some of the lower-end i.MX 1/2
platforms
- Kirkwood, one of the popular Marvell platforms, is folded into the
mvebu platform code, removing mach-kirkwood
- Hwmod data for TI AM43xx and DRA7 platforms
- More additions of Renesas shmobile platform support
- Removal of plat-samsung contents that can be removed with S5PV210
being multiplatform/DT-enabled and the other two old platforms
being removed

New platforms (most with only basic support right now):

- Hisilicon X5HD2 settop box chipset is introduced
- Mediatek MT6589 (mobile chipset) is introduced
- Broadcom BCM7xxx settop box chipset is introduced

+ as usual a lot other pieces all over the platform code"

* tag 'soc-for-3.17' of git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc: (240 commits)
ARM: hisi: remove smp from machine descriptor
power: reset: move hisilicon reboot code
ARM: dts: Add hix5hd2-dkb dts file.
ARM: debug: Rename Hi3716 to HIX5HD2
ARM: hisi: enable hix5hd2 SoC
ARM: hisi: add ARCH_HISI
MAINTAINERS: add entry for Broadcom ARM STB architecture
ARM: brcmstb: select GISB arbiter and interrupt drivers
ARM: brcmstb: add infrastructure for ARM-based Broadcom STB SoCs
ARM: configs: enable SMP in bcm_defconfig
ARM: add SMP support for Broadcom mobile SoCs
Documentation: arm: misc updates to Marvell EBU SoC status
Documentation: arm: add URLs to public datasheets for the Marvell Armada XP SoC
ARM: mvebu: fix build without platforms selected
ARM: mvebu: add cpuidle support for Armada 38x
ARM: mvebu: add cpuidle support for Armada 370
cpuidle: mvebu: add Armada 38x support
cpuidle: mvebu: add Armada 370 support
cpuidle: mvebu: rename the driver from armada-370-xp to mvebu-v7
ARM: mvebu: export the SCU address
...

Linus Torvalds
2014-08-09 02:14:29 +0800

18 Jul, 2014

2 commits

e38361d03 ARM: 8091/2: add get_user() support for 8 byte types

Recent contributions, including to DRM and binder, introduce 64-bit
values in their interfaces. A common motivation for this is to allow
the same ABI for 32- and 64-bit userspaces (and therefore also a shared
ABI for 32/64 hybrid userspaces). Anyhow, the developers would like to
avoid gotchas like having to use copy_from_user().

This feature is already implemented on x86-32 and the majority of other
32-bit architectures. The architectures currently holding out on
get_user_8 are: arm, avr32, blackfin, m32r, metag, microblaze,
mn10300, sh.

Credit:

My name sits rather uneasily at the top of this patch. The v1 and
v2 versions of the patch were written by Rob Clark and to produce v4
I mostly copied code from Russell King and H. Peter Anvin. However I
have mangled the patch sufficiently that *blame* is rightfully mine
even if credit should be more widely shared.

Changelog:

v5: updated to use the ret macro (requested by Russell King)
v4: remove an inlined add on big endian systems (spotted by Russell King),
used __ARMEB__ rather than BIG_ENDIAN (to match rest of file),
cleared r3 on EFAULT during __get_user_8.
v3: fix a couple of checkpatch issues
v2: pass correct size to check_uaccess, and better handling of narrowing
double word read with __get_user_xb() (Russell King's suggestion)
v1: original

Signed-off-by: Rob Clark
Signed-off-by: Daniel Thompson
Signed-off-by: Russell King

Daniel Thompson
2014-07-18 19:29:34 +0800
6ebbf2ce4 ARM: convert all "mov.* pc, reg" to "bx reg" for ARMv6+

ARMv6 and greater introduced a new instruction ("bx") which can be used
to return from function calls. Recent CPUs perform better when the
"bx lr" instruction is used rather than the "mov pc, lr" instruction,
and this sequence is strongly recommended to be used by the ARM
architecture manual (section A.4.1.1).

We provide a new macro "ret" with all its variants for the condition
code which will resolve to the appropriate instruction.

Rather than doing this piecemeal, and missing some instances, change all
the "mov pc" instances to use the new macro, with the exception of
the "movs" instruction and the kprobes code. This allows us to detect
the "mov pc, lr" case and fix it up - and also gives us the possibility
of deploying this for other registers depending on the CPU selection.

Reported-by: Will Deacon
Tested-by: Stephen Warren # Tegra Jetson TK1
Tested-by: Robert Jarzmik # mioa701_bootresume.S
Tested-by: Andrew Lunn # Kirkwood
Tested-by: Shawn Guo
Tested-by: Tony Lindgren # OMAPs
Tested-by: Gregory CLEMENT # Armada XP, 375, 385
Acked-by: Sekhar Nori # DaVinci
Acked-by: Christoffer Dall # kvm/hyp
Acked-by: Haojian Zhuang # PXA3xx
Acked-by: Stefano Stabellini # Xen
Tested-by: Uwe Kleine-König # ARMv7M
Tested-by: Simon Horman # Shmobile
Signed-off-by: Russell King

Russell King
2014-07-18 19:29:04 +0800

17 Jun, 2014

1 commit

5930c1a1f ARM: choose highest resolution delay timer

In case there are several possible delay timers, choose the one with the
highest resolution. This code relies on the fact that secondary CPUs have
not yet been brought online when register_current_timer_delay() is called.
This is ensured by implementing calibration_delay_done().
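A sketch of the selection rule, assuming a timer is represented only by a name and a frequency (a higher frequency means a finer resolution). The struct and register function are hypothetical simplifications, not the kernel's register_current_timer_delay(); like the commit, the sketch does no locking because it assumes only the boot CPU is running.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for a delay timer: just a name and a frequency. */
struct delay_timer_model {
	const char *name;
	uint64_t freq_hz;
};

static const struct delay_timer_model *current_delay_timer;

/* Keep whichever registered timer has the highest frequency, i.e. the
 * finest resolution. Safe only while secondary CPUs are still offline,
 * so no locking is attempted. */
static void register_delay_timer_model(const struct delay_timer_model *t)
{
	assert(t != 0 && t->freq_hz > 0);
	if (current_delay_timer == 0 ||
	    t->freq_hz > current_delay_timer->freq_hz)
		current_delay_timer = t;
}
```

Registration order then no longer matters: a slow always-on timer registered early is replaced by a faster one registered later, and a later slower one is ignored.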

Signed-off-by: Peter De Schrijver
Acked-by: Russell King
Signed-off-by: Stephen Warren

Peter De Schrijver
2014-06-17 02:48:07 +0800

25 Feb, 2014

2 commits

d98b90ea2 ARM: 7990/1: asm: rename logical shift macros push pull into lspush lspull

Renames logical shift macros, 'push' and 'pull', defined in
arch/arm/include/asm/assembler.h, into 'lspush' and 'lspull'.
That eliminates name conflict between 'push' logical shift macro
and 'push' instruction mnemonic. That allows assembler.h to be
included in .S files that use 'push' instruction.

Suggested-by: Will Deacon
Signed-off-by: Victor Kamensky
Acked-by: Nicolas Pitre
Signed-off-by: Russell King

Victor Kamensky
2014-02-25 19:33:57 +0800
c32ffce0f ARM: 7984/1: prefetch: add prefetchw invocations for barriered atomics

After a bunch of benchmarking on the interaction between dmb and pldw,
it turns out that issuing the pldw *after* the dmb instruction can
give modest performance gains (~3% atomic_add_return improvement on a
dual A15).

This patch adds prefetchw invocations to our barriered atomic operations
including cmpxchg, test_and_xxx and futexes.
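A portable C sketch of the ordering described above, assuming GCC/Clang builtins: __atomic_thread_fence stands in for dmb and __builtin_prefetch(p, 1) for pldw. This is an illustration of where the prefetch sits relative to the barriers, not the kernel's atomic_add_return(); whether the compiler emits exactly dmb and pldw depends on the target and -march flags.

```c
#include <assert.h>

/* Hypothetical fully-barriered atomic add: barrier first, then the
 * prefetch-for-write, then the read-modify-write, then the trailing
 * barrier. */
static int atomic_add_return_sketch(int i, int *counter)
{
	int result;

	assert(counter);
	__atomic_thread_fence(__ATOMIC_SEQ_CST);	/* leading "dmb" */
	__builtin_prefetch(counter, 1);			/* "pldw" after the barrier */
	result = __atomic_add_fetch(counter, i, __ATOMIC_RELAXED);
	__atomic_thread_fence(__ATOMIC_SEQ_CST);	/* trailing "dmb" */
	return result;
}
```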

Signed-off-by: Will Deacon
Signed-off-by: Russell King

Will Deacon
2014-02-25 19:30:20 +0800

29 Dec, 2013

2 commits

017f161a5 ARM: 7877/1: use built-in byte swap function

Enable the compiler intrinsic for byte swapping on arch ARM. This
allows the compiler to detect byte swaps and optimize them out, and
has a very modest benefit on vmlinux size (Linaro gcc 4.8):

   text    data     bss      dec    hex filename
2840310  123932   61960  3026202 2e2d1a vmlinux-lart #orig
2840152  123932   61960  3026044 2e2c7c vmlinux-lart #builtin-bswap

6473120  314840 5616016 12403976 bd4508 vmlinux-mxs #orig
6472586  314848 5616016 12403450 bd42fa vmlinux-mxs #builtin-bswap

7419872  318372  379556  8117800 7bde28 vmlinux-imx_v6_v7 #orig
7419170  318364  379556  8117090 7bdb62 vmlinux-imx_v6_v7 #builtin-bswap
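For illustration, an open-coded swap next to the GCC/Clang intrinsic __builtin_bswap32. The function names are hypothetical; the point is that the intrinsic gives the compiler an explicit byte-swap operation it can map to a single rev instruction on ARMv6 and later and can fold away when two swaps cancel, which is the kind of saving the size table reflects.

```c
#include <assert.h>
#include <stdint.h>

/* Open-coded byte swap, roughly the shape of a generic swab32(). */
static uint32_t swab32_manual(uint32_t x)
{
	return ((x & 0x000000ffU) << 24) |
	       ((x & 0x0000ff00U) <<  8) |
	       ((x & 0x00ff0000U) >>  8) |
	       ((x & 0xff000000U) >> 24);
}

/* Compiler intrinsic: same result, but visible to the optimizer as a swap. */
static uint32_t swab32_builtin(uint32_t x)
{
	return __builtin_bswap32(x);
}
```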

Signed-off-by: Kim Phillips
Reviewed-by: Nicolas Pitre
Acked-by: David Woodhouse
Signed-off-by: Russell King

Kim Phillips
2013-12-29 20:32:45 +0800
ef41b5c92 ARM: make kernel oops easier to read

We don't need the offset for the first function name in each backtrace
entry; this needlessly consumes screen space. This is virtually always
the first or second instruction in the called function.

Also, recognise stmfd instructions which include r10 as a valid stack
saving instruction, and when dumping the registers, dump six registers
per line rather than five, and fix the wrapping.

Signed-off-by: Russell King

Russell King
2013-12-29 20:32:30 +0800

01 Dec, 2013

1 commit

11d4bb1bd ARM: 7907/1: lib: delay-loop: Add align directive to fix BogoMIPS calculation

Currently mx53 (CortexA8) running at 1GHz reports:
Calibrating delay loop... 663.55 BogoMIPS (lpj=3317760)

Tom Evans verified that with alignments of 0x0 and 0x8 the two instructions of __loop_delay run in one clock cycle (1 clock/loop), while with alignments of 0x4 and 0xc they take 3 clocks to run the loop twice (1.5 clocks/loop).

The original object code looks like this:

00000010 :
10: e3e01000 mvn r1, #0
14: e51f201c ldr r2, [pc, #-28] ; 0
18: e5922000 ldr r2, [r2]
1c: e0800921 add r0, r0, r1, lsr #18
20: e1a00720 lsr r0, r0, #14
24: e0822b21 add r2, r2, r1, lsr #22
28: e1a02522 lsr r2, r2, #10
2c: e0000092 mul r0, r2, r0
30: e0800d21 add r0, r0, r1, lsr #26
34: e1b00320 lsrs r0, r0, #6
38: 01a0f00e moveq pc, lr

0000003c :
3c: e2500001 subs r0, r0, #1
40: 8afffffe bhi 3c
44: e1a0f00e mov pc, lr

After adding the 'align 3' directive to __loop_delay (align to 8 bytes):

00000010 :
10: e3e01000 mvn r1, #0
14: e51f201c ldr r2, [pc, #-28] ; 0
18: e5922000 ldr r2, [r2]
1c: e0800921 add r0, r0, r1, lsr #18
20: e1a00720 lsr r0, r0, #14
24: e0822b21 add r2, r2, r1, lsr #22
28: e1a02522 lsr r2, r2, #10
2c: e0000092 mul r0, r2, r0
30: e0800d21 add r0, r0, r1, lsr #26
34: e1b00320 lsrs r0, r0, #6
38: 01a0f00e moveq pc, lr
3c: e320f000 nop {0}

00000040 :
40: e2500001 subs r0, r0, #1
44: 8afffffe bhi 40
48: e1a0f00e mov pc, lr
4c: e320f000 nop {0}

This version now reports:
Calibrating delay loop... 996.14 BogoMIPS (lpj=4980736)

Some more test results:

On mx31 (ARM1136) running at 532 MHz, before the patch:
Calibrating delay loop... 351.43 BogoMIPS (lpj=1757184)

On mx31 (ARM1136) running at 532 MHz after the patch:
Calibrating delay loop... 528.79 BogoMIPS (lpj=2643968)

Also tested on mx6 (CortexA9) and on mx27 (ARM926), which shows the same
BogoMIPS value before and after this patch.

Reported-by: Tom Evans
Suggested-by: Tom Evans
Signed-off-by: Fabio Estevam
Signed-off-by: Russell King

Fabio Estevam
2013-12-01 06:21:03 +0800

21 Nov, 2013

1 commit

b7ec69940 ARM: 7893/1: bitops: only emit .arch_extension mp if CONFIG_SMP

Uwe reported a build failure when targeting a NOMMU platform with my
recent prefetch changes:

arch/arm/lib/changebit.S: Assembler messages:
arch/arm/lib/changebit.S:15: Error: architectural extension `mp' is
not allowed for the current base architecture

This is due to use of the .arch_extension mp directive immediately prior
to an ALT_SMP(...) instruction. Whilst the ALT_SMP macro will expand to
nothing if !CONFIG_SMP, gas will still choke on the directive.

This patch fixes the issue by only emitting the sequence (including the
directive) if CONFIG_SMP=y.

Tested-by: Uwe Kleine-König
Signed-off-by: Will Deacon
Signed-off-by: Russell King

Will Deacon
2013-11-21 07:05:53 +0800

14 Nov, 2013

1 commit

f47671e2d Merge branch 'for-linus' of git://git.linaro.org/people/rmk/linux-arm

Pull ARM updates from Russell King:
"Included in this series are:

1. BE8 (modern big endian) changes for ARM from Ben Dooks
2. big.Little support from Nicolas Pitre and Dave Martin
3. support for LPAE systems with all system memory above 4GB
4. Perf updates from Will Deacon
5. Additional prefetching and other performance improvements from Will.
6. Neon-optimised AES implementation from Ard.
7. A number of smaller fixes scattered around the place.

There is a rather horrid merge conflict in tools/perf - I was never
notified of the conflict because it originally occurred between Will's
tree and other stuff. Consequently I have a resolution which Will
forwarded me, which I'll forward on immediately after sending this
mail.

The other notable thing is I'm expecting some build breakage in the
crypto stuff on ARM only with Ard's AES patches. These were merged
into a stable git branch which others had already pulled, so there's
little I can do about this. The problem is caused because these
patches have a dependency on some code in the crypto git tree - I
tried requesting a branch I can pull to resolve these, and all I got
each time from the crypto people was "we'll revert our patches then"
which would only make things worse since I still don't have the
dependent patches. I've no idea what's going on there or how to
resolve that, and since I can't split these patches from the rest of
this pull request, I'm rather stuck with pushing this as-is or
reverting Ard's patches.

Since it should "come out in the wash" I've left them in - the only
build problems they seem to cause at the moment are with randconfigs,
and since it's a new feature anyway. However, if by -rc1 the
dependencies aren't in, I think it'd be best to revert Ard's patches"

I resolved the perf conflict roughly as per the patch sent by Russell,
but there may be some differences. Any errors are likely mine. Let's
see how the crypto issues work out.

* 'for-linus' of git://git.linaro.org/people/rmk/linux-arm: (110 commits)
ARM: 7868/1: arm/arm64: remove atomic_clear_mask() in "include/asm/atomic.h"
ARM: 7867/1: include: asm: use 'int' instead of 'unsigned long' for 'oldval' in atomic_cmpxchg().
ARM: 7866/1: include: asm: use 'long long' instead of 'u64' within atomic.h
ARM: 7871/1: amba: Extend number of IRQS
ARM: 7887/1: Don't smp_cross_call() on UP devices in arch_irq_work_raise()
ARM: 7872/1: Support arch_irq_work_raise() via self IPIs
ARM: 7880/1: Clear the IT state independent of the Thumb-2 mode
ARM: 7878/1: nommu: Implement dummy early_paging_init()
ARM: 7876/1: clear Thumb-2 IT state on exception handling
ARM: 7874/2: bL_switcher: Remove cpu_hotplug_driver_{lock,unlock}()
ARM: footbridge: fix build warnings for netwinder
ARM: 7873/1: vfp: clear vfp_current_hw_state for dying cpu
ARM: fix misplaced arch_virt_to_idmap()
ARM: 7848/1: mcpm: Implement cpu_kill() to synchronise on powerdown
ARM: 7847/1: mcpm: Factor out logical-to-physical CPU translation
ARM: 7869/1: remove unused XSCALE_PMU Kconfig param
ARM: 7864/1: Handle 64-bit memory in case of 32-bit phys_addr_t
ARM: 7863/1: Let arm_add_memory() always use 64-bit arguments
ARM: 7862/1: pcpu: replace __get_cpu_var_uses
ARM: 7861/1: cacheflush: consolidate single-CPU ARMv7 cache disabling code
...

Linus Torvalds
2013-11-14 07:51:29 +0800

12 Nov, 2013

1 commit

df762eccb Merge branch 'devel-stable' into for-next

Conflicts:
arch/arm/include/asm/atomic.h
arch/arm/include/asm/hardirq.h
arch/arm/kernel/smp.c

Russell King
2013-11-12 18:58:59 +0800

29 Oct, 2013

1 commit

a3a9ea656 ARM: 7858/1: mm: make UACCESS_WITH_MEMCPY huge page aware

The memory pinning code in uaccess_with_memcpy.c does not check
for HugeTLB or THP pmds, and will enter an infinite loop should
a __copy_to_user or __clear_user occur against a huge page.

This patch adds detection code for huge pages to pin_page_for_write.
As this code can be executed in a fast path it refers to the actual
pmds rather than the vma. If a HugeTLB or THP is found (they have
the same pmd representation on ARM), the page table spinlock is
taken to prevent modification whilst the page is pinned.

On ARM, huge pages are only represented as pmds, thus no huge pud
checks are performed. (For huge puds one would lock the page table
in a similar manner as in the pmd case).

Two helper functions are introduced; pmd_thp_or_huge will check
whether or not a page is huge or transparent huge (which have the
same pmd layout on ARM), and pmd_hugewillfault will detect whether
or not a page fault will occur on write to the page.

Running the following test (with the chunking from read_zero
removed):
$ dd if=/dev/zero of=/dev/null bs=10M count=1024
Gave: 2.3 GB/s backed by normal pages,
2.9 GB/s backed by huge pages,
5.1 GB/s backed by huge pages, with page mask=HPAGE_MASK.

After some discussion, it was decided not to adopt the HPAGE_MASK,
as this would have a significant detrimental effect on the overall
system latency due to page_table_lock being held for too long.
This could be revisited if split huge page locks are adopted.

Signed-off-by: Steve Capper
Reviewed-by: Nicolas Pitre
Signed-off-by: Russell King

Steven Capper
2013-10-29 19:06:15 +0800

30 Sep, 2013

1 commit

d779c07dd ARM: bitops: prefetch the destination word for write prior to strex

The cost of changing a cacheline from shared to exclusive state can be
significant, especially when this is triggered by an exclusive store,
since it may result in having to retry the transaction.

This patch prefixes our atomic bitops implementation with prefetchw,
to try and grab the line in exclusive state from the start. The testop
macro is left alone, since the barrier semantics limit the usefulness
of prefetching data.
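A portable sketch of the idea for an unordered bitop, assuming GCC/Clang builtins: prefetch the target word for write first, so the cache line is hopefully already held in exclusive state, then perform the atomic read-modify-write. The name change_bit_sketch and the BITS_PER_LONG_SKETCH macro are hypothetical; on ARM the kernel does this in assembly with pldw ahead of an ldrex/strex loop.

```c
#include <assert.h>
#include <limits.h>

#define BITS_PER_LONG_SKETCH (CHAR_BIT * sizeof(unsigned long))

/* Hypothetical atomic change_bit(): prefetch for write, then flip the bit.
 * No barriers, matching the unordered bitops. */
static void change_bit_sketch(unsigned int nr, unsigned long *addr)
{
	unsigned long *word;
	unsigned long mask;

	assert(addr);
	word = addr + nr / BITS_PER_LONG_SKETCH;
	mask = 1UL << (nr % BITS_PER_LONG_SKETCH);

	__builtin_prefetch(word, 1);	/* ask for the line with write intent */
	__atomic_fetch_xor(word, mask, __ATOMIC_RELAXED);
}
```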

Acked-by: Nicolas Pitre
Signed-off-by: Will Deacon

Will Deacon
2013-09-30 23:42:56 +0800

17 Sep, 2013

1 commit

136dfa5ed ARM: delete mach-shark

The Shark machine sub-architecture (also known as DNARD, the
DIGITAL Network Appliance Reference Design) lacks a maintainer
able to apply and test patches to modernize the architecture.

It is suspected that the current kernel, while it compiles,
does not even boot on this machine. The listed maintainer has
expressed that he will not be able to spend any time on the
maintenance for the coming year.

So let's delete it from the kernel for now. It can always be
resurrected with git revert if maintenance is resumed.

As the VIA82c505 PCI adapter was only used by this
architecture, that gets deleted too.

Cc: arm@kernel.org
Cc: Alexander Schulz
Signed-off-by: Linus Walleij

Linus Walleij
2013-09-17 18:34:36 +0800

09 Sep, 2013

1 commit

9319206d7 ARM: 7835/2: fix modular build of xor_blocks() with NEON enabled

Commit 0195659 introduced a NEON accelerated version of the xor_blocks()
function, but it needs the changes in this patch to allow it to be built
as a module rather than statically into the kernel.

This patch creates a separate module xor-neon.ko which exports the NEON
inner xor_blocks() functions that the regular xor.ko depends on when it
is built with CONFIG_KERNEL_MODE_NEON=y.

Reported-by: Josh Boyer
Signed-off-by: Ard Biesheuvel
Signed-off-by: Russell King

Ard Biesheuvel
2013-09-09 22:24:47 +0800

23 Jul, 2013

1 commit

b4f656eea Pull branch 'for-rmk' of git://git.linaro.org/people/ardbiesheuvel/linux-arm into devel-stable

Comments from Ard Biesheuvel:

I have included two use cases that I have been using, XOR and RAID-6
checksumming. The former gets a 60% performance boost on the NEON, the
latter over 400%.

ARM: add support for kernel mode NEON

Adds kernel_neon_begin/end (renamed from kernel_vfp_begin/end in the
previous version to de-emphasize the VFP part as VFP code that needs
software assistance is not supported currently.)

Introduces and the Kconfig symbol KERNEL_MODE_NEON. This
has been aligned with Catalin for arm64, so any NEON code that does
not use assembly but intrinsics or the GCC vectorizer (such as my
examples) can potentially be shared between arm and arm64 archs.

ARM: move VFP init to an earlier boot stage

This is needed so the NEON is enabled when the XOR and RAID-6 algo
boot time benchmarks are run.

ARM: be strict about FP exceptions in kernel mode

This adds a check to vfp_support_entry() to flag unsupported uses of
the NEON/VFP in kernel mode. FP exceptions (bounces) are flagged as
a bug because of their potentially intermittent nature. Exceptions
caused by kernel_neon_begin not having been called are just routed
through the undef handler.

ARM: crypto: add NEON accelerated XOR implementation

This is the xor_blocks() implementation built with -ftree-vectorize,
60% faster than optimized ARM code. It calls in_interrupt() to check
whether the NEON flavor can be used: this should really not be
necessary, but due to xor_blocks' quite generic nature, there is no
telling how exactly people may be using it in the real world.
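A user-space sketch of the structure described above: a plain C XOR loop of the kind the GCC vectorizer can turn into NEON code, plus a dispatcher that falls back to integer code in interrupt context. fake_in_interrupt and neon_sections are hypothetical stand-ins for in_interrupt() and the kernel_neon_begin()/kernel_neon_end() bracket; both paths call the same C loop here, whereas the kernel would call a vectorized build on the NEON path.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Reference XOR loop: simple enough for -ftree-vectorize. */
static void xor_2_ref(size_t words, unsigned long *p1,
		      const unsigned long *p2)
{
	for (size_t i = 0; i < words; i++)
		p1[i] ^= p2[i];
}

/* Hypothetical stand-ins for in_interrupt() and NEON bracketing. */
static bool fake_in_interrupt;
static int neon_sections;

static void xor_2_dispatch(size_t words, unsigned long *p1,
			   const unsigned long *p2)
{
	if (fake_in_interrupt) {
		/* NEON may not be usable here: use integer code. */
		xor_2_ref(words, p1, p2);
	} else {
		neon_sections++;		/* kernel_neon_begin() */
		xor_2_ref(words, p1, p2);	/* vectorized build in reality */
		/* kernel_neon_end() */
	}
}
```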

lib/raid6: add ARM-NEON accelerated syndrome calculation

This is a port of the RAID-6 checksumming code in altivec.uc to
NEON intrinsics. It is about 4x faster than the sequential code.
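For reference, a scalar sketch of the math being accelerated: P is the plain XOR of the data disks and Q is the sum over disks of g^z * D_z in GF(2^8) with generator g = 2 and the RAID-6 polynomial 0x11d, evaluated with Horner's rule from the highest disk down. This shows one byte column only and the function names are hypothetical; a vectorized version applies the same steps to many bytes per instruction.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Multiply by 2 in GF(2^8), polynomial x^8+x^4+x^3+x^2+1 (0x11d). */
static uint8_t gf_mul2(uint8_t v)
{
	return (uint8_t)((v << 1) ^ ((v & 0x80) ? 0x1d : 0));
}

/* P and Q syndromes for one byte column across ndata data disks. */
static void raid6_pq_byte(const uint8_t *data, size_t ndata,
			  uint8_t *p, uint8_t *q)
{
	assert(ndata > 0);
	uint8_t wp = data[ndata - 1];
	uint8_t wq = data[ndata - 1];

	/* Walk down from disk ndata-2 to disk 0 (Horner's rule for Q). */
	for (size_t z = ndata - 1; z-- > 0; ) {
		wq = (uint8_t)(gf_mul2(wq) ^ data[z]);
		wp ^= data[z];
	}
	*p = wp;
	*q = wq;
}
```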

Russell King
2013-07-23 00:46:40 +0800

15 Jul, 2013

1 commit

8bd26e3a7 arm: delete __cpuinit/__CPUINIT usage from all ARM users

The __cpuinit type of throwaway sections might have made sense
some time ago when RAM was more constrained, but now the savings
do not offset the cost and complications. The fix in commit
5e427ec2d0 ("x86: Fix bit corruption at CPU resume time") is a good
example of the nasty type of bugs that can be created with improper
use of the various __init prefixes.

After a discussion on LKML[1] it was decided that cpuinit should go
the way of devinit and be phased out. Once all the users are gone,
we can then finally remove the macros themselves from linux/init.h.

Note that some harmless section mismatch warnings may result, since
notify_cpu_starting() and cpu_up() are arch independent (kernel/cpu.c)
and are flagged as __cpuinit -- so if we remove the __cpuinit from
the arch specific callers, we will also get section mismatch warnings.
As an intermediate step, we intend to turn the linux/init.h cpuinit
related content into no-ops as early as possible, since that will get
rid of these warnings. In any case, they are temporary and harmless.

This removes all the ARM uses of the __cpuinit macros from C code,
and all __CPUINIT from assembly code. It also removes two ".previous"
section statements that were paired off against __CPUINIT
(aka .section ".cpuinit.text").

[1] https://lkml.org/lkml/2013/5/20/589

Cc: Russell King
Cc: Will Deacon
Cc: linux-arm-kernel@lists.infradead.org
Signed-off-by: Paul Gortmaker

Paul Gortmaker
2013-07-15 07:36:52 +0800

09 Jul, 2013

1 commit

01956597c ARM: crypto: add NEON accelerated XOR implementation

Add a source file xor-neon.c (which is really just the reference
C implementation passed through the GCC vectorizer) and hook it
up to the XOR framework.

Signed-off-by: Ard Biesheuvel
Acked-by: Nicolas Pitre

Ard Biesheuvel
2013-07-09 05:09:06 +0800

03 Apr, 2013

1 commit

6f3d90e55 ARM: 7685/1: delay: use private ticks_per_jiffy field for timer-based delay ops

Commit 70264367a243 ("ARM: 7653/2: do not scale loops_per_jiffy when
using a constant delay clock") fixed a problem with our timer-based
delay loop, where loops_per_jiffy is scaled by cpufreq yet used directly
by the timer delay ops.

This patch fixes the problem in a more elegant way by keeping a private
ticks_per_jiffy field in the delay ops, independent of loops_per_jiffy
and therefore not subject to scaling. The loop-based delay continues to
use loops_per_jiffy directly, as it should.
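A small model of why the private field helps, assuming HZ=100 and a 24MHz delay timer purely for the example. struct delay_state, cpufreq_scale() and timer_udelay_ticks() are hypothetical names: cpufreq rescales loops_per_jiffy for the CPU busy loop, while the timer path converts microseconds to ticks from its own ticks_per_jiffy, which stays constant because the timer clock does.

```c
#include <assert.h>
#include <stdint.h>

#define HZ 100	/* assumption for the example */

struct delay_state {
	unsigned long loops_per_jiffy;	/* CPU loop: rescaled by cpufreq */
	unsigned long ticks_per_jiffy;	/* timer: fixed, clock is constant */
};

/* Rescale only the loop-based value on a CPU frequency change. */
static void cpufreq_scale(struct delay_state *s, unsigned long old_khz,
			  unsigned long new_khz)
{
	assert(old_khz != 0);
	s->loops_per_jiffy = (unsigned long)
		((uint64_t)s->loops_per_jiffy * new_khz / old_khz);
	/* ticks_per_jiffy deliberately left untouched */
}

/* Timer ticks to wait for udelay(usecs). */
static uint64_t timer_udelay_ticks(const struct delay_state *s,
				   unsigned long usecs)
{
	return (uint64_t)usecs * s->ticks_per_jiffy * HZ / 1000000;
}
```

Halving the CPU clock halves loops_per_jiffy, but a 10 us timer delay still waits 240 ticks of the 24MHz timer.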

Acked-by: Nicolas Pitre
Signed-off-by: Will Deacon
Signed-off-by: Russell King

Will Deacon
2013-04-03 23:45:50 +0800

12 Mar, 2013

1 commit

418df63ad ARM: 7670/1: fix the memset fix

Commit 455bd4c430b0 ("ARM: 7668/1: fix memset-related crashes caused by
recent GCC (4.7.2) optimizations") attempted to fix a compliance issue
with the memset return value. However the memset itself became broken
by that patch for misaligned pointers.

This fixes the above by branching over the entry code from the
misaligned fixup code to avoid reloading the original pointer.

Also, because the function entry alignment is wrong in the Thumb mode
compilation, that fixup code is moved to the end.

While at it, the entry instructions are slightly reworked to help dual
issue pipelines.

Signed-off-by: Nicolas Pitre
Tested-by: Alexander Holler
Signed-off-by: Russell King

Nicolas Pitre
2013-03-12 20:18:47 +0800

08 Mar, 2013

1 commit

455bd4c43 ARM: 7668/1: fix memset-related crashes caused by recent GCC (4.7.2) optimizations

Recent GCC versions (e.g. GCC-4.7.2) perform optimizations based on
assumptions about the implementation of memset and similar functions.
The current ARM optimized memset code does not return the value of
its first argument, as is usually expected from standard implementations.

For instance in the following function:

void debug_mutex_lock_common(struct mutex *lock, struct mutex_waiter *waiter)
{
	memset(waiter, MUTEX_DEBUG_INIT, sizeof(*waiter));
	waiter->magic = waiter;
	INIT_LIST_HEAD(&waiter->list);
}

compiled as:

800554d0 :
800554d0: e92d4008 push {r3, lr}
800554d4: e1a00001 mov r0, r1
800554d8: e3a02010 mov r2, #16 ; 0x10
800554dc: e3a01011 mov r1, #17 ; 0x11
800554e0: eb04426e bl 80165ea0
800554e4: e1a03000 mov r3, r0
800554e8: e583000c str r0, [r3, #12]
800554ec: e5830000 str r0, [r3]
800554f0: e5830004 str r0, [r3, #4]
800554f4: e8bd8008 pop {r3, pc}

GCC assumes memset returns the value of pointer 'waiter' in register r0,
causing register/memory corruption.
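A minimal C sketch of the contract GCC relies on: memset must return its first argument, so a caller may use that return value instead of keeping the original pointer in another register. memset_sketch, struct waiter_model and init_waiter are hypothetical stand-ins for the assembly memset, struct mutex_waiter and the function above; the byte loop is not the optimized ARM implementation.

```c
#include <assert.h>
#include <stddef.h>

/* Byte-at-a-time memset that returns its first argument, as the C
 * standard requires. */
static void *memset_sketch(void *dst, int c, size_t n)
{
	unsigned char *p = dst;

	while (n--)
		*p++ = (unsigned char)c;
	return dst;	/* the value GCC expects back in r0 */
}

struct waiter_model {		/* hypothetical stand-in for mutex_waiter */
	void *magic;
	unsigned char pad[12];
};

/* Same pattern as the compiled code: reuse memset's return value. */
static void init_waiter(struct waiter_model *waiter)
{
	struct waiter_model *w = memset_sketch(waiter, 0x11, sizeof(*waiter));

	w->magic = w;
}
```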

This patch fixes the return value of the assembly version of memset.
It adds a 'mov' instruction and merges an additional load+store into
existing load/store instructions.
For ease of review, here is a breakdown of the patch into 4 simple steps:

Step 1
======
Perform the following substitutions:
ip -> r8, then
r0 -> ip,
and insert 'mov ip, r0' as the first statement of the function.
At this point, we have a memset() implementation returning the proper result,
but corrupting r8 on some paths (the ones that were using ip).

Step 2
======
Make sure r8 is saved and restored when (! CALGN(1)+0) == 1:

save r8:
- str lr, [sp, #-4]!
+ stmfd sp!, {r8, lr}

and restore r8 on both exit paths:
- ldmeqfd sp!, {pc} @ Now
Reviewed-by: Nicolas Pitre
Signed-off-by: Dirk Behme
Signed-off-by: Russell King

Ivan Djelic
2013-03-08 00:14:22 +0800

21 Feb, 2013

1 commit

70264367a ARM: 7653/2: do not scale loops_per_jiffy when using a constant delay clock

When udelay() is implemented using an architected timer, it is wrong
to scale loops_per_jiffy when changing the CPU clock frequency since
the timer clock remains constant.

The lpj should probably become an implementation detail relevant only to
the CPU-loop-based delay routine and more confined to it. In the
meantime, this is the minimal fix needed to have expected delays with
the timer-based implementation when cpufreq is also in use.
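
To illustrate why scaling is wrong for a constant-rate timer, here is a small C sketch. It assumes HZ=100, a 24 MHz timer and a proportional rescale of lpj on a CPU frequency change (modeled on how cpufreq adjusts loops_per_jiffy); all `sketch_` names are hypothetical:

```c
#include <stdint.h>

#define SKETCH_HZ 100	/* assumed tick rate (CONFIG_HZ) */

/* Proportional lpj rescale on a CPU clock change: correct for a CPU
 * busy-loop delay, whose loop rate tracks the core clock. */
uint64_t sketch_scale_lpj(uint64_t lpj, uint64_t old_khz, uint64_t new_khz)
{
	return lpj * new_khz / old_khz;
}

/* Timer ticks to wait for 'us' microseconds, given ticks per jiffy. */
uint64_t sketch_delay_ticks(uint64_t us, uint64_t ticks_per_jiffy)
{
	return us * ticks_per_jiffy * SKETCH_HZ / 1000000;
}
```

With a 24 MHz timer, ticks per jiffy is 24000000 / 100 = 240000 and a 100 us request needs 2400 ticks. If cpufreq halves the CPU clock and the same rescale is applied, the value drops to 120000 and the request becomes 1200 ticks, i.e. only 50 us, although the timer itself never changed speed.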

Reported-by: Viresh Kumar
Signed-off-by: Nicolas Pitre
Tested-by: Viresh Kumar
Acked-by: Liviu Dudau
Cc: stable@vger.kernel.org
Signed-off-by: Russell King

Nicolas Pitre
2013-02-21 21:25:36 +0800

10 Oct, 2012

1 commit

f3accb122 ARM: export default read_current_timer

read_current_timer is used by get_cycles since "ARM: 7538/1: delay:
add registration mechanism for delay timer sources", and get_cycles
can be used by device drivers in loadable modules, so it has to
be exported.

Without this patch, building imote2_defconfig fails with

ERROR: "read_current_timer" [crypto/tcrypt.ko] undefined!

Signed-off-by: Arnd Bergmann
Cc: Stephen Boyd
Cc: Jonathan Austin
Cc: Will Deacon
Cc: Russell King

Arnd Bergmann
2012-10-10 02:24:36 +0800

05 Oct, 2012

1 commit

ceaa1a13c Merge branch 'arch-timers' into for-linus

Conflicts:
arch/arm/include/asm/timex.h
arch/arm/lib/delay.c

Russell King
2012-10-05 06:02:26 +0800

27 Sep, 2012

1 commit

56942fec0 ARM: 7538/1: delay: add registration mechanism for delay timer sources

The current timer-based delay loop relies on the architected timer to
initiate the switch away from the polling-based implementation. This is
unfortunate for platforms without the architected timers but with a
suitable delay source (that is, constant frequency, always powered-up
and ticking as long as the CPUs are online).

This patch introduces a registration mechanism for the delay timer
(which provides an unconditional read_current_timer implementation) and
updates the architected timer code to use the new interface.
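
A user-space C sketch of such a registration mechanism. The struct layout, the `sketch_` names and the -ENXIO return for "no timer registered" are assumptions for illustration, and the fake counter stands in for a real constant-frequency, always-on hardware timer:

```c
#include <errno.h>

/* Hypothetical description of a delay timer source. */
struct sketch_delay_timer {
	unsigned long (*read)(void);	/* returns the current counter value */
	unsigned long freq;		/* constant frequency in Hz */
};

/* NULL until a platform registers a suitable source. */
static const struct sketch_delay_timer *sketch_delay_timer;

void sketch_register_current_timer_delay(const struct sketch_delay_timer *timer)
{
	sketch_delay_timer = timer;
}

/* Always defined; fails cleanly when no source has been registered. */
int sketch_read_current_timer(unsigned long *timer_val)
{
	if (!sketch_delay_timer)
		return -ENXIO;
	*timer_val = sketch_delay_timer->read();
	return 0;
}

/* Fake counter for demonstration: advances by one on every read. */
static unsigned long fake_ticks;
static unsigned long fake_read(void) { return fake_ticks++; }
const struct sketch_delay_timer fake_timer = { fake_read, 1000000 };
```

Any platform with a suitable counter, not just the architected timer, can then register it once at boot, and callers of the read function never need to know which source is behind it.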

Reviewed-by: Stephen Boyd
Signed-off-by: Jonathan Austin
Signed-off-by: Will Deacon
Signed-off-by: Russell King

Jonathan Austin
2012-09-27 05:57:52 +0800

10 Sep, 2012

2 commits

beafa0de3 ARM: 7529/1: delay: set loops_per_jiffy when moving to timer-based loop

The delay functions may be called by some platforms after switching to
the timer-based delay loop but before calibration. In this case, the
initial loops_per_jiffy may not be suitable for the timer (although a
compromise may be achievable) and delay times may be considered too
inaccurate.

This patch updates loops_per_jiffy when switching to the timer-based
delay loop so that delays are consistent prior to calibration.
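
A C sketch of the idea, assuming HZ=100 and that once the timer is in use one "loop" means one timer tick, so lpj can be set to freq / HZ at switch time. The variable and function names are hypothetical, and the stale value used in the test is made up:

```c
#include <stdint.h>

#define SKETCH_HZ 100	/* assumed tick rate (CONFIG_HZ) */

/* Hypothetical stand-in for loops_per_jiffy. */
unsigned long sketch_loops_per_jiffy;

/* On switching to the timer-based delay, derive lpj from the timer rate
 * so that "loops" means timer ticks even before calibration has run. */
void sketch_switch_to_timer_delay(unsigned long timer_freq_hz)
{
	sketch_loops_per_jiffy = timer_freq_hz / SKETCH_HZ;
}

/* Timer ticks to wait for 'us' microseconds at the current lpj. */
uint64_t sketch_udelay_ticks(uint64_t us)
{
	return us * sketch_loops_per_jiffy * SKETCH_HZ / 1000000;
}
```

For a 24 MHz timer this gives 240000 ticks per jiffy, so a 10 us delay waits 240 ticks. A leftover lpj of, say, 4980736 from a CPU-loop estimate would instead wait 4980 ticks, over 200 us.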

Signed-off-by: Will Deacon
Signed-off-by: Russell King

Will Deacon
2012-09-10 00:28:48 +0800

8404663f8 ARM: 7527/1: uaccess: explicitly check __user pointer when !CPU_USE_DOMAINS

The {get,put}_user macros don't perform range checking on the provided
__user address when !CPU_USE_DOMAINS.

This patch reworks the out-of-line assembly accessors to check the user
address against a specified limit, returning -EFAULT if it is out of
range.
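
The check itself is small. A C sketch of it, assuming `limit` is the last valid user address (inclusive) and that the access size is non-zero; `sketch_user_range_ok` and `sketch_get_user_u32` are hypothetical names, and the plain dereference stands in for the kernel's fault-handled load:

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* True if [addr, addr + size - 1] does not wrap and ends at or below limit. */
bool sketch_user_range_ok(uintptr_t addr, size_t size, uintptr_t limit)
{
	uintptr_t last = addr + size - 1;	/* size > 0 assumed */

	if (last < addr)	/* wrapped past the top of the address space */
		return false;
	return last <= limit;
}

/* get_user-style accessor: -EFAULT when the address is out of range. */
int sketch_get_user_u32(uint32_t *dst, const uint32_t *uptr, uintptr_t limit)
{
	if (!sketch_user_range_ok((uintptr_t)uptr, sizeof(*uptr), limit))
		return -EFAULT;
	*dst = *uptr;	/* the kernel would use a load with a fault fixup */
	return 0;
}
```

The wrap test matters: without it, an address just below the top of the address space plus a small size would appear to end at a low, seemingly valid address.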

[will: changed get_user register allocation to match put_user]
[rmk: fixed building on older ARM architectures]

Reported-by: Catalin Marinas
Signed-off-by: Will Deacon
Cc: stable@vger.kernel.org
Signed-off-by: Russell King

Russell King
2012-09-10 00:28:47 +0800

13 Aug, 2012

1 commit

080fc66fb ARM: Bring back ARMv3 IO and user access code

This partially reverts 357c9c1f07d4546bc3fbc0fd1044d96b114d14ed
(ARM: Remove support for ARMv3 ARM610 and ARM710 CPUs).

Although we only support StrongARM on the RiscPC, we need to keep the
ARMv3 user access code for this platform because the bus does not
understand half-word load/stores.
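
For illustration, the general technique for such a bus is to compose each 16-bit access from two byte accesses. A minimal C sketch, assuming little-endian byte order; the function names are hypothetical and this is not the kernel's ARMv3 assembly:

```c
#include <stdint.h>

/* Read a little-endian 16-bit value using two byte loads only. */
uint16_t sketch_read16_bytewise(const volatile uint8_t *p)
{
	return (uint16_t)(p[0] | (p[1] << 8));
}

/* Write a little-endian 16-bit value using two byte stores only. */
void sketch_write16_bytewise(volatile uint8_t *p, uint16_t v)
{
	p[0] = (uint8_t)(v & 0xff);
	p[1] = (uint8_t)(v >> 8);
}
```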

Reported-by: Arnd Bergmann
Signed-off-by: Russell King

Russell King
2012-08-13 18:44:13 +0800

31 Jul, 2012

1 commit

0cc41e4a2 arch: remove direct definitions of KERN_<LEVEL> uses

Add #include so that the #define KERN_ macros
don't have to be duplicated.

Signed-off-by: Joe Perches
Cc: Kay Sievers
Cc: Russell King
Acked-by: David Howells
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Joe Perches
2012-07-31 08:25:13 +0800

28 Jul, 2012

1 commit

91b006def Merge branches 'audit', 'delay', 'fixes', 'misc' and 'sta2x11' into for-linus

Russell King
2012-07-28 06:06:32 +0800

10 Jul, 2012

1 commit

d0a533b18 ARM: 7452/1: delay: allow timer-based delay implementation to be selected

This patch allows a timer-based delay implementation to be selected by
switching the delay routines over to use get_cycles, which is
implemented in terms of read_current_timer. This further allows us to
skip the loop calibration and have a consistent delay function in the
face of core frequency scaling.

To avoid the pain of dealing with memory-mapped counters, this
implementation uses the co-processor interface to the architected timers
when they are available. The previous loop-based implementation is
kept around for CPUs without the architected timers and we retain both
the maximum delay (2ms) and the corresponding conversion factors for
determining the number of loops required for a given interval. Since the
indirection of the timer routines will only work when called from C,
the sa1100 sleep routines are modified to branch to the loop-based delay
functions directly.
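
The core of a timer-based delay is a busy-wait on a free-running counter. A user-space C sketch, where a hypothetical 32-bit counter that advances by one per read stands in for get_cycles() and is started just below its wrap point to show that unsigned subtraction keeps the elapsed count correct:

```c
#include <stdint.h>

/* Hypothetical free-running counter; real hardware would be read here. */
static uint32_t sketch_counter = 0xfffffff0u;	/* start near wrap on purpose */

static uint32_t sketch_get_cycles(void)
{
	return sketch_counter++;	/* each read advances one tick in this model */
}

/* Busy-wait until 'cycles' ticks have elapsed. Unsigned subtraction keeps
 * the elapsed count correct across counter wrap-around. Returns the
 * number of ticks actually observed, for testing. */
uint32_t sketch_timer_delay(uint32_t cycles)
{
	uint32_t start = sketch_get_cycles();
	uint32_t now;

	do {
		now = sketch_get_cycles();
	} while ((uint32_t)(now - start) < cycles);

	return now - start;
}
```

Because the loop compares elapsed timer ticks rather than counting CPU iterations, the delay no longer depends on the core clock, which is what makes calibration unnecessary and frequency scaling harmless.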

Tested-by: Shinya Kuribayashi
Reviewed-by: Stephen Boyd
Signed-off-by: Will Deacon
Signed-off-by: Russell King

Will Deacon
2012-07-10 00:42:23 +0800