10 Jul, 2012

2 commits

  • The mtocrf define is just a wrapper around the real instruction, so we
    can just use real register names here (i.e. lower case).

    Also remove braces in macro so this is possible.
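    As a rough illustration of the difference (hypothetical macros and a
    simplified encoding, not the kernel's actual ppc-opcode.h):

        #include <stdio.h>

        /* A wrapper define just re-emits the real mnemonic as text, so the
         * register token reaches the assembler unchanged and the usual
         * lower-case names (r5, ...) work. */
        #define MTOCRF(fxm, rs) "mtocrf " #fxm "," #rs

        /* A constructed instruction instead ORs the register *number* into
         * an opcode word, so its argument must be a plain integer (field
         * shifts simplified for illustration). */
        #define INST_MTOCRF 0x7c100120u
        #define CONSTRUCT_MTOCRF(fxm, rs) \
                (INST_MTOCRF | ((fxm) << 12) | ((rs) << 21))

        int main(void)
        {
            puts(MTOCRF(0x01, r5));                        /* mtocrf 0x01,r5 */
            printf("0x%08x\n", CONSTRUCT_MTOCRF(0x01, 5)); /* needs plain 5 */
            return 0;
        }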

    Signed-off-by: Michael Neuling
    Signed-off-by: Benjamin Herrenschmidt

  • Anything that uses a constructed instruction (i.e. one built from
    ppc-opcode.h) needs to use the new R0 macro, as %r0 is not going to work.

    Also convert usages of macros where we are just computing an offset
    (usually for a load/store), like:

        std r14,STK_REG(r14)(r1)

    We can't use STK_REG(r14), as %r14 doesn't work in the STK_REG macro,
    which is just calculating an offset; the sketch below shows why.
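    A minimal sketch of why the offset macros need numeric register defines,
    using stand-ins approximating the kernel's ppc_asm.h and ppc-opcode.h
    definitions (the real headers may differ in detail):

        #include <stdio.h>

        /* Stand-ins for the kernel's defines, for illustration only. */
        #define STK_REG(i) (112 + ((i) - 14) * 8) /* offset: needs a number */
        #define R14 14                            /* numeric register define */

        int main(void)
        {
            /* STK_REG(R14) -> 112 + (14 - 14) * 8 = 112, a usable offset. */
            printf("STK_REG(R14) = %d\n", STK_REG(R14));

            /* STK_REG(%r14) would expand to 112 + (%r14 - 14) * 8, which
             * neither the preprocessor nor the assembler can evaluate,
             * hence the numeric R0..R31 macros. */
            return 0;
        }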

    Signed-off-by: Michael Neuling
    Signed-off-by: Benjamin Herrenschmidt


03 Jul, 2012

1 commit

  • Implement a POWER7 optimised memcpy using VMX and enhanced prefetch
    instructions.

    This is a copy of the POWER7 optimised copy_to_user/copy_from_user
    loop. Detailed implementation and performance details can be found in
    commit a66086b8197d (powerpc: POWER7 optimised
    copy_to_user/copy_from_user using VMX).

    I noticed memcpy issues when profiling a RAID6 workload:

    .memcpy
    .async_memcpy
    .async_copy_data
    .__raid_run_ops
    .handle_stripe
    .raid5d
    .md_thread

    I created a simplified testcase by building a RAID6 array with 4 1GB
    ramdisks (booting with brd.rd_size=1048576):

    # mdadm -CR -e 1.2 /dev/md0 --level=6 -n4 /dev/ram[0-3]

    I then timed how long it took to write to the entire array:

    # dd if=/dev/zero of=/dev/md0 bs=1M

    Before: 892 MB/s
    After: 999 MB/s

    A 12% improvement.
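    For illustration only, a rough user-space sketch of how one might measure
    hot-cache memcpy() bandwidth (arbitrary buffer size and timing choices;
    the numbers above came from the dd run to the array, not from this):

        #include <stdio.h>
        #include <stdlib.h>
        #include <string.h>
        #include <time.h>

        int main(void)
        {
            size_t len = 1 << 20;          /* 1 MB buffer */
            int iters = 4096;
            char *src = malloc(len), *dst = malloc(len);
            if (!src || !dst)
                return 1;
            memset(src, 0xa5, len);        /* touch pages, warm the cache */

            struct timespec t0, t1;
            clock_gettime(CLOCK_MONOTONIC, &t0);
            for (int i = 0; i < iters; i++)
                memcpy(dst, src, len);
            clock_gettime(CLOCK_MONOTONIC, &t1);

            double secs = (t1.tv_sec - t0.tv_sec) +
                          (t1.tv_nsec - t0.tv_nsec) / 1e9;
            printf("%.0f MB/s\n", (double)len * iters / secs / 1e6);
            free(src);
            free(dst);
            return 0;
        }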

    Signed-off-by: Anton Blanchard
    Signed-off-by: Benjamin Herrenschmidt


30 Apr, 2012

1 commit

  • Remove CONFIG_POWER4_ONLY. The option is badly named and only does two
    things:

    - It wraps the MMU segment table code. With feature fixups there is
      little downside to compiling this in.

    - It uses the newer mtocrf instruction in various assembly functions.
      Instead of making this a compile option, just do it at runtime via a
      feature fixup (a sketch of the patching idea follows the list).
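    A minimal C model of the runtime feature-fixup idea: boot code rewrites
    alternative instruction slots based on the detected CPU feature mask.
    Names, encodings and layout here are illustrative, not the kernel's
    actual implementation (which uses linker sections and feature-fixups.c):

        #include <stdint.h>
        #include <stdio.h>

        #define CPU_FTR_HAS_MTOCRF (1u << 0)  /* hypothetical feature bit */

        struct alt_entry {
            uint32_t *site;      /* instruction word to patch at boot */
            uint32_t alt_insn;   /* replacement when feature is present */
            uint32_t features;   /* required feature mask */
        };

        static void apply_fixups(struct alt_entry *e, int n, uint32_t cur)
        {
            for (int i = 0; i < n; i++)
                if ((e[i].features & cur) == e[i].features)
                    *e[i].site = e[i].alt_insn;  /* patch in mtocrf form */
        }

        int main(void)
        {
            uint32_t text[1] = { 0x7c8ff120u };  /* pretend: generic mtcrf */
            struct alt_entry table[] = {
                { &text[0], 0x7c801120u, CPU_FTR_HAS_MTOCRF }, /* pretend: mtocrf */
            };

            apply_fixups(table, 1, CPU_FTR_HAS_MTOCRF); /* CPU has the bit */
            printf("patched insn: 0x%08x\n", text[0]);
            return 0;
        }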

    Signed-off-by: Anton Blanchard
    Signed-off-by: Benjamin Herrenschmidt


26 Feb, 2009

1 commit

  • This fixes a regression introduced by commit
    25d6e2d7c58ddc4a3b614fc5381591c0cfe66556 ("powerpc: Update 64bit memcpy()
    using CPU_FTR_UNALIGNED_LD_STD").

    That commit allowed CPUs with the CPU_FTR_UNALIGNED_LD_STD feature bit
    set to do memcpy() with unaligned load doubles. But along with this came
    a bug where the final load double could read bytes beyond a page
    boundary and into the next (unmapped) page. This was caught by enabling
    CONFIG_DEBUG_PAGEALLOC.

    The fix was to read only the number of bytes we need to store, rather
    than reading a full 8-byte doubleword and storing only a portion of it.
    To minimise the amount of existing code touched, we use the original
    do_tail for the src_unaligned case. A sketch of the failure mode and
    the fix follows.
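    A minimal C sketch of the overread and the corrected tail, with made-up
    helper names (the real code is powerpc assembly in memcpy_64.S):

        #include <string.h>
        #include <stdint.h>

        /* Buggy tail: always load a full 8-byte doubleword, then store only
         * the bytes we need. If src + n is the last mapped byte, the 8-byte
         * load reads past the end and can fault on the next page. */
        static void tail_overread(unsigned char *dst,
                                  const unsigned char *src, size_t n)
        {
            uint64_t v;
            memcpy(&v, src, sizeof(v));  /* may touch bytes beyond src + n */
            memcpy(dst, &v, n);
        }

        /* Fixed tail: read exactly the n bytes that will be stored, as a
         * byte-at-a-time do_tail-style loop, so no access crosses src + n. */
        static void tail_exact(unsigned char *dst,
                               const unsigned char *src, size_t n)
        {
            for (size_t i = 0; i < n; i++)
                dst[i] = src[i];
        }

        int main(void)
        {
            unsigned char src[8] = "abcdefg", dst[8] = { 0 };
            tail_exact(dst, src, 3);     /* safe for any n */
            tail_overread(dst, src, 3);  /* fine here, not at a page edge */
            return 0;
        }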

    Below is an example of the regression, as reported by Sachin Sant:

    Unable to handle kernel paging request for data at address 0xc00000003f380000
    Faulting instruction address: 0xc000000000039574
    cpu 0x1: Vector: 300 (Data Access) at [c00000003baf3020]
    pc: c000000000039574: .memcpy+0x74/0x244
    lr: d00000000244916c: .ext3_xattr_get+0x288/0x2f4 [ext3]
    sp: c00000003baf32a0
    msr: 8000000000009032
    dar: c00000003f380000
    dsisr: 40000000
    current = 0xc00000003e54b010
    paca = 0xc000000000a53680
    pid = 1840, comm = readahead
    enter ? for help
    [link register ] d00000000244916c .ext3_xattr_get+0x288/0x2f4 [ext3]
    [c00000003baf32a0] d000000002449104 .ext3_xattr_get+0x220/0x2f4 [ext3] (unreliable)
    [c00000003baf3390] d00000000244a6e8 .ext3_xattr_security_get+0x40/0x5c [ext3]
    [c00000003baf3400] c000000000148154 .generic_getxattr+0x74/0x9c
    [c00000003baf34a0] c000000000333400 .inode_doinit_with_dentry+0x1c4/0x678
    [c00000003baf3560] c00000000032c6b0 .security_d_instantiate+0x50/0x68
    [c00000003baf35e0] c00000000013c818 .d_instantiate+0x78/0x9c
    [c00000003baf3680] c00000000013ced0 .d_splice_alias+0xf0/0x120
    [c00000003baf3720] d00000000243e05c .ext3_lookup+0xec/0x134 [ext3]
    [c00000003baf37c0] c000000000131e74 .do_lookup+0x110/0x260
    [c00000003baf3880] c000000000134ed0 .__link_path_walk+0xa98/0x1010
    [c00000003baf3970] c0000000001354a0 .path_walk+0x58/0xc4
    [c00000003baf3a20] c000000000135720 .do_path_lookup+0x138/0x1e4
    [c00000003baf3ad0] c00000000013645c .path_lookup_open+0x6c/0xc8
    [c00000003baf3b70] c000000000136780 .do_filp_open+0xcc/0x874
    [c00000003baf3d10] c0000000001251e0 .do_sys_open+0x80/0x140
    [c00000003baf3dc0] c00000000016aaec .compat_sys_open+0x24/0x38
    [c00000003baf3e30] c00000000000855c syscall_exit+0x0/0x40

    Signed-off-by: Benjamin Herrenschmidt


05 Nov, 2008

1 commit

  • Update memcpy() to add two new feature sections: one for aligning the
    destination before copying and one for copying using aligned load
    and store doubles.

    These new feature sections will only affect Power6 and Cell because
    the CPU feature bit was only added to these two processors.

    Power6 gets its best performance in memcpy() when aligning neither the
    source nor the destination, while Cell gets its best performance when
    just the destination is aligned. But in order to save on CPU feature
    bits we can use the previously added CPU_FTR_CP_USE_DCBTZ feature bit
    to differentiate between Power6 and Cell (because CPU_FTR_CP_USE_DCBTZ
    was added to Cell but not Power6).

    The first feature section acts to nop out the branch that takes us to
    the code that aligns us to an eight byte boundary for the destination.
    We only want to nop out this branch on Power6.

    So the ALT_FTR_SECTION_END() for this feature section creates a test
    mask of the two feature bits ORed together and an expected result of
    just CPU_FTR_UNALIGNED_LD_STD; thus we nop out the branch on a CPU that
    has CPU_FTR_UNALIGNED_LD_STD set and CPU_FTR_CP_USE_DCBTZ unset (see
    the sketch below).
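    A small C sketch of the mask/expected-value test this encodes (the bit
    positions here are illustrative, not the kernel's actual values):

        #include <stdio.h>

        #define CPU_FTR_UNALIGNED_LD_STD (1u << 0)  /* illustrative bits */
        #define CPU_FTR_CP_USE_DCBTZ     (1u << 1)

        /* Nop out the branch only when the masked feature bits equal the
         * expected value: UNALIGNED_LD_STD set *and* CP_USE_DCBTZ clear,
         * i.e. a Power6-like CPU but not a Cell-like one. */
        static int should_nop_branch(unsigned int cpu_ftrs)
        {
            unsigned int mask = CPU_FTR_UNALIGNED_LD_STD |
                                CPU_FTR_CP_USE_DCBTZ;
            unsigned int expected = CPU_FTR_UNALIGNED_LD_STD;
            return (cpu_ftrs & mask) == expected;
        }

        int main(void)
        {
            printf("Power6-like: %d\n",
                   should_nop_branch(CPU_FTR_UNALIGNED_LD_STD));
            printf("Cell-like:   %d\n",
                   should_nop_branch(CPU_FTR_UNALIGNED_LD_STD |
                                     CPU_FTR_CP_USE_DCBTZ));
            return 0;
        }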

    For the second feature section added, if we're on a CPU that has the
    CPU_FTR_UNALIGNED_LD_STD bit set then we don't want to do the copy
    with aligned loads and stores (and the appropriate shifting left and
    right instructions), so we want to nop out the branch to
    .Lsrc_unaligned.

    The andi. used for this branch is moved to just above the branch, which
    lets us nop out both instructions with a single feature section; this
    gives us better performance and avoids the readability hit of two
    separate feature sections.

    Moving the andi. to just above the branch doesn't have any noticeable
    negative effect on the remaining 64bit processors (the ones that
    didn't have this feature bit added).

    On Cell this simple modification results in an improvement to measured
    memcpy() bandwidth of up to 50% in the hot cache case and up to 15% in
    the cold cache case.

    On Power6 we get memory bandwidth results that are up to three times
    faster in the hot cache case and up to 50% faster in the cold cache
    case.

    Commit 2a9294369bd020db89bfdf78b84c3615b39a5c84 ("powerpc: Add new CPU
    feature: CPU_FTR_CP_USE_DCBTZ") was where CPU_FTR_CP_USE_DCBTZ was
    added.

    Saying that Cell gets its best performance in memcpy() with just the
    destination aligned is true, but only because the indirect shift and
    rotate instructions, sld and srd, are microcoded on Cell. This means
    that either the destination or the source can be aligned, but not both;
    and since we get better performance with the destination aligned, we
    choose that option.

    While we're at it, make a one-line change from cmpldi r1,... to
    cmpldi cr1,... for consistency.

    Signed-off-by: Mark Nelson
    Signed-off-by: Paul Mackerras


13 Apr, 2007

1 commit


31 Aug, 2006

1 commit


10 Feb, 2006

1 commit


10 Oct, 2005

1 commit