Eric Lee / smarc-fsl-linux-kernel

04 Jul, 2006

1 commit

a875a69f8 [PATCH] lockdep: add per_cpu_offset() ... Browse Code »

Add the per_cpu_offset() generic method. (used by the lock validator)

Signed-off-by: Ingo Molnar
Signed-off-by: Arjan van de Ven
Cc: Rusty Russell
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Ingo Molnar
2006-07-04 06:27:00 +0800

26 Jun, 2006

1 commit

bfe5d8341 [PATCH] Define __raw_get_cpu_var and use it ... Browse Code »

There are several instances of per_cpu(foo, raw_smp_processor_id()), which
is semantically equivalent to __get_cpu_var(foo) but without the warning
that smp_processor_id() can give if CONFIG_DEBUG_PREEMPT is enabled. For
those architectures with optimized per-cpu implementations, namely ia64,
powerpc, s390, sparc64 and x86_64, per_cpu() turns into more and slower
code than __get_cpu_var(), so it would be preferable to use __get_cpu_var
on those platforms.

This defines a __raw_get_cpu_var(x) macro which turns into per_cpu(x,
raw_smp_processor_id()) on architectures that use the generic per-cpu
implementation, and turns into __get_cpu_var(x) on the architectures that
have an optimized per-cpu implementation.

Signed-off-by: Paul Mackerras
Acked-by: David S. Miller
Acked-by: Ingo Molnar
Acked-by: Martin Schwidefsky
Cc: Rusty Russell
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Paul Mackerras
2006-06-26 01:01:01 +0800

29 Mar, 2006

1 commit

0e5519548 [PATCH] for_each_possible_cpu: powerpc ... Browse Code »

for_each_cpu() actually iterates across all possible CPUs. We've had mistakes
in the past where people were using for_each_cpu() where they should have been
iterating across only online or present CPUs. This is inefficient and
possibly buggy.

We're renaming for_each_cpu() to for_each_possible_cpu() to avoid this in the
future.

This patch replaces for_each_cpu with for_each_possible_cpu.

Signed-off-by: KAMEZAWA Hiroyuki
Signed-off-by: Andrew Morton
Signed-off-by: Paul Mackerras

KAMEZAWA Hiroyuki
2006-03-29 10:44:15 +0800

23 Mar, 2006

1 commit

394e3902c [PATCH] more for_each_cpu() conversions ... Browse Code »

When we stop allocating percpu memory for not-possible CPUs we must not touch
the percpu data for not-possible CPUs at all. The correct way of doing this
is to test cpu_possible() or to use for_each_cpu().

This patch is a kernel-wide sweep of all instances of NR_CPUS. I found very
few instances of this bug, if any. But the patch converts lots of open-coded
test to use the preferred helper macros.

Cc: Mikael Starvik
Cc: David Howells
Acked-by: Kyle McMartin
Cc: Anton Blanchard
Cc: Benjamin Herrenschmidt
Cc: Paul Mackerras
Cc: Martin Schwidefsky
Cc: Heiko Carstens
Cc: Paul Mundt
Cc: "David S. Miller"
Cc: William Lee Irwin III
Cc: Andi Kleen
Cc: Christian Zankel
Cc: Philippe Elie
Cc: Nathan Scott
Cc: Jens Axboe
Cc: Eric Dumazet
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Andrew Morton
2006-03-23 23:38:17 +0800

11 Jan, 2006

1 commit

7a0268fa1 [PATCH] powerpc/64: per cpu data optimisations ... Browse Code »

The current ppc64 per cpu data implementation is quite slow. eg:

lhz 11,18(13) /* smp_processor_id() */
ld 9,.LC63-.LCTOC1(30) /* per_cpu__variable_name */
ld 8,.LC61-.LCTOC1(30) /* __per_cpu_offset */
sldi 11,11,3 /* form index into __per_cpu_offset */
mr 10,9
ldx 9,11,8 /* __per_cpu_offset[smp_processor_id()] */
ldx 0,10,9 /* load per cpu data */

5 loads for something that is supposed to be fast, pretty awful. One
reason for the large number of loads is that we have to synthesize 2
64bit constants (per_cpu__variable_name and __per_cpu_offset).

By putting __per_cpu_offset into the paca we can avoid the 2 loads
associated with it:

ld 11,56(13) /* paca->data_offset */
ld 9,.LC59-.LCTOC1(30) /* per_cpu__variable_name */
ldx 0,9,11 /* load per cpu data

Longer term we can should be able to do even better than 3 loads.
If per_cpu__variable_name wasnt a 64bit constant and paca->data_offset
was in a register we could cut it down to one load. A suggestion from
Rusty is to use gcc's __thread extension here. In order to do this we
would need to free up r13 (the __thread register and where the paca
currently is). So far Ive had a few unsuccessful attempts at doing that :)

The patch also allocates per cpu memory node local on NUMA machines.
This patch from Rusty has been sitting in my queue _forever_ but stalled
when I hit the compiler bug. Sorry about that.

Finally I also only allocate per cpu data for possible cpus, which comes
straight out of the x86-64 port. On a pseries kernel (with NR_CPUS == 128)
and 4 possible cpus we see some nice gains:

total used free shared buffers cached
Mem: 4012228 212860 3799368 0 0 162424

total used free shared buffers cached
Mem: 4016200 212984 3803216 0 0 162424

A saving of 3.75MB. Quite nice for smaller machines. Note: we now have
to be careful of per cpu users that touch data for !possible cpus.

At this stage it might be worth making the NUMA and possible cpu
optimisations generic, but per cpu init is done so early we have to be
careful that all architectures have their possible map setup correctly.

Signed-off-by: Anton Blanchard
Signed-off-by: Paul Mackerras

Anton Blanchard
2006-01-11 11:49:45 +0800

30 Aug, 2005

1 commit

6f9aa7274 [PATCH] Move all the very similar files to asm-powerpc ... Browse Code »

They differed in either simple comments or in the protecting ifdefs.

Signed-off-by: Stephen Rothwell
Signed-off-by: Paul Mackerras

Stephen Rothwell
2005-08-30 11:32:06 +0800