Eric Lee / smarc-fsl-linux-kernel

10 Sep, 2010

2 commits

677243d74 percpu: Optimize __get_cpu_var()

Redefine __get_cpu_var() using this_cpu_ptr() which can be
arch-optimized.

Signed-off-by: Brian Gerst
Signed-off-by: Tejun Heo

Brian Gerst
2010-09-10 16:56:51 +0800
db7829c6c x86, percpu: Optimize this_cpu_ptr

Allow arches to implement __this_cpu_ptr, and provide an x86 version.

Before:
movq $foo, %rax
movq %gs:this_cpu_off, %rdx
addq %rdx, %rax

After:
movq $foo, %rax
addq %gs:this_cpu_off, %rax

The benefit is doing it in one less instruction and not clobbering
a temporary register.

tj: * Beefed up the comment a bit and renamed in-macro temp variable
to match neighboring macros.

* Folded fix for const pointer case found in linux-next.

* Fixed sparse notation.

Signed-off-by: Brian Gerst
Signed-off-by: Tejun Heo

Brian Gerst
2010-09-10 16:56:47 +0800

07 Aug, 2010

1 commit

18cb2aef9 percpu: handle __percpu notations in UP accessors

UP accessors didn't take care of __percpu notations leading to a lot
of spurious sparse warnings on UP configurations. Fix it.

Signed-off-by: Namhyung Kim
Signed-off-by: Tejun Heo

Namhyung Kim
2010-08-07 20:20:53 +0800

01 Jun, 2010

1 commit

1f7389786 Merge branch 'for-35' of git://repo.or.cz/linux-kbuild

* 'for-35' of git://repo.or.cz/linux-kbuild: (81 commits)
kbuild: Revert part of e8d400a to resolve a conflict
kbuild: Fix checking of scm-identifier variable
gconfig: add support to show hidden options that have prompts
menuconfig: add support to show hidden options which have prompts
gconfig: remove show_debug option
gconfig: remove dbg_print_ptype() and dbg_print_stype()
kconfig: fix zconfdump()
kconfig: some small fixes
add random binaries to .gitignore
kbuild: Include gen_initramfs_list.sh and the file list in the .d file
kconfig: recalc symbol value before showing search results
.gitignore: ignore *.lzo files
headerdep: perlcritic warning
scripts/Makefile.lib: Align the output of LZO
kbuild: Generate modules.builtin in make modules_install
Revert "kbuild: specify absolute paths for cscope"
kbuild: Do not unnecessarily regenerate modules.builtin
headers_install: use local file handles
headers_check: fix perl warnings
export_report: fix perl warnings
...

Linus Torvalds
2010-06-01 23:55:52 +0800

03 Mar, 2010

1 commit

3d9a854c2 Rename .data[.percpu][.XXX] to .data[..percpu][..XXX].

Signed-off-by: Denys Vlasenko
Signed-off-by: Michal Marek

Denys Vlasenko
2010-03-03 18:26:00 +0800

29 Oct, 2009

3 commits

545695fb4 percpu: make accessors check for percpu pointer in sparse

The previous patch made sparse warn about percpu variables being used
directly without going through percpu accessors. This patch
implements the other half - checking whether non percpu variable is
passed into percpu accessors.
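
The check described above can be sketched in plain C. This is a minimal illustration, not the kernel's code: the demo_* names are hypothetical, the address_space(3) number is an assumption about how __percpu is annotated, and the annotations only do anything when the file is run through sparse (which defines __CHECKER__). With an ordinary compiler they expand to nothing.

```c
#include <assert.h>
#include <stddef.h>

/* Annotations are only meaningful to sparse; a normal compiler sees nothing. */
#ifdef __CHECKER__
# define __percpu __attribute__((noderef, address_space(3)))  /* assumed number */
# define __force  __attribute__((force))
#else
# define __percpu
# define __force
#endif

/*
 * Hypothetical verification helper: assigning a NULL of the argument's type
 * to a __percpu pointer makes sparse warn when the argument is NOT a percpu
 * pointer. The assignment itself has no runtime effect.
 */
#define demo_verify_pcpu_ptr(ptr) do {                              \
	const void __percpu *vpp_verify = (__typeof__(ptr))NULL;    \
	(void)vpp_verify;                                           \
} while (0)

/* Hypothetical accessor: verify first, then drop the annotation. */
static int *demo_this_cpu_ptr(int __percpu *ptr)
{
	demo_verify_pcpu_ptr(ptr);
	/* A real accessor would add the current CPU's offset here. */
	return (int __force *)ptr;
}
```

Under a normal compiler the helper is a no-op, so the accessor simply returns its argument; the value of the pattern is entirely in the static check.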

Signed-off-by: Tejun Heo
Cc: Rusty Russell
Cc: Al Viro

Tejun Heo
2009-10-29 21:34:15 +0800
e0fdb0e05 percpu: add __percpu for sparse.

We have to make __kernel "__attribute__((address_space(0)))" so we can
cast to it.

tj: * put_cpu_var() update.

* Annotations added to dynamic allocator interface.

Signed-off-by: Rusty Russell
Cc: Al Viro
Signed-off-by: Tejun Heo

Rusty Russell
2009-10-29 21:34:15 +0800
dd17c8f72 percpu: remove per_cpu__ prefix.

Now that the return from alloc_percpu is compatible with the address
of per-cpu vars, it makes sense to hand around the address of per-cpu
variables. To make this sane, we remove the per_cpu__ prefix we
created to stop people accidentally using these vars directly.

Now we have sparse, we can use that (next patch).

tj: * Updated to convert stuff which were missed by or added after the
original patch.

* Kill per_cpu_var() macro.

Signed-off-by: Rusty Russell
Signed-off-by: Tejun Heo
Reviewed-by: Christoph Lameter

Rusty Russell
2009-10-29 21:34:15 +0800

03 Oct, 2009

1 commit

7340a0b15 this_cpu: Introduce this_cpu_ptr() and generic this_cpu_* operations

This patch introduces two things: First this_cpu_ptr and then per cpu
atomic operations.

this_cpu_ptr
------------

A common operation when dealing with cpu data is to get the instance of the
cpu data associated with the currently executing processor. This can be
optimized by

this_cpu_ptr(xx) = per_cpu_ptr(xx, smp_processor_id()).

The problem with per_cpu_ptr(x, smp_processor_id()) is that it requires
an array lookup to find the offset for the cpu. Processors typically
have the offset for the current cpu area in some kind of (arch dependent)
efficiently accessible register or memory location.

We can use that instead of doing the array lookup to speed up the
determination of the address of the percpu variable. This is particularly
significant because these lookups occur in performance critical paths
of the core kernel. this_cpu_ptr() can avoid memory accesses and

this_cpu_ptr comes in two flavors. The preemption context matters since we
are referring to the currently executing processor. In many cases we must
ensure that the processor does not change while a code segment is executed.

__this_cpu_ptr -> Do not check for preemption context
this_cpu_ptr -> Check preemption context

The parameter to these operations is a per cpu pointer. This can be the
address of a statically defined per cpu variable (&per_cpu_var(xxx)) or
the address of a per cpu variable allocated with the per cpu allocator.
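
The difference between the two lookups can be sketched in user space. This is only a simulation under stated assumptions: the per cpu areas are a plain array, this_cpu_off is an ordinary variable standing in for the arch register or memory location, and every demo_* name, struct cpu_area and switch_to_cpu() are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

#define NR_CPUS 4	/* hypothetical CPU count */

/* One private copy of the "per cpu area" for each simulated CPU. */
struct cpu_area {
	long counter;
	long pad[7];
};
static struct cpu_area areas[NR_CPUS];

/* Generic path: a table of byte offsets, one entry per CPU. */
static ptrdiff_t per_cpu_offset_table[NR_CPUS];

/* Fast path: offset of the current CPU, standing in for a register. */
static ptrdiff_t this_cpu_off;

static void setup_percpu(void)
{
	for (int cpu = 0; cpu < NR_CPUS; cpu++)
		per_cpu_offset_table[cpu] =
			(char *)&areas[cpu] - (char *)&areas[0];
}

/* Simulate the scheduler placing us on a given CPU. */
static void switch_to_cpu(int cpu)
{
	assert(cpu >= 0 && cpu < NR_CPUS);
	this_cpu_off = per_cpu_offset_table[cpu];
}

/* per_cpu_ptr(): array lookup for the offset, then add it to the pointer. */
static void *demo_per_cpu_ptr(void *base, int cpu)
{
	return (char *)base + per_cpu_offset_table[cpu];
}

/* this_cpu_ptr(): no array lookup, just add the cached offset. */
static void *demo_this_cpu_ptr(void *base)
{
	return (char *)base + this_cpu_off;
}
```

Both helpers return the same address for the current CPU; the second one simply skips the table access.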

per cpu atomic operations: this_cpu_*(var, val)
-----------------------------------------------
this_cpu_* operations (like this_cpu_add(struct->y, value)) operate on
arbitrary scalars that are members of structures allocated with the new
per cpu allocator. They can also operate on static per_cpu variables
if they are passed to per_cpu_var() (See patch to use this_cpu_*
operations for vm statistics).

These operations are guaranteed to be atomic vs preemption when modifying
the scalar. The calculation of the per cpu offset is also guaranteed to
be atomic at the same time. This means that a this_cpu_* operation can be
safely used to modify a per cpu variable in a context where interrupts are
enabled and preemption is allowed. Many architectures can perform such
a per cpu atomic operation with a single instruction.

Note that the atomicity here is different from regular atomic operations.
Atomicity is only guaranteed for data accessed from the currently executing
processor. Modifications from other processors are still possible. There
must be other guarantees that the per cpu data is not modified from another
processor when using these instructions. The per cpu atomicity is created
by the fact that the processor either executes an instruction or not.
Embedded in the instruction is the relocation of the per cpu address to
the area reserved for the current processor and the RMW action. Therefore
interrupts or preemption cannot occur in the midst of this processing.

Generic fallback functions are used if an arch does not define optimized
this_cpu operations. The functions also come in the two flavors used
for this_cpu_ptr().

The first parameter is a scalar that is a member of a structure allocated
through allocpercpu or a per cpu variable (use per_cpu_var(xxx)). The
operations are similar to what percpu_add() and friends do.

this_cpu_read(scalar)
this_cpu_write(scalar, value)
this_cpu_add(scalar, value)
this_cpu_sub(scalar, value)
this_cpu_inc(scalar)
this_cpu_dec(scalar)
this_cpu_and(scalar, value)
this_cpu_or(scalar, value)
this_cpu_xor(scalar, value)

Arch code can override the generic functions and provide optimized atomic
per cpu operations. These atomic operations must provide both the relocation
(x86 does it through a segment override) and the operation on the data in a
single instruction. Otherwise preempt needs to be disabled and there is no
gain from providing arch implementations.
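
A generic fallback of the kind described here might look like the following user-space sketch. It is an illustration under stated assumptions, not the kernel's implementation: the per cpu variable is modeled as an array indexed by a simulated current_cpu, and demo_preempt_disable()/demo_preempt_enable() are hypothetical stand-ins that only keep a nesting count.

```c
#include <assert.h>

#define NR_CPUS 2		/* hypothetical CPU count */

static int current_cpu;		/* hypothetical: the CPU we are running on */
static int preempt_count;	/* hypothetical stand-in for the kernel counter */

static void demo_preempt_disable(void)
{
	preempt_count++;
}

static void demo_preempt_enable(void)
{
	assert(preempt_count > 0);
	preempt_count--;
}

/* A "per cpu variable": one slot per simulated CPU. */
static long hits[NR_CPUS];

/*
 * Fallback add: without a single-instruction RMW, the offset calculation
 * and the update must be bracketed so the task cannot migrate in between.
 */
static void demo_this_cpu_add(long *percpu_var, long val)
{
	demo_preempt_disable();
	percpu_var[current_cpu] += val;
	demo_preempt_enable();
}

/* Fallback read, bracketed the same way. */
static long demo_this_cpu_read(const long *percpu_var)
{
	long val;

	demo_preempt_disable();
	val = percpu_var[current_cpu];
	demo_preempt_enable();
	return val;
}
```

An arch that can relocate and modify in one instruction would replace both helpers with that instruction and drop the bracketing, which is the gain the paragraph above refers to.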

A third variant is provided prefixed by irqsafe_. These variants are safe
against hardware interrupts on the *same* processor (all per cpu atomic
primitives are *always* *only* providing safety for code running on the
*same* processor!). The increment needs to be implemented by the hardware
in such a way that it is a single RMW instruction that is either processed
before or after an interrupt.

cc: David Howells
cc: Ingo Molnar
cc: Rusty Russell
cc: Eric Dumazet
Signed-off-by: Christoph Lameter
Signed-off-by: Tejun Heo

Christoph Lameter
2009-10-03 18:48:22 +0800

04 Sep, 2009

1 commit

53f824520 x86/i386: Put aligned stack-canary in percpu shared_aligned section

Pack aligned things together into a special section to minimize
padding holes.

Suggested-by: Eric Dumazet
Signed-off-by: Jeremy Fitzhardinge
Cc: Tejun Heo
LKML-Reference:
[ queued up in tip:x86/asm because it depends on this commit:
x86/i386: Make sure stack-protector segment base is cache aligned ]
Signed-off-by: Ingo Molnar

Jeremy Fitzhardinge
2009-09-04 13:10:31 +0800

01 Jul, 2009

1 commit

b01e8dc34 alpha: fix percpu build breakage

alpha percpu access requires custom SHIFT_PERCPU_PTR() definition for
modules to work around addressing range limitation. This is done via
generating inline assembly using C preprocessing which forces the
assembler to generate external reference. This happens behind the
compiler's back and makes the compiler think that static percpu variables
in modules are unused.

This used to be worked around by using the __used attribute for percpu
variables, which prevents the compiler from omitting the variable; however,
a recent declare/definition attribute unification change broke this, as
__used can't be used for declarations. Also, in the process, the
PER_CPU_ATTRIBUTES definition in alpha percpu.h got broken.

This patch adds PER_CPU_DEF_ATTRIBUTES, which is only used for definitions,
and makes alpha use it to add __used for percpu variables in modules. This
also fixes the PER_CPU_ATTRIBUTES double definition bug.

Signed-off-by: Tejun Heo
Tested-by: maximilian attems
Acked-by: Ivan Kokshaysky
Cc: Richard Henderson
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Tejun Heo
2009-07-01 09:55:59 +0800

22 Apr, 2009

2 commits

5028eaa97 PERCPU: Collect the DECLARE/DEFINE declarations together

Collect the DECLARE/DEFINE declarations together in linux/percpu-defs.h so
that they're in one place, and give them descriptive comments, particularly
the SHARED_ALIGNED variant.

It would be nice to collect these in linux/percpu.h, but that's not possible
without sorting out the severe #include recursion between the x86 arch headers
and the general headers (and possibly other arches too).

Signed-off-by: David Howells
Signed-off-by: Linus Torvalds

David Howells
2009-04-22 10:40:00 +0800
9b8de7479 FRV: Fix the section attribute on UP DECLARE_PER_CPU()

In non-SMP mode, the variable section attribute specified by DECLARE_PER_CPU()
does not agree with that specified by DEFINE_PER_CPU(). This means that
architectures that reference a small data section relative to a base
register may throw up linkage errors due to too great a displacement between
where the base register points and the per-CPU variable.

On FRV, the .h declaration says that the variable is in the .sdata section, but
the .c definition says it's actually in the .data section. The linker throws
up the following errors:

kernel/built-in.o: In function `release_task':
kernel/exit.c:78: relocation truncated to fit: R_FRV_GPREL12 against symbol `per_cpu__process_counts' defined in .data section in kernel/built-in.o
kernel/exit.c:78: relocation truncated to fit: R_FRV_GPREL12 against symbol `per_cpu__process_counts' defined in .data section in kernel/built-in.o

To fix this, DECLARE_PER_CPU() should simply apply the same section attribute
as does DEFINE_PER_CPU(). However, this is made slightly more complex by
virtue of the fact that there are several variants on DEFINE, so these need to
be matched by variants on DECLARE.

Signed-off-by: David Howells
Signed-off-by: Linus Torvalds

David Howells
2009-04-22 10:39:59 +0800

11 Apr, 2009

1 commit

066123a53 percpu: unbreak alpha percpu

For the time being, move the generic percpu_*() accessors to
linux/percpu.h.

asm-generic/percpu.h is meant to carry the generic low-level pieces
(declarations, definitions, pointer offset calculation and so on),
not the generic interface.

Signed-off-by: Ingo Molnar

Tejun Heo
2009-04-11 03:36:18 +0800

16 Jan, 2009

1 commit

6dbde3530 percpu: add optimized generic percpu accessors

It is an optimization and a cleanup, and adds the following new
generic percpu methods:

percpu_read()
percpu_write()
percpu_add()
percpu_sub()
percpu_and()
percpu_or()
percpu_xor()

and implements support for them on x86. (other architectures will fall
back to a default implementation)

The advantage is that for example to read a local percpu variable,
instead of this sequence:

return __get_cpu_var(var);

ffffffff8102ca2b: 48 8b 14 fd 80 09 74 mov -0x7e8bf680(,%rdi,8),%rdx
ffffffff8102ca32: 81
ffffffff8102ca33: 48 c7 c0 d8 59 00 00 mov $0x59d8,%rax
ffffffff8102ca3a: 48 8b 04 10 mov (%rax,%rdx,1),%rax

We can get a single instruction by using the optimized variants:

return percpu_read(var);

ffffffff8102ca3f: 65 48 8b 05 91 8f fd mov %gs:0x7efd8f91(%rip),%rax

I also cleaned up the x86-specific APIs and made the x86 code use
these new generic percpu primitives.

tj: * fixed generic percpu_sub() definition as Roel Kluin pointed out
* added percpu_and() for completeness's sake
* made generic percpu ops atomic against preemption

Signed-off-by: Ingo Molnar
Signed-off-by: Tejun Heo

Ingo Molnar
2009-01-16 21:20:31 +0800

24 Feb, 2008

1 commit

1e8352784 percpu: fix DEBUG_PREEMPT per_cpu checking

2.6.25-rc1 percpu changes broke CONFIG_DEBUG_PREEMPT's per_cpu checking
on several architectures. On s390, sparc64 and x86 it's been weakened to
not checking at all; whereas on powerpc64 it's become too strict, issuing
warnings from __raw_get_cpu_var in io_schedule and init_timer for example.

Fix this by weakening powerpc's __my_cpu_offset to use the non-checking
local_paca instead of get_paca (which itself contains such a check);
and strengthening the generic my_cpu_offset to go the old slow way via
smp_processor_id when CONFIG_DEBUG_PREEMPT (debug_smp_processor_id is
where all the knowledge of what's correct when lives).

Signed-off-by: Hugh Dickins
Reviewed-by: Mike Travis
Signed-off-by: Linus Torvalds

Hugh Dickins
2008-02-24 04:09:28 +0800

30 Jan, 2008

4 commits

dd5af90a7 x86/non-x86: percpu, node ids, apic ids x86.git fixup

Signed-off-by: Ingo Molnar
Signed-off-by: Thomas Gleixner

Mike Travis
2008-01-30 20:33:32 +0800
acdac8720 percpu: make the asm-generic/percpu.h more "generic"

- add support for PER_CPU_ATTRIBUTES

- fix generic smp percpu_modcopy to use per_cpu_offset() macro.

Add the ability to use generic/percpu even if the arch needs to override
several aspects of its operations. This will enable the use of generic
percpu.h for all arches.

An arch may define:

__per_cpu_offset        Do not use the generic pointer array. Arch must
                        define per_cpu_offset(cpu) (used by x86_64, s390).

__my_cpu_offset         Can be defined to provide an optimized way to determine
                        the offset for variables of the currently executing
                        processor. Used by ia64, x86_64, x86_32, sparc64, s/390.

SHIFT_PTR(ptr, offset)  If an arch defines it then special handling
                        of pointer arithmetic may be implemented. Used
                        by s/390.

(Some of these special percpu arch implementations may be later consolidated
so that there are fewer cases to deal with.)
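
The override scheme can be sketched with the usual #ifndef pattern: the arch part defines what it wants to replace, and the generic part fills in only what is still undefined. The layout below is a user-space illustration with hypothetical demo_* helpers and made-up offsets; it is not the contents of the real header.

```c
#include <assert.h>

#define NR_CPUS 4	/* hypothetical CPU count */

/* Hypothetical backing data for the simulation. */
static unsigned long demo_offsets[NR_CPUS] = { 0, 64, 128, 192 };
static int demo_cpu;			/* simulated current CPU number */
static unsigned long demo_fast_off;	/* stand-in for an arch register */

/* Simulate the scheduler placing us on a given CPU. */
static void demo_migrate(int cpu)
{
	assert(cpu >= 0 && cpu < NR_CPUS);
	demo_cpu = cpu;
	demo_fast_off = demo_offsets[cpu];
}

/* ---- what an arch header might provide (hypothetical) ---- */
#define __my_cpu_offset demo_fast_off

/* ---- generic part: only define what the arch left undefined ---- */
#ifndef per_cpu_offset
#define per_cpu_offset(cpu) (demo_offsets[(cpu)])
#endif

#ifndef __my_cpu_offset
/* Slow generic fallback: look the offset up through the CPU number. */
#define __my_cpu_offset per_cpu_offset(demo_cpu)
#endif
```

Removing the arch definition of __my_cpu_offset makes the generic fallback take over without any change to the callers.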

Cc: Rusty Russell
Cc: Andi Kleen
Signed-off-by: Christoph Lameter
Signed-off-by: Mike Travis
Signed-off-by: Thomas Gleixner
Signed-off-by: Ingo Molnar

travis@sgi.com
2008-01-30 20:32:52 +0800
5280e004f percpu: move arch XX_PER_CPU_XX definitions into linux/percpu.h

- Special consideration for IA64: Add the ability to specify
arch specific per cpu flags

- remove .data.percpu attribute from DEFINE_PER_CPU for non-smp case.

The arch definitions are all the same. So move them into linux/percpu.h.

We cannot move DECLARE_PER_CPU since some include files just include
asm/percpu.h to avoid include recursion problems.

Cc: Rusty Russell
Cc: Andi Kleen
Signed-off-by: Christoph Lameter
Signed-off-by: Mike Travis
Signed-off-by: Thomas Gleixner
Signed-off-by: Ingo Molnar

travis@sgi.com
2008-01-30 20:32:52 +0800
b32ef636a percpu: use a kconfig variable to signal arch specific percpu setup

The use of __GENERIC_PERCPU is a bit problematic since arches
may want to run their own percpu setup while using the generic
percpu definitions. Replace it with a kconfig variable.

Cc: Rusty Russell
Cc: Andi Kleen
Signed-off-by: Christoph Lameter
Signed-off-by: Mike Travis
Signed-off-by: Thomas Gleixner
Signed-off-by: Ingo Molnar

travis@sgi.com
2008-01-30 20:32:51 +0800

20 Jul, 2007

1 commit

5fb7dc37d define new percpu interface for shared data

The per cpu data section contains two types of data: one set which is
exclusively accessed by the local cpu, and another set which is per cpu
but also shared by remote cpus. In the current kernel, these two sets are
not clearly separated out. This can potentially cause the same data
cacheline to be shared between the two sets of data, which will result in
unnecessary bouncing of the cacheline between cpus.

One way to fix the problem is to cacheline align the remotely accessed per
cpu data, both at the beginning and at the end. Because of the padding at
both ends, this will likely cause some memory wastage and also the
interface to achieve this is not clean.

This patch:

Moves the remotely accessed per cpu data (which is currently marked
as ____cacheline_aligned_in_smp) into a different section, where all the data
elements are cacheline aligned. And as such, this differentiates the local
only data and remotely accessed data cleanly.
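
The effect of cacheline-aligning every remotely accessed element can be checked with a small C11 sketch. The 64-byte line size and all names here are assumptions for illustration; the kernel places such data through a dedicated section rather than a plain array.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define DEMO_CACHELINE 64	/* assumed cache line size in bytes */

/* Remotely accessed data: each element starts on its own cache line. */
struct shared_aligned {
	_Alignas(DEMO_CACHELINE) long remote_counter;
};

/* Stand-in for the section holding the aligned elements. */
static struct shared_aligned shared_data[2];

/* True if the address sits on a cache line boundary. */
static int is_line_aligned(const void *p)
{
	return ((uintptr_t)p % DEMO_CACHELINE) == 0;
}

/* True if both addresses fall into the same cache line. */
static int same_line(const void *a, const void *b)
{
	return ((uintptr_t)a / DEMO_CACHELINE) ==
	       ((uintptr_t)b / DEMO_CACHELINE);
}
```

Because the alignment also pads the structure to a full line, neighbouring elements can never share a line, so a remote write to one element does not bounce the line holding another.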

Signed-off-by: Fenghua Yu
Acked-by: Suresh Siddha
Cc: Rusty Russell
Cc: Christoph Lameter
Cc: "Luck, Tony"
Cc: Andi Kleen
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Fenghua Yu
2007-07-20 01:04:44 +0800

03 May, 2007

1 commit

ae1ee11be [PATCH] i386: Use per-cpu variables for GDT, PDA

Allocating PDA and GDT at boot is a pain. Using simple per-cpu variables adds
happiness (although we need the GDT page-aligned for Xen, which we do in a
followup patch).

[akpm@linux-foundation.org: build fix]
Signed-off-by: Rusty Russell
Signed-off-by: Andi Kleen
Cc: Andi Kleen
Signed-off-by: Andrew Morton

Rusty Russell
2007-05-03 01:27:10 +0800

06 Oct, 2006

1 commit

a666ecfbf [PATCH] Fix typo in "syntax error if percpu macros are incorrectly used" patch

Trivial typo fix in the "syntax error if percpu macros are incorrectly
used" patch. I misspelled "identifier" in all places. D'Oh!

Thanks to Dirk Mueller for pointing this out.

Signed-off-by: Jan Blunck
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Jan Blunck
2006-10-06 23:53:41 +0800

26 Sep, 2006

1 commit

632bbfeee [PATCH] trigger a syntax error if percpu macros are incorrectly used

get_cpu_var()/per_cpu()/__get_cpu_var() arguments must be simple
identifiers. Otherwise the arch dependent implementations might break.

This patch enforces the correct usage of the macros by producing a syntax
error if the variable is not a simple identifier.
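
One way to force such a syntax error is to paste the argument into a dummy declaration, which only parses when the argument is a single identifier. The sketch below is a user-space illustration of that idea, assuming GNU C statement expressions (gcc or clang) and an array-backed DEFINE_PER_CPU; the hits variable and bump_hits() are hypothetical.

```c
#include <assert.h>

#define NR_CPUS 4	/* hypothetical CPU count */

/* Simulated per cpu storage: one array slot per CPU. */
#define DEFINE_PER_CPU(type, name) type per_cpu__##name[NR_CPUS]

/*
 * The extern declaration is never defined or called. Its only job is to
 * make "simple_identifier_" ## var a valid function name, which fails to
 * parse if var is something like "foo.bar" or "foo[1]".
 */
#define per_cpu(var, cpu) (*({					\
	extern int simple_identifier_##var(void);		\
	&per_cpu__##var[(cpu)];					\
}))

static DEFINE_PER_CPU(int, hits);

/* Increment the counter of one CPU and return the new value. */
static int bump_hits(int cpu)
{
	return ++per_cpu(hits, cpu);
}
```

Writing per_cpu(hits.x, 0) would now stop at compile time, because simple_identifier_hits.x is not a valid declarator.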

Signed-off-by: Jan Blunck
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Jan Blunck
2006-09-26 23:48:44 +0800

04 Jul, 2006

1 commit

a875a69f8 [PATCH] lockdep: add per_cpu_offset()

Add the per_cpu_offset() generic method. (used by the lock validator)

Signed-off-by: Ingo Molnar
Signed-off-by: Arjan van de Ven
Cc: Rusty Russell
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Ingo Molnar
2006-07-04 06:27:00 +0800

26 Jun, 2006

1 commit

bfe5d8341 [PATCH] Define __raw_get_cpu_var and use it

There are several instances of per_cpu(foo, raw_smp_processor_id()), which
is semantically equivalent to __get_cpu_var(foo) but without the warning
that smp_processor_id() can give if CONFIG_DEBUG_PREEMPT is enabled. For
those architectures with optimized per-cpu implementations, namely ia64,
powerpc, s390, sparc64 and x86_64, per_cpu() turns into more and slower
code than __get_cpu_var(), so it would be preferable to use __get_cpu_var
on those platforms.

This defines a __raw_get_cpu_var(x) macro which turns into per_cpu(x,
raw_smp_processor_id()) on architectures that use the generic per-cpu
implementation, and turns into __get_cpu_var(x) on the architectures that
have an optimized per-cpu implementation.

Signed-off-by: Paul Mackerras
Acked-by: David S. Miller
Acked-by: Ingo Molnar
Acked-by: Martin Schwidefsky
Cc: Rusty Russell
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Paul Mackerras
2006-06-26 01:01:01 +0800

29 Mar, 2006

1 commit

0a9450227 [PATCH] for_each_possible_cpu: fixes for generic part

Replaces for_each_cpu() with for_each_possible_cpu().

Signed-off-by: KAMEZAWA Hiroyuki
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

KAMEZAWA Hiroyuki
2006-03-29 01:16:05 +0800

23 Mar, 2006

1 commit

394e3902c [PATCH] more for_each_cpu() conversions

When we stop allocating percpu memory for not-possible CPUs we must not touch
the percpu data for not-possible CPUs at all. The correct way of doing this
is to test cpu_possible() or to use for_each_cpu().

This patch is a kernel-wide sweep of all instances of NR_CPUS. I found very
few instances of this bug, if any. But the patch converts lots of open-coded
tests to use the preferred helper macros.
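
The conversion replaces open-coded loops over NR_CPUS with a helper that skips CPUs that are not possible. The sketch below simulates this in user space with a hypothetical bitmask; it uses the later name for_each_possible_cpu(), which the 29 Mar, 2006 commit above introduces for what this patch calls for_each_cpu().

```c
#include <assert.h>

#define NR_CPUS 8	/* hypothetical compile-time maximum */

/* Hypothetical mask: only CPUs 0, 1 and 3 can ever be present. */
static unsigned long cpu_possible_mask = 0x0bUL;

static int cpu_possible(int cpu)
{
	return (int)((cpu_possible_mask >> cpu) & 1UL);
}

/* Visit only possible CPUs instead of every index below NR_CPUS. */
#define for_each_possible_cpu(cpu)				\
	for ((cpu) = 0; (cpu) < NR_CPUS; (cpu)++)		\
		if (cpu_possible(cpu))

/* Simulated per cpu counters; slots of impossible CPUs must not be used. */
static long percpu_count[NR_CPUS];

static long sum_possible(void)
{
	long sum = 0;
	int cpu;

	for_each_possible_cpu(cpu)
		sum += percpu_count[cpu];
	return sum;
}
```

Once percpu memory is no longer allocated for not-possible CPUs, a loop over all of NR_CPUS would touch memory that does not exist, which is why the helper form is preferred.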

Cc: Mikael Starvik
Cc: David Howells
Acked-by: Kyle McMartin
Cc: Anton Blanchard
Cc: Benjamin Herrenschmidt
Cc: Paul Mackerras
Cc: Martin Schwidefsky
Cc: Heiko Carstens
Cc: Paul Mundt
Cc: "David S. Miller"
Cc: William Lee Irwin III
Cc: Andi Kleen
Cc: Christian Zankel
Cc: Philippe Elie
Cc: Nathan Scott
Cc: Jens Axboe
Cc: Eric Dumazet
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Andrew Morton
2006-03-23 23:38:17 +0800

24 Jun, 2005

1 commit

11c80c836 [PATCH] adjust per_cpu definition in non-SMP case

Fix (in the architectures I'm actually building for) the UP definition of
per_cpu so that the cpu specified may be any expression, not just an
identifier or a suffix expression.
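
A UP definition that accepts any expression can be sketched as follows. This is an illustration of the idea rather than a quote of the patch: the cpu argument is parenthesised and discarded, and the result stays an lvalue by taking the address and dereferencing it. The jiffies_demo variable and store_on() are hypothetical.

```c
#include <assert.h>

/* On UP there is exactly one copy of every per cpu variable. */
#define DEFINE_PER_CPU(type, name) type per_cpu__##name

/*
 * (void)(cpu) evaluates and discards the argument. The parentheses matter:
 * with a bare "(void)cpu", an argument such as "a > b ? a : b" would bind
 * the cast to "a" only and fail to compile.
 */
#define per_cpu(var, cpu) (*((void)(cpu), &per_cpu__##var))

static DEFINE_PER_CPU(long, jiffies_demo);

/* Store through the accessor using a conditional expression as the cpu. */
static long store_on(int a, int b, long v)
{
	per_cpu(jiffies_demo, a > b ? a : b) = v;
	return per_cpu(jiffies_demo, 0);
}
```

Whatever expression is passed as the cpu, every access resolves to the same single variable, which is the expected behaviour on a uniprocessor build.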

Signed-off-by: Jan Beulich
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Jan Beulich
2005-06-24 00:45:28 +0800

17 Apr, 2005

1 commit

1da177e4c Linux-2.6.12-rc2

Initial git repository build. I'm not bothering with the full history,
even though we have it. We can create a separate "historical" git
archive of that later if we want to, and in the meantime it's about
3.2GB when imported into git - space that would just make the early
git days unnecessarily complicated, when we don't have a lot of good
infrastructure for it.

Let it rip!

Linus Torvalds
2005-04-17 06:20:36 +0800