10 Sep, 2010

2 commits

  • Redefine __get_cpu_var() using this_cpu_ptr() which can be
    arch-optimized.

    Signed-off-by: Brian Gerst
    Signed-off-by: Tejun Heo

    Brian Gerst
     
  • Allow arches to implement __this_cpu_ptr, and provide an x86 version.

    Before:
    movq $foo, %rax
    movq %gs:this_cpu_off, %rdx
    addq %rdx, %rax

    After:
    movq $foo, %rax
    addq %gs:this_cpu_off, %rax

    The benefit is doing it in one less instruction and not clobbering
    a temporary register.

    tj: * Beefed up the comment a bit and renamed in-macro temp variable
    to match neighboring macros.

    * Folded fix for const pointer case found in linux-next.

    * Fixed sparse notation.

    Signed-off-by: Brian Gerst
    Signed-off-by: Tejun Heo

    Brian Gerst
     

07 Aug, 2010

1 commit


01 Jun, 2010

1 commit

  • * 'for-35' of git://repo.or.cz/linux-kbuild: (81 commits)
    kbuild: Revert part of e8d400a to resolve a conflict
    kbuild: Fix checking of scm-identifier variable
    gconfig: add support to show hidden options that have prompts
    menuconfig: add support to show hidden options which have prompts
    gconfig: remove show_debug option
    gconfig: remove dbg_print_ptype() and dbg_print_stype()
    kconfig: fix zconfdump()
    kconfig: some small fixes
    add random binaries to .gitignore
    kbuild: Include gen_initramfs_list.sh and the file list in the .d file
    kconfig: recalc symbol value before showing search results
    .gitignore: ignore *.lzo files
    headerdep: perlcritic warning
    scripts/Makefile.lib: Align the output of LZO
    kbuild: Generate modules.builtin in make modules_install
    Revert "kbuild: specify absolute paths for cscope"
    kbuild: Do not unnecessarily regenerate modules.builtin
    headers_install: use local file handles
    headers_check: fix perl warnings
    export_report: fix perl warnings
    ...

    Linus Torvalds
     

03 Mar, 2010

1 commit


29 Oct, 2009

3 commits

  • The previous patch made sparse warn about percpu variables being used
    directly without going through percpu accessors. This patch
    implements the other half - checking whether non percpu variable is
    passed into percpu accessors.

    Signed-off-by: Tejun Heo
    Cc: Rusty Russell
    Cc: Al Viro

    Tejun Heo
     
  • We have to make __kernel "__attribute__((address_space(0)))" so we can
    cast to it.

    tj: * put_cpu_var() update.

    * Annotations added to dynamic allocator interface.

    Signed-off-by: Rusty Russell
    Cc: Al Viro
    Signed-off-by: Tejun Heo

    Rusty Russell
     
  • Now that the return from alloc_percpu is compatible with the address
    of per-cpu vars, it makes sense to hand around the address of per-cpu
    variables. To make this sane, we remove the per_cpu__ prefix we used
    created to stop people accidentally using these vars directly.

    Now we have sparse, we can use that (next patch).

    tj: * Updated to convert stuff which were missed by or added after the
    original patch.

    * Kill per_cpu_var() macro.

    Signed-off-by: Rusty Russell
    Signed-off-by: Tejun Heo
    Reviewed-by: Christoph Lameter

    Rusty Russell
     

03 Oct, 2009

1 commit

  • This patch introduces two things: First this_cpu_ptr and then per cpu
    atomic operations.

    this_cpu_ptr
    ------------

    A common operation when dealing with cpu data is to get the instance of the
    cpu data associated with the currently executing processor. This can be
    optimized by

    this_cpu_ptr(xx) = per_cpu_ptr(xx, smp_processor_id).

    The problem with per_cpu_ptr(x, smp_processor_id) is that it requires
    an array lookup to find the offset for the cpu. Processors typically
    have the offset for the current cpu area in some kind of (arch dependent)
    efficiently accessible register or memory location.

    We can use that instead of doing the array lookup to speed up the
    determination of the address of the percpu variable. This is particularly
    significant because these lookups occur in performance critical paths
    of the core kernel. this_cpu_ptr() can avoid memory accesses and

    this_cpu_ptr comes in two flavors. The preemption context matters since we
    are referring the the currently executing processor. In many cases we must
    insure that the processor does not change while a code segment is executed.

    __this_cpu_ptr -> Do not check for preemption context
    this_cpu_ptr -> Check preemption context

    The parameter to these operations is a per cpu pointer. This can be the
    address of a statically defined per cpu variable (&per_cpu_var(xxx)) or
    the address of a per cpu variable allocated with the per cpu allocator.

    per cpu atomic operations: this_cpu_*(var, val)
    -----------------------------------------------
    this_cpu_* operations (like this_cpu_add(struct->y, value) operate on
    abitrary scalars that are members of structures allocated with the new
    per cpu allocator. They can also operate on static per_cpu variables
    if they are passed to per_cpu_var() (See patch to use this_cpu_*
    operations for vm statistics).

    These operations are guaranteed to be atomic vs preemption when modifying
    the scalar. The calculation of the per cpu offset is also guaranteed to
    be atomic at the same time. This means that a this_cpu_* operation can be
    safely used to modify a per cpu variable in a context where interrupts are
    enabled and preemption is allowed. Many architectures can perform such
    a per cpu atomic operation with a single instruction.

    Note that the atomicity here is different from regular atomic operations.
    Atomicity is only guaranteed for data accessed from the currently executing
    processor. Modifications from other processors are still possible. There
    must be other guarantees that the per cpu data is not modified from another
    processor when using these instruction. The per cpu atomicity is created
    by the fact that the processor either executes and instruction or not.
    Embedded in the instruction is the relocation of the per cpu address to
    the are reserved for the current processor and the RMW action. Therefore
    interrupts or preemption cannot occur in the mids of this processing.

    Generic fallback functions are used if an arch does not define optimized
    this_cpu operations. The functions come also come in the two flavors used
    for this_cpu_ptr().

    The firstparameter is a scalar that is a member of a structure allocated
    through allocpercpu or a per cpu variable (use per_cpu_var(xxx)). The
    operations are similar to what percpu_add() and friends do.

    this_cpu_read(scalar)
    this_cpu_write(scalar, value)
    this_cpu_add(scale, value)
    this_cpu_sub(scalar, value)
    this_cpu_inc(scalar)
    this_cpu_dec(scalar)
    this_cpu_and(scalar, value)
    this_cpu_or(scalar, value)
    this_cpu_xor(scalar, value)

    Arch code can override the generic functions and provide optimized atomic
    per cpu operations. These atomic operations must provide both the relocation
    (x86 does it through a segment override) and the operation on the data in a
    single instruction. Otherwise preempt needs to be disabled and there is no
    gain from providing arch implementations.

    A third variant is provided prefixed by irqsafe_. These variants are safe
    against hardware interrupts on the *same* processor (all per cpu atomic
    primitives are *always* *only* providing safety for code running on the
    *same* processor!). The increment needs to be implemented by the hardware
    in such a way that it is a single RMW instruction that is either processed
    before or after an interrupt.

    cc: David Howells
    cc: Ingo Molnar
    cc: Rusty Russell
    cc: Eric Dumazet
    Signed-off-by: Christoph Lameter
    Signed-off-by: Tejun Heo

    Christoph Lameter
     

04 Sep, 2009

1 commit


01 Jul, 2009

1 commit

  • alpha percpu access requires custom SHIFT_PERCPU_PTR() definition for
    modules to work around addressing range limitation. This is done via
    generating inline assembly using C preprocessing which forces the
    assembler to generate external reference. This happens behind the
    compiler's back and makes the compiler think that static percpu variables
    in modules are unused.

    This used to be worked around by using __unused attribute for percpu
    variables which prevent the compiler from omitting the variable; however,
    recent declare/definition attribute unification change broke this as
    __used can't be used for declaration. Also, in the process,
    PER_CPU_ATTRIBUTES definition in alpha percpu.h got broken.

    This patch adds PER_CPU_DEF_ATTRIBUTES which is only used for definitions
    and make alpha use it to add __used for percpu variables in modules. This
    also fixes the PER_CPU_ATTRIBUTES double definition bug.

    Signed-off-by: Tejun Heo
    Tested-by: maximilian attems
    Acked-by: Ivan Kokshaysky
    Cc: Richard Henderson
    Cc:
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Tejun Heo
     

22 Apr, 2009

2 commits

  • Collect the DECLARE/DEFINE declarations together in linux/percpu-defs.h so
    that they're in one place, and give them descriptive comments, particularly
    the SHARED_ALIGNED variant.

    It would be nice to collect these in linux/percpu.h, but that's not possible
    without sorting out the severe #include recursion between the x86 arch headers
    and the general headers (and possibly other arches too).

    Signed-off-by: David Howells
    Signed-off-by: Linus Torvalds

    David Howells
     
  • In non-SMP mode, the variable section attribute specified by DECLARE_PER_CPU()
    does not agree with that specified by DEFINE_PER_CPU(). This means that
    architectures that have a small data section references relative to a base
    register may throw up linkage errors due to too great a displacement between
    where the base register points and the per-CPU variable.

    On FRV, the .h declaration says that the variable is in the .sdata section, but
    the .c definition says it's actually in the .data section. The linker throws
    up the following errors:

    kernel/built-in.o: In function `release_task':
    kernel/exit.c:78: relocation truncated to fit: R_FRV_GPREL12 against symbol `per_cpu__process_counts' defined in .data section in kernel/built-in.o
    kernel/exit.c:78: relocation truncated to fit: R_FRV_GPREL12 against symbol `per_cpu__process_counts' defined in .data section in kernel/built-in.o

    To fix this, DECLARE_PER_CPU() should simply apply the same section attribute
    as does DEFINE_PER_CPU(). However, this is made slightly more complex by
    virtue of the fact that there are several variants on DEFINE, so these need to
    be matched by variants on DECLARE.

    Signed-off-by: David Howells
    Signed-off-by: Linus Torvalds

    David Howells
     

11 Apr, 2009

1 commit

  • For the time being, move the generic percpu_*() accessors to
    linux/percpu.h.

    asm-generic/percpu.h is meant to carry generic stuff for low level
    stuff - declarations, definitions and pointer offset calculation
    and so on but not for generic interface.

    Signed-off-by: Ingo Molnar

    Tejun Heo
     

16 Jan, 2009

1 commit

  • It is an optimization and a cleanup, and adds the following new
    generic percpu methods:

    percpu_read()
    percpu_write()
    percpu_add()
    percpu_sub()
    percpu_and()
    percpu_or()
    percpu_xor()

    and implements support for them on x86. (other architectures will fall
    back to a default implementation)

    The advantage is that for example to read a local percpu variable,
    instead of this sequence:

    return __get_cpu_var(var);

    ffffffff8102ca2b: 48 8b 14 fd 80 09 74 mov -0x7e8bf680(,%rdi,8),%rdx
    ffffffff8102ca32: 81
    ffffffff8102ca33: 48 c7 c0 d8 59 00 00 mov $0x59d8,%rax
    ffffffff8102ca3a: 48 8b 04 10 mov (%rax,%rdx,1),%rax

    We can get a single instruction by using the optimized variants:

    return percpu_read(var);

    ffffffff8102ca3f: 65 48 8b 05 91 8f fd mov %gs:0x7efd8f91(%rip),%rax

    I also cleaned up the x86-specific APIs and made the x86 code use
    these new generic percpu primitives.

    tj: * fixed generic percpu_sub() definition as Roel Kluin pointed out
    * added percpu_and() for completeness's sake
    * made generic percpu ops atomic against preemption

    Signed-off-by: Ingo Molnar
    Signed-off-by: Tejun Heo

    Ingo Molnar
     

24 Feb, 2008

1 commit

  • 2.6.25-rc1 percpu changes broke CONFIG_DEBUG_PREEMPT's per_cpu checking
    on several architectures. On s390, sparc64 and x86 it's been weakened to
    not checking at all; whereas on powerpc64 it's become too strict, issuing
    warnings from __raw_get_cpu_var in io_schedule and init_timer for example.

    Fix this by weakening powerpc's __my_cpu_offset to use the non-checking
    local_paca instead of get_paca (which itself contains such a check);
    and strengthening the generic my_cpu_offset to go the old slow way via
    smp_processor_id when CONFIG_DEBUG_PREEMPT (debug_smp_processor_id is
    where all the knowledge of what's correct when lives).

    Signed-off-by: Hugh Dickins
    Reviewed-by: Mike Travis
    Signed-off-by: Linus Torvalds

    Hugh Dickins
     

30 Jan, 2008

4 commits

  • Signed-off-by: Ingo Molnar
    Signed-off-by: Thomas Gleixner

    Mike Travis
     
  • - add support for PER_CPU_ATTRIBUTES

    - fix generic smp percpu_modcopy to use per_cpu_offset() macro.

    Add the ability to use generic/percpu even if the arch needs to override
    several aspects of its operations. This will enable the use of generic
    percpu.h for all arches.

    An arch may define:

    __per_cpu_offset Do not use the generic pointer array. Arch must
    define per_cpu_offset(cpu) (used by x86_64, s390).

    __my_cpu_offset Can be defined to provide an optimized way to determine
    the offset for variables of the currently executing
    processor. Used by ia64, x86_64, x86_32, sparc64, s/390.

    SHIFT_PTR(ptr, offset) If an arch defines it then special handling
    of pointer arithmentic may be implemented. Used
    by s/390.

    (Some of these special percpu arch implementations may be later consolidated
    so that there are less cases to deal with.)

    Cc: Rusty Russell
    Cc: Andi Kleen
    Signed-off-by: Christoph Lameter
    Signed-off-by: Mike Travis
    Signed-off-by: Thomas Gleixner
    Signed-off-by: Ingo Molnar

    travis@sgi.com
     
  • - Special consideration for IA64: Add the ability to specify
    arch specific per cpu flags

    - remove .data.percpu attribute from DEFINE_PER_CPU for non-smp case.

    The arch definitions are all the same. So move them into linux/percpu.h.

    We cannot move DECLARE_PER_CPU since some include files just include
    asm/percpu.h to avoid include recursion problems.

    Cc: Rusty Russell
    Cc: Andi Kleen
    Signed-off-by: Christoph Lameter
    Signed-off-by: Mike Travis
    Signed-off-by: Thomas Gleixner
    Signed-off-by: Ingo Molnar

    travis@sgi.com
     
  • The use of the __GENERIC_PERCPU is a bit problematic since arches
    may want to run their own percpu setup while using the generic
    percpu definitions. Replace it through a kconfig variable.

    Cc: Rusty Russell
    Cc: Andi Kleen
    Signed-off-by: Christoph Lameter
    Signed-off-by: Mike Travis
    Signed-off-by: Thomas Gleixner
    Signed-off-by: Ingo Molnar

    travis@sgi.com
     

20 Jul, 2007

1 commit

  • per cpu data section contains two types of data. One set which is
    exclusively accessed by the local cpu and the other set which is per cpu,
    but also shared by remote cpus. In the current kernel, these two sets are
    not clearely separated out. This can potentially cause the same data
    cacheline shared between the two sets of data, which will result in
    unnecessary bouncing of the cacheline between cpus.

    One way to fix the problem is to cacheline align the remotely accessed per
    cpu data, both at the beginning and at the end. Because of the padding at
    both ends, this will likely cause some memory wastage and also the
    interface to achieve this is not clean.

    This patch:

    Moves the remotely accessed per cpu data (which is currently marked
    as ____cacheline_aligned_in_smp) into a different section, where all the data
    elements are cacheline aligned. And as such, this differentiates the local
    only data and remotely accessed data cleanly.

    Signed-off-by: Fenghua Yu
    Acked-by: Suresh Siddha
    Cc: Rusty Russell
    Cc: Christoph Lameter
    Cc:
    Cc: "Luck, Tony"
    Cc: Andi Kleen
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Fenghua Yu
     

03 May, 2007

1 commit

  • Allocating PDA and GDT at boot is a pain. Using simple per-cpu variables adds
    happiness (although we need the GDT page-aligned for Xen, which we do in a
    followup patch).

    [akpm@linux-foundation.org: build fix]
    Signed-off-by: Rusty Russell
    Signed-off-by: Andi Kleen
    Cc: Andi Kleen
    Signed-off-by: Andrew Morton

    Rusty Russell
     

06 Oct, 2006

1 commit


26 Sep, 2006

1 commit


04 Jul, 2006

1 commit


26 Jun, 2006

1 commit

  • There are several instances of per_cpu(foo, raw_smp_processor_id()), which
    is semantically equivalent to __get_cpu_var(foo) but without the warning
    that smp_processor_id() can give if CONFIG_DEBUG_PREEMPT is enabled. For
    those architectures with optimized per-cpu implementations, namely ia64,
    powerpc, s390, sparc64 and x86_64, per_cpu() turns into more and slower
    code than __get_cpu_var(), so it would be preferable to use __get_cpu_var
    on those platforms.

    This defines a __raw_get_cpu_var(x) macro which turns into per_cpu(x,
    raw_smp_processor_id()) on architectures that use the generic per-cpu
    implementation, and turns into __get_cpu_var(x) on the architectures that
    have an optimized per-cpu implementation.

    Signed-off-by: Paul Mackerras
    Acked-by: David S. Miller
    Acked-by: Ingo Molnar
    Acked-by: Martin Schwidefsky
    Cc: Rusty Russell
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Paul Mackerras
     

29 Mar, 2006

1 commit


23 Mar, 2006

1 commit

  • When we stop allocating percpu memory for not-possible CPUs we must not touch
    the percpu data for not-possible CPUs at all. The correct way of doing this
    is to test cpu_possible() or to use for_each_cpu().

    This patch is a kernel-wide sweep of all instances of NR_CPUS. I found very
    few instances of this bug, if any. But the patch converts lots of open-coded
    test to use the preferred helper macros.

    Cc: Mikael Starvik
    Cc: David Howells
    Acked-by: Kyle McMartin
    Cc: Anton Blanchard
    Cc: Benjamin Herrenschmidt
    Cc: Paul Mackerras
    Cc: Martin Schwidefsky
    Cc: Heiko Carstens
    Cc: Paul Mundt
    Cc: "David S. Miller"
    Cc: William Lee Irwin III
    Cc: Andi Kleen
    Cc: Christian Zankel
    Cc: Philippe Elie
    Cc: Nathan Scott
    Cc: Jens Axboe
    Cc: Eric Dumazet
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Andrew Morton
     

24 Jun, 2005

1 commit


17 Apr, 2005

1 commit

  • Initial git repository build. I'm not bothering with the full history,
    even though we have it. We can create a separate "historical" git
    archive of that later if we want to, and in the meantime it's about
    3.2GB when imported into git - space that would just make the early
    git days unnecessarily complicated, when we don't have a lot of good
    infrastructure for it.

    Let it rip!

    Linus Torvalds