23 Dec, 2011

1 commit

  • We simply say that regular this_cpu use must be safe regardless of
    preemption and interrupt state. That has no material change for x86
    and s390 implementations of this_cpu operations. However, arches that
    do not provide their own implementation for this_cpu operations will
    now get code generated that disables interrupts instead of preemption.

    -tj: This is part of on-going percpu API cleanup. For detailed
    discussion of the subject, please refer to the following thread.

    http://thread.gmane.org/gmane.linux.kernel/1222078

    Signed-off-by: Christoph Lameter
    Signed-off-by: Tejun Heo
    LKML-Reference:

    Christoph Lameter
     

04 Jun, 2011

1 commit

  • On an architecture without CMPXCHG_LOCAL but with DEBUG_VM enabled,
    the VM_BUG_ON() in __pcpu_double_call_return_bool() will cause an early
    panic during boot unless we always align cpu_slab properly.

    In principle we could remove the alignment-testing VM_BUG_ON() for
    architectures that don't have CMPXCHG_LOCAL, but leaving it in means
    that new code will tend not to break x86 even if it is introduced
    on another platform, and it's low cost to require alignment.

    Acked-by: David Rientjes
    Acked-by: Christoph Lameter
    Signed-off-by: Chris Metcalf
    Signed-off-by: Pekka Enberg

    Chris Metcalf
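
The alignment requirement described in this commit can be modelled in user space. This is only a sketch under stated assumptions: `struct cpu_slab_sketch`, `is_double_word_aligned()` and `check_cpu_slab_sketch()` are hypothetical names, C11 `_Alignas` stands in for whatever annotation the kernel uses, and `assert()` plays the role of VM_BUG_ON().

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for a per-cpu structure whose first two words
 * are updated together by a double-word compare-and-exchange. */
struct cpu_slab_sketch {
    _Alignas(2 * sizeof(void *)) void *freelist;  /* first word  */
    uintptr_t tid;                                 /* second word */
};

/* True if p sits on a 2 * sizeof(void *) boundary. */
static bool is_double_word_aligned(const void *p)
{
    return ((uintptr_t)p % (2 * sizeof(void *))) == 0;
}

/* Models the alignment test: trap early instead of corrupting data. */
static void check_cpu_slab_sketch(const struct cpu_slab_sketch *s)
{
    assert(is_double_word_aligned(&s->freelist));
}
```

Keeping the check on every platform, as the commit argues, costs little and catches a misaligned structure even where no double-word instruction is used.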
     

05 May, 2011

1 commit

  • The SLUB allocator use of the cmpxchg_double logic was wrong: it
    actually needs the irq-safe one.

    That happens automatically when we use the native unlocked 'cmpxchg8b'
    instruction, but when compiling the kernel for older x86 CPUs that do
    not support that instruction, we fall back to the generic emulation
    code.

    And if you don't specify that you want the irq-safe version, the generic
    code ends up just open-coding the cmpxchg8b equivalent without any
    protection against interrupts or preemption. Which definitely doesn't
    work for SLUB.

    This was reported by Werner Landgraf, who saw
    instability with his distro-kernel that was compiled to support pretty
    much everything under the sun. Most big Linux distributions tend to
    compile for PPro and later, and would never have noticed this problem.

    This also fixes the prototypes for the irqsafe cmpxchg_double functions
    to use 'bool' like they should.

    [ Btw, that whole "generic code defaults to no protection" design just
    sounds stupid - if the code needs no protection, there is no reason to
    use "cmpxchg_double" to begin with. So we should probably just remove
    the unprotected version entirely as pointless. - Linus ]

    Signed-off-by: Thomas Gleixner
    Reported-and-tested-by: werner
    Acked-and-tested-by: Ingo Molnar
    Acked-by: Christoph Lameter
    Cc: Pekka Enberg
    Cc: Jens Axboe
    Cc: Tejun Heo
    Link: http://lkml.kernel.org/r/alpine.LFD.2.02.1105041539050.3005@ionos
    Signed-off-by: Ingo Molnar
    Signed-off-by: Linus Torvalds

    Thomas Gleixner
     

28 Feb, 2011

1 commit

  • Introduce this_cpu_cmpxchg_double(). this_cpu_cmpxchg_double() allows
    the comparison between two consecutive words and replaces them if
    there is a match.

    bool this_cpu_cmpxchg_double(pcp1, pcp2,
    old_word1, old_word2, new_word1, new_word2)

    this_cpu_cmpxchg_double does not return the old value (difficult since
    there are two words) but a boolean indicating if the operation was
    successful.

    The first percpu variable must be double word aligned!

    -tj: Updated to return bool instead of int, converted size check to
    BUILD_BUG_ON() instead of VM_BUG_ON() and other cosmetic changes.

    Signed-off-by: Christoph Lameter
    Signed-off-by: Tejun Heo

    Christoph Lameter
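
The semantics described in this commit (compare two consecutive words, replace both on a match, return a boolean) can be sketched as a plain C function. This is a non-atomic model for illustration only: `cmpxchg_double_sketch()` is a hypothetical name, and the real operation performs the same steps atomically with respect to the local CPU.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Non-atomic model of the semantics only: compare two consecutive words
 * against the expected values and replace both if both match.
 */
static bool cmpxchg_double_sketch(uintptr_t *w1, uintptr_t *w2,
                                  uintptr_t old1, uintptr_t old2,
                                  uintptr_t new1, uintptr_t new2)
{
    /* The words must be adjacent and the first one double-word aligned. */
    assert(w2 == w1 + 1);
    assert((uintptr_t)w1 % (2 * sizeof(uintptr_t)) == 0);

    if (*w1 != old1 || *w2 != old2)
        return false;   /* mismatch: nothing is written */

    *w1 = new1;
    *w2 = new2;
    return true;        /* both words replaced */
}
```

Returning a boolean rather than the old contents avoids having to hand back two words at once, which is the difficulty the commit message mentions.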
     

18 Dec, 2010

1 commit

  • Generic code to provide new per cpu atomic features

    this_cpu_cmpxchg
    this_cpu_xchg

    The fallback uses functions that disable and re-enable interrupts
    to ensure correct per cpu atomicity.

    Fallback to regular cmpxchg and xchg is not possible since per cpu atomic
    semantics include the guarantee that the current cpu's per cpu data is
    accessed atomically. Regular cmpxchg and xchg require the address of
    the per cpu data to be determined first, and that address calculation
    cannot be atomically included in an xchg or cmpxchg without a segment
    override.

    tj: - Relocated new ops to conform better to the general organization.
    - This patch contains a trivial comment fix.

    Signed-off-by: Christoph Lameter
    Signed-off-by: Tejun Heo

    Christoph Lameter
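
The interrupt-disabling fallback described in this commit might look roughly like the following user-space model. Everything here is an assumption made for illustration: the `*_sketch` names are hypothetical, and a plain boolean stands in for the CPU's interrupt-enable state.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of the local interrupt-enable flag. */
static bool irqs_enabled_sketch = true;

static bool irq_save_sketch(void)
{
    bool flags = irqs_enabled_sketch;

    irqs_enabled_sketch = false;    /* "disable interrupts" */
    return flags;
}

static void irq_restore_sketch(bool flags)
{
    irqs_enabled_sketch = flags;
}

/* Fallback cmpxchg: returns the previous value, stores new_val on match. */
static long cpu_cmpxchg_sketch(long *pcp, long old_val, long new_val)
{
    bool flags = irq_save_sketch();
    long ret;

    assert(!irqs_enabled_sketch);   /* the read-modify-write is protected */
    ret = *pcp;
    if (ret == old_val)
        *pcp = new_val;
    irq_restore_sketch(flags);
    return ret;
}

/* Fallback xchg: unconditionally stores new_val, returns the old value. */
static long cpu_xchg_sketch(long *pcp, long new_val)
{
    bool flags = irq_save_sketch();
    long ret = *pcp;

    *pcp = new_val;
    irq_restore_sketch(flags);
    return ret;
}
```

Saving and restoring the previous state, rather than unconditionally re-enabling, keeps the helpers safe to call from a context where interrupts were already off.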
     

17 Dec, 2010

2 commits

  • - include/linux/percpu.h: this_cpu_add_return() and friends were
    located next to __this_cpu_add_return(). However, the overall
    organization is to first group by preemption safeness. Relocate
    this_cpu_add_return() and friends to preemption-safe area.

    - arch/x86/include/asm/percpu.h: Relocate percpu_add_return_op() after
    other more basic operations. Relocate [__]this_cpu_add_return_8()
    so that they're first grouped by preemption safeness.

    Signed-off-by: Tejun Heo
    Cc: Christoph Lameter

    Tejun Heo
     
  • Introduce generic support for this_cpu_add_return etc.

    The fallback is to realize these operations with simpler __this_cpu_ops.

    tj: - Reformatted __cpu_size_call_return2() to make it more consistent
    with its neighbors.
    - Dropped unnecessary temp variable ret__ from
    __this_cpu_generic_add_return().

    Reviewed-by: Tejun Heo
    Reviewed-by: Mathieu Desnoyers
    Acked-by: H. Peter Anvin
    Signed-off-by: Christoph Lameter
    Signed-off-by: Tejun Heo

    Christoph Lameter
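
The idea of realizing add_return with simpler operations can be sketched as follows. The function names are hypothetical stand-ins, not the kernel macros, and the model ignores preemption entirely, which the caller of the real `__this_cpu_*` family has to handle.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for the simpler primitives. */
static void cpu_add_sketch(int *pcp, int val)
{
    *pcp += val;
}

static int cpu_read_sketch(const int *pcp)
{
    return *pcp;
}

/* add_return composed from an add followed by a read of the result. */
static int cpu_add_return_sketch(int *pcp, int val)
{
    assert(pcp != NULL);
    cpu_add_sketch(pcp, val);
    return cpu_read_sketch(pcp);
}
```

An architecture with a suitable single instruction would presumably override the composed form; the composition is only what is left when nothing better is available.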
     

23 Oct, 2010

1 commit

  • * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/percpu:
    percpu: update comments to reflect that percpu allocations are always zero-filled
    percpu: Optimize __get_cpu_var()
    x86, percpu: Optimize this_cpu_ptr
    percpu: clear memory allocated with the km allocator
    percpu: fix build breakage on s390 and cleanup build configuration tests
    percpu: use percpu allocator on UP too
    percpu: reduce PCPU_MIN_UNIT_SIZE to 32k
    vmalloc: pcpu_get/free_vm_areas() aren't needed on UP

    Fixed up trivial conflicts in include/linux/percpu.h

    Linus Torvalds
     

21 Sep, 2010

1 commit


08 Sep, 2010

2 commits

  • On UP, percpu allocations were redirected to kmalloc. This has the
    following problems.

    * For a certain amount of allocations (determined by
    PERCPU_DYNAMIC_EARLY_SLOTS and PERCPU_DYNAMIC_EARLY_SIZE), percpu
    allocator can be used before the usual kernel memory allocator is
    brought online. On SMP, this is used to initialize the kernel
    memory allocator.

    * percpu allocator honors alignment up to PAGE_SIZE but kmalloc()
    doesn't. For example, workqueue makes use of larger alignments for
    cpu_workqueues.

    Currently, users of percpu allocators need to handle UP differently,
    which is somewhat fragile and ugly. Other than a small amount of
    memory, there isn't much to lose by enabling percpu allocator on UP.
    It can simply use kernel memory based chunk allocation which was added
    for SMP archs w/o MMUs.

    This patch removes mm/percpu_up.c, builds mm/percpu.c on UP too and
    makes UP build use percpu-km. As percpu addresses and kernel
    addresses are always identity mapped and static percpu variables don't
    need any special treatment, nothing is arch dependent and mm/percpu.c
    implements generic setup_per_cpu_areas() for UP.

    Signed-off-by: Tejun Heo
    Reviewed-by: Christoph Lameter
    Acked-by: Pekka Enberg

    Tejun Heo
     
  • In preparation of enabling percpu allocator for UP, reduce
    PCPU_MIN_UNIT_SIZE to 32k. On UP, the first chunk doesn't have to
    include static percpu variables and chunk size can be smaller which is
    important as UP percpu allocator will use contiguous kernel memory to
    populate chunks.

    PCPU_MIN_UNIT_SIZE also determines the maximum supported allocation
    size but 32k should still be enough.

    Signed-off-by: Tejun Heo
    Reviewed-by: Christoph Lameter

    Tejun Heo
     

07 Aug, 2010

1 commit


28 Jun, 2010

2 commits

  • This patch updates percpu allocator such that it can serve a limited
    amount of allocations before slab comes online. This is primarily to
    allow slab to depend on working percpu allocator.

    Two parameters, PERCPU_DYNAMIC_EARLY_SIZE and SLOTS, determine how
    much memory space and allocation map slots are reserved. If this
    reserved area is exhausted, WARN_ON_ONCE() will trigger and allocation
    will fail till slab comes online.

    The following changes are made to implement early alloc.

    * pcpu_mem_alloc() now checks slab_is_available()

    * Chunks are allocated using pcpu_mem_alloc()

    * Init paths make sure ai->dyn_size is at least as large as
    PERCPU_DYNAMIC_EARLY_SIZE.

    * Initial alloc maps are allocated in __initdata and copied to
    kmalloc'd areas once slab is online.

    Signed-off-by: Tejun Heo
    Cc: Christoph Lameter

    Tejun Heo
     
  • In pcpu_build_alloc_info() and pcpu_embed_first_chunk(), @dyn_size was
    ssize_t, -1 meant auto-size, 0 forced 0 and positive meant minimum
    size. There's no use case for forcing 0 and the upcoming early alloc
    support always requires non-zero dynamic size. Make @dyn_size always
    mean minimum dyn_size.

    While at it, make pcpu_build_alloc_info() static which doesn't have
    any external caller as suggested by David Rientjes.

    Signed-off-by: Tejun Heo
    Cc: David Rientjes

    Tejun Heo
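
The early allocation scheme from the first of the two commits above (serve requests from a small reserved area until slab is online, then use the normal allocator) can be modelled in user space. This is a sketch under assumptions: all `*_sketch` names and the 64-byte budget are hypothetical, `calloc()` stands in for the slab allocator, and the warning the commit mentions is reduced to returning NULL.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

#define EARLY_SIZE_SKETCH 64    /* hypothetical reserved byte budget */

static bool slab_online_sketch;             /* false until "slab" is up */
static _Alignas(max_align_t) unsigned char early_area_sketch[EARLY_SIZE_SKETCH];
static size_t early_used_sketch;

static void *mem_alloc_sketch(size_t size)
{
    const size_t align = _Alignof(max_align_t);
    void *p;

    if (slab_online_sketch)
        return calloc(1, size);     /* normal path once slab is available */

    /* Round up so every early allocation stays suitably aligned. */
    size = (size + align - 1) / align * align;
    assert(early_used_sketch <= EARLY_SIZE_SKETCH);
    if (size > EARLY_SIZE_SKETCH - early_used_sketch)
        return NULL;                /* reserved area exhausted */

    p = &early_area_sketch[early_used_sketch];
    early_used_sketch += size;
    return p;
}
```

The reserved area is static and already zeroed, so the early path needs no allocator of its own, which is what lets slab depend on a working percpu allocator.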
     

06 Apr, 2010

1 commit

  • * 'slabh' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/misc:
    eeepc-wmi: include slab.h
    staging/otus: include slab.h from usbdrv.h
    percpu: don't implicitly include slab.h from percpu.h
    kmemcheck: Fix build errors due to missing slab.h
    include cleanup: Update gfp.h and slab.h includes to prepare for breaking implicit slab.h inclusion from percpu.h
    iwlwifi: don't include iwl-dev.h from iwl-devtrace.h
    x86: don't include slab.h from arch/x86/include/asm/pgtable_32.h

    Fix up trivial conflicts in include/linux/percpu.h due to
    is_kernel_percpu_address() having been introduced since the slab.h
    cleanup with the percpu_up.c splitup.

    Linus Torvalds
     

30 Mar, 2010

1 commit

  • percpu.h has always been including slab.h to get k[mz]alloc/free() for
    UP inline implementation. percpu.h being used by very low level
    headers including module.h and sched.h, this meant that a lot of files
    unintentionally got slab.h inclusion.

    Lee Schermerhorn was trying to make topology.h use percpu.h and got
    bitten by this implicit inclusion. The right thing to do is break
    this ultimately unnecessary dependency. The previous patch added
    explicit inclusion of either gfp.h or slab.h to the source files using
    them. This patch updates percpu.h such that slab.h is no longer
    included from percpu.h.

    Signed-off-by: Tejun Heo
    Reviewed-by: Christoph Lameter
    Cc: Ingo Molnar
    Cc: Lee Schermerhorn

    Tejun Heo
     

29 Mar, 2010

1 commit

  • lockdep has custom code to check whether a pointer belongs to static
    percpu area which is somewhat broken. Implement proper
    is_kernel/module_percpu_address() and replace the custom code.

    On UP, percpu variables are regular static variables and can't be
    distinguished from them. Always return %false on UP.

    Signed-off-by: Tejun Heo
    Acked-by: Peter Zijlstra
    Cc: Rusty Russell
    Cc: Ingo Molnar

    Tejun Heo
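
A check of the kind this commit describes comes down to a range test against each CPU's copy of the static percpu area. The sketch below is a user-space model with hypothetical names and sizes; it does not reproduce the kernel's implementation.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define NR_CPUS_SKETCH     4
#define STATIC_SIZE_SKETCH 128  /* hypothetical size of the static area */

/* One copy of the static percpu area per possible CPU (assumed layout). */
static unsigned char percpu_area_sketch[NR_CPUS_SKETCH][STATIC_SIZE_SKETCH];

static bool is_percpu_address_sketch(const void *addr)
{
    uintptr_t p = (uintptr_t)addr;

    for (int cpu = 0; cpu < NR_CPUS_SKETCH; cpu++) {
        uintptr_t start = (uintptr_t)percpu_area_sketch[cpu];

        assert(start + STATIC_SIZE_SKETCH > start);     /* no wraparound */
        if (p >= start && p < start + STATIC_SIZE_SKETCH)
            return true;    /* inside this CPU's copy */
    }
    return false;
}
```

On UP there is no separate area to test against, which is why the commit simply returns false there.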
     

05 Jan, 2010

1 commit


08 Dec, 2009

1 commit


02 Dec, 2009

1 commit


25 Nov, 2009

1 commit

  • o kdump functionality reserves a per cpu area at boot time and exports the
    physical address of that area to user space through sys interface. This
    area stores some dump related information like cpu register states etc
    at the time of crash.

    o We were assuming that per cpu area always comes from linearly mapped memory
    region and using __pa() to determine physical address.
    With percpu_alloc=page, per cpu area can come from vmalloc region also and
    __pa() breaks.

    o This patch implements a new function to convert per cpu address to
    physical address.

    Before the patch, crash_notes addresses looked as follows.

    cpu0 60fffff49800
    cpu1 60fffff60800
    cpu2 60fffff77800

    These are bogus physical addresses.

    After the patch, the addresses are as follows.

    cpu0 13eb44000
    cpu1 13eb43000
    cpu2 13eb42000
    cpu3 13eb41000

    These look fine. I have 4G of memory and /proc/iomem tells me the following.

    100000000-13fffffff : System RAM

    tj: * added missing asm/io.h include reported by Stephen Rothwell
    * repositioned per_cpu_ptr_phys() in percpu.c and added comment.

    Signed-off-by: Vivek Goyal
    Signed-off-by: Tejun Heo
    Cc: Stephen Rothwell

    Vivek Goyal
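
Why a fixed-offset translation breaks for vmalloc-backed areas can be shown with a toy model. Everything below is assumed for illustration: the base addresses, the four-entry "page table" and all `*_sketch` names are hypothetical, and the frame numbers merely echo the sample addresses quoted in the commit message.

```c
#include <assert.h>
#include <stdint.h>

#define PAGE_SHIFT_SKETCH    12
#define PAGE_SIZE_SKETCH     (UINT64_C(1) << PAGE_SHIFT_SKETCH)
#define LINEAR_BASE_SKETCH   UINT64_C(0x100000000000)  /* hypothetical */
#define VMALLOC_BASE_SKETCH  UINT64_C(0x600000000000)  /* hypothetical */
#define VMALLOC_PAGES_SKETCH 4

/* Toy "page table": physical frame backing each vmalloc page. */
static const uint64_t vmalloc_pfn_sketch[VMALLOC_PAGES_SKETCH] = {
    0x13eb44, 0x13eb43, 0x13eb42, 0x13eb41,
};

static int in_vmalloc_sketch(uint64_t vaddr)
{
    return vaddr >= VMALLOC_BASE_SKETCH &&
           vaddr < VMALLOC_BASE_SKETCH +
                   VMALLOC_PAGES_SKETCH * PAGE_SIZE_SKETCH;
}

static uint64_t percpu_to_phys_sketch(uint64_t vaddr)
{
    if (in_vmalloc_sketch(vaddr)) {
        /* Pages are not physically contiguous: look each one up. */
        uint64_t idx = (vaddr - VMALLOC_BASE_SKETCH) >> PAGE_SHIFT_SKETCH;
        uint64_t off = vaddr & (PAGE_SIZE_SKETCH - 1);

        return (vmalloc_pfn_sketch[idx] << PAGE_SHIFT_SKETCH) + off;
    }
    /* Linear mapping: a fixed offset, the only case __pa() covers. */
    assert(vaddr >= LINEAR_BASE_SKETCH);
    return vaddr - LINEAR_BASE_SKETCH;
}
```

Subtracting a constant from a vmalloc address yields a number that looks like an address but points nowhere useful, which matches the bogus values shown before the patch.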
     

29 Oct, 2009

6 commits

  • The previous patch made sparse warn about percpu variables being used
    directly without going through percpu accessors. This patch
    implements the other half - checking whether non percpu variable is
    passed into percpu accessors.

    Signed-off-by: Tejun Heo
    Cc: Rusty Russell
    Cc: Al Viro

    Tejun Heo
     
  • We have to make __kernel "__attribute__((address_space(0)))" so we can
    cast to it.

    tj: * put_cpu_var() update.

    * Annotations added to dynamic allocator interface.

    Signed-off-by: Rusty Russell
    Cc: Al Viro
    Signed-off-by: Tejun Heo

    Rusty Russell
     
  • Now that per_cpu__ prefix is gone, there's no distinction between
    static and dynamic percpu variables. Make get_cpu_var() take dynamic
    percpu variables and ensure that all macros have parentheses around
    the parameter evaluation and evaluate the variable parameter only once
    such that any expression which evaluates to percpu address can be used
    safely.

    Signed-off-by: Tejun Heo

    Tejun Heo
     
  • Now that the return from alloc_percpu is compatible with the address
    of per-cpu vars, it makes sense to hand around the address of per-cpu
    variables. To make this sane, we remove the per_cpu__ prefix we
    created to stop people accidentally using these vars directly.

    Now we have sparse, we can use that (next patch).

    tj: * Updated to convert stuff which were missed by or added after the
    original patch.

    * Kill per_cpu_var() macro.

    Signed-off-by: Rusty Russell
    Signed-off-by: Tejun Heo
    Reviewed-by: Christoph Lameter

    Rusty Russell
     
  • Make the following changes to remove some sparse warnings.

    * Make DEFINE_PER_CPU_SECTION() declare __pcpu_unique_* before
    defining it.

    * Annotate pcpu_extend_area_map() that it is entered with pcpu_lock
    held, releases it and then reacquires it.

    * Make percpu related macros use unique nested variable names.

    * While at it, add pcpu prefix to __size_call[_return]() macros as
    to-be-implemented sparse annotations will add percpu specific stuff
    to these macros.

    Signed-off-by: Tejun Heo
    Reviewed-by: Christoph Lameter
    Cc: Rusty Russell

    Tejun Heo
     
  • alloc_percpu() couldn't handle array types like "int [100]" due to the
    way the return type was cast. Fix it by using typeof() instead.

    Signed-off-by: Tejun Heo
    Reviewed-by: Frederic Weisbecker
    Reviewed-by: Christoph Lameter

    Tejun Heo
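
The array-type problem fixed by this last commit can be reproduced with an ordinary allocation macro. This is a sketch, not the kernel macro: `alloc_typeof_sketch()` is a hypothetical name, `calloc()` stands in for the percpu allocator, and `__typeof__` is the GNU C spelling of typeof().

```c
#include <assert.h>
#include <stdlib.h>

/*
 * A cast-based form expands "(type *)" to "(int [100] *)" for an array
 * type, which is not valid C, so it is shown only as a comment:
 *
 *   #define alloc_cast_sketch(type)  ((type *)calloc(1, sizeof(type)))
 */

/* With typeof, "__typeof__(int [100]) *" is a pointer to the whole array. */
#define alloc_typeof_sketch(type) \
    ((__typeof__(type) *)calloc(1, sizeof(type)))

/* Example use with an array type. */
static int (*alloc_int100_sketch(void))[100]
{
    int (*arr)[100] = alloc_typeof_sketch(int [100]);

    assert(arr == NULL || sizeof(*arr) == 100 * sizeof(int));
    return arr;
}
```

The typeof form also keeps working for scalar and struct types, so one macro covers both cases.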
     

03 Oct, 2009

1 commit

  • This patch introduces two things: First this_cpu_ptr and then per cpu
    atomic operations.

    this_cpu_ptr
    ------------

    A common operation when dealing with cpu data is to get the instance of the
    cpu data associated with the currently executing processor. This can be
    optimized by

    this_cpu_ptr(xx) = per_cpu_ptr(xx, smp_processor_id()).

    The problem with per_cpu_ptr(x, smp_processor_id()) is that it requires
    an array lookup to find the offset for the cpu. Processors typically
    have the offset for the current cpu area in some kind of (arch dependent)
    efficiently accessible register or memory location.

    We can use that instead of doing the array lookup to speed up the
    determination of the address of the percpu variable. This is particularly
    significant because these lookups occur in performance critical paths
    of the core kernel. this_cpu_ptr() can avoid memory accesses.

    this_cpu_ptr comes in two flavors. The preemption context matters since we
    are referring to the currently executing processor. In many cases we must
    ensure that the processor does not change while a code segment is executed.

    __this_cpu_ptr -> Do not check for preemption context
    this_cpu_ptr -> Check preemption context

    The parameter to these operations is a per cpu pointer. This can be the
    address of a statically defined per cpu variable (&per_cpu_var(xxx)) or
    the address of a per cpu variable allocated with the per cpu allocator.

    per cpu atomic operations: this_cpu_*(var, val)
    -----------------------------------------------
    this_cpu_* operations (like this_cpu_add(struct->y, value)) operate on
    arbitrary scalars that are members of structures allocated with the new
    per cpu allocator. They can also operate on static per_cpu variables
    if they are passed to per_cpu_var() (See patch to use this_cpu_*
    operations for vm statistics).

    These operations are guaranteed to be atomic vs preemption when modifying
    the scalar. The calculation of the per cpu offset is also guaranteed to
    be atomic at the same time. This means that a this_cpu_* operation can be
    safely used to modify a per cpu variable in a context where interrupts are
    enabled and preemption is allowed. Many architectures can perform such
    a per cpu atomic operation with a single instruction.

    Note that the atomicity here is different from regular atomic operations.
    Atomicity is only guaranteed for data accessed from the currently executing
    processor. Modifications from other processors are still possible. There
    must be other guarantees that the per cpu data is not modified from another
    processor when using these instructions. The per cpu atomicity is created
    by the fact that the processor either executes an instruction or not.
    Embedded in the instruction is the relocation of the per cpu address to
    the area reserved for the current processor and the RMW action. Therefore
    interrupts or preemption cannot occur in the midst of this processing.

    Generic fallback functions are used if an arch does not define optimized
    this_cpu operations. The functions also come in the two flavors used
    for this_cpu_ptr().

    The first parameter is a scalar that is a member of a structure allocated
    through allocpercpu or a per cpu variable (use per_cpu_var(xxx)). The
    operations are similar to what percpu_add() and friends do.

    this_cpu_read(scalar)
    this_cpu_write(scalar, value)
    this_cpu_add(scalar, value)
    this_cpu_sub(scalar, value)
    this_cpu_inc(scalar)
    this_cpu_dec(scalar)
    this_cpu_and(scalar, value)
    this_cpu_or(scalar, value)
    this_cpu_xor(scalar, value)

    Arch code can override the generic functions and provide optimized atomic
    per cpu operations. These atomic operations must provide both the relocation
    (x86 does it through a segment override) and the operation on the data in a
    single instruction. Otherwise preempt needs to be disabled and there is no
    gain from providing arch implementations.

    A third variant is provided prefixed by irqsafe_. These variants are safe
    against hardware interrupts on the *same* processor (all per cpu atomic
    primitives are *always* *only* providing safety for code running on the
    *same* processor!). The increment needs to be implemented by the hardware
    in such a way that it is a single RMW instruction that is either processed
    before or after an interrupt.

    cc: David Howells
    cc: Ingo Molnar
    cc: Rusty Russell
    cc: Eric Dumazet
    Signed-off-by: Christoph Lameter
    Signed-off-by: Tejun Heo

    Christoph Lameter
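
The difference between the two pointer lookups described in this commit can be modelled in user space. This is a sketch under assumptions: every `*_sketch` name is hypothetical, a global variable stands in for the arch-specific register holding the current CPU's offset, and the add helper ignores preemption, which the generic kernel fallback has to deal with.

```c
#include <assert.h>
#include <stddef.h>

#define NR_CPUS_SKETCH 4

struct counters_sketch { long events; };

/* Hypothetical model: one copy of the data per CPU, located via offsets. */
static struct counters_sketch areas_sketch[NR_CPUS_SKETCH];
static size_t cpu_offset_sketch[NR_CPUS_SKETCH];    /* offset table */
static size_t this_cpu_off_sketch;                  /* cached offset "register" */

static void setup_sketch(int cpu)
{
    for (int i = 0; i < NR_CPUS_SKETCH; i++)
        cpu_offset_sketch[i] = (size_t)i * sizeof(struct counters_sketch);
    this_cpu_off_sketch = cpu_offset_sketch[cpu];
}

/* per_cpu_ptr(): needs the CPU number and a table lookup. */
static struct counters_sketch *per_cpu_ptr_sketch(int cpu)
{
    assert(cpu >= 0 && cpu < NR_CPUS_SKETCH);
    return (struct counters_sketch *)
        ((char *)areas_sketch + cpu_offset_sketch[cpu]);
}

/* this_cpu_ptr(): adds the already available offset, no table lookup. */
static struct counters_sketch *this_cpu_ptr_sketch(void)
{
    return (struct counters_sketch *)
        ((char *)areas_sketch + this_cpu_off_sketch);
}

/* Plain read-modify-write on the current CPU's copy. */
static void this_cpu_add_sketch(long val)
{
    this_cpu_ptr_sketch()->events += val;
}
```

Both lookups return the same address for the running CPU; the second simply skips the array access, which is the saving the commit is after.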
     

02 Oct, 2009

1 commit


14 Aug, 2009

11 commits

  • With x86 converted to embedding allocator, lpage doesn't have any user
    left. Kill it along with cpa handling code.

    Signed-off-by: Tejun Heo
    Cc: Jan Beulich

    Tejun Heo
     
  • Now that percpu core can handle very sparse units, given that vmalloc
    space is large enough, embedding first chunk allocator can use any
    memory to build the first chunk. This patch teaches
    pcpu_embed_first_chunk() about distances between cpus and to use
    alloc/free callbacks to allocate node specific areas for each group
    and use them for the first chunk.

    This brings the benefits of embedding allocator to NUMA configurations
    - no extra TLB pressure with the flexibility of unified dynamic
    allocator and no need to restructure arch code to build memory layout
    suitable for percpu. With units put into atom_size aligned groups
    according to cpu distances, using large page for dynamic chunks is
    also easily possible with falling back to regular pages if large
    allocation fails.

    Embedding allocator users are converted to specify NULL
    cpu_distance_fn, so this patch doesn't cause any visible behavior
    difference. Following patches will convert them.

    Signed-off-by: Tejun Heo

    Tejun Heo
     
  • Currently units are mapped sequentially into address space. This
    patch adds pcpu_unit_offsets[] which allows units to be mapped to
    arbitrary offsets from the chunk base address. This is necessary to
    allow sparse embedding which would need to allocate address
    ranges and memory areas which aren't aligned to unit size but to
    allocation atom size (page or large page size). This also simplifies
    things a bit by removing the need to calculate offset from unit
    number.

    With this change, there's no need for the arch code to know
    pcpu_unit_size. Update pcpu_setup_first_chunk() and first chunk
    allocators to return regular 0 or -errno return code instead of unit
    size or -errno.

    Signed-off-by: Tejun Heo
    Cc: David S. Miller

    Tejun Heo
     
  • Till now, non-linear cpu->unit map was expressed using an integer
    array which maps each cpu to a unit and used only by lpage allocator.
    Although how many units have been placed in a single contiguous area
    (group) is known while building unit_map, the information is lost when
    the result is recorded into the unit_map array. For lpage allocator,
    as all allocations are done by lpages and whether two adjacent lpages
    are in the same group or not is irrelevant, this didn't cause any
    problem. Non-linear cpu->unit mapping will be used for sparse
    embedding and this grouping information is necessary for that.

    This patch introduces pcpu_alloc_info which contains all the
    information necessary for initializing percpu allocator.
    pcpu_alloc_info contains array of pcpu_group_info which describes how
    units are grouped and mapped to cpus. pcpu_group_info also has
    base_offset field to specify its offset from the chunk's base address.
    pcpu_build_alloc_info() initializes this field as if all groups are
    allocated back-to-back as is currently done but this will be used to
    sparsely place groups.

    pcpu_alloc_info is a rather complex data structure which contains a
    flexible array which in turn points to nested cpu_map arrays.

    * pcpu_alloc_alloc_info() and pcpu_free_alloc_info() are provided to
    help dealing with pcpu_alloc_info.

    * pcpu_lpage_build_unit_map() is updated to build pcpu_alloc_info,
    generalized and renamed to pcpu_build_alloc_info().
    @cpu_distance_fn may be NULL indicating that all cpus are of
    LOCAL_DISTANCE.

    * pcpul_lpage_dump_cfg() is updated to process pcpu_alloc_info,
    generalized and renamed to pcpu_dump_alloc_info(). It now also
    prints which group each alloc unit belongs to.

    * pcpu_setup_first_chunk() now takes pcpu_alloc_info instead of the
    separate parameters. All first chunk allocators are updated to use
    pcpu_build_alloc_info() to build alloc_info and call
    pcpu_setup_first_chunk() with it. This has the side effect of
    packing units for sparse possible cpus. ie. if cpus 0, 2 and 4 are
    possible, they'll be assigned unit 0, 1 and 2 instead of 0, 2 and 4.

    * x86 setup_pcpu_lpage() is updated to deal with alloc_info.

    * sparc64 setup_per_cpu_areas() is updated to build alloc_info.

    Although the changes made by this patch are pretty pervasive, it
    doesn't cause any behavior difference other than packing of sparse
    cpus. It mostly changes how information is passed among
    initialization functions and makes room for more flexibility.

    Signed-off-by: Tejun Heo
    Cc: Ingo Molnar
    Cc: David Miller

    Tejun Heo
     
  • Unit map handling will be generalized and extended and used for
    embedding sparse first chunk and other purposes. Relocate two
    unit_map related functions upward in preparation. This patch just
    moves the code without any actual change.

    Signed-off-by: Tejun Heo

    Tejun Heo
     
  • pcpu_fc_alloc_fn_t is about to see more interesting usage, add @align
    parameter.

    Signed-off-by: Tejun Heo

    Tejun Heo
     
  • Now that all actual first chunk allocation and copying happen in the
    first chunk allocators and helpers, there's no reason for
    pcpu_setup_first_chunk() to try to determine @dyn_size automatically.
    The only left user is page first chunk allocator. Make it determine
    dyn_size like other allocators and make @dyn_size mandatory for
    pcpu_setup_first_chunk().

    Signed-off-by: Tejun Heo

    Tejun Heo
     
  • First chunk allocators assume percpu areas have been linked using one
    of PERCPU_*() macros and depend on __per_cpu_load symbol defined by
    those macros, so there isn't much point in passing in static area size
    explicitly when it can be easily calculated from __per_cpu_start and
    __per_cpu_end. Drop @static_size from all percpu first chunk
    allocators and helpers.

    Signed-off-by: Tejun Heo

    Tejun Heo
     
  • Now that all first chunk allocators are in mm/percpu.c, it makes sense
    to generalize the percpu_alloc kernel parameter. Define PCPU_FC_*
    and set pcpu_chosen_fc using early_param() in mm/percpu.c. Arch code
    can use the set value to determine which first chunk allocator to use.

    Signed-off-by: Tejun Heo

    Tejun Heo
     
  • There's no need to build unused first chunk allocators in. Define
    CONFIG_NEED_PER_CPU_*_FIRST_CHUNK and let archs enable them
    selectively.

    Signed-off-by: Tejun Heo

    Tejun Heo
     
  • Page size isn't always 4k depending on arch and configuration. Rename
    4k first chunk allocator to page.

    Signed-off-by: Tejun Heo
    Cc: David Howells

    Tejun Heo
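
Two ideas from this group of commits lend themselves to a small model: packing units for sparse possible cpus (cpus 0, 2 and 4 get units 0, 1 and 2) and locating a cpu's area through an offset table instead of unit number times unit size. The sketch below uses hypothetical names and sizes throughout and is not the kernel's data structure.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define NR_CPUS_SKETCH 8
#define NO_UNIT_SKETCH (-1)

/* Assign consecutive unit numbers to possible CPUs, skipping the holes.
 * Returns the number of units used. */
static int build_unit_map_sketch(const bool possible[NR_CPUS_SKETCH],
                                 int unit_map[NR_CPUS_SKETCH])
{
    int unit = 0;

    for (int cpu = 0; cpu < NR_CPUS_SKETCH; cpu++)
        unit_map[cpu] = possible[cpu] ? unit++ : NO_UNIT_SKETCH;
    return unit;
}

/* Per-CPU address = chunk base + that CPU's offset, taken from a table
 * so that units need not be laid out back-to-back. */
static size_t cpu_addr_sketch(size_t chunk_base,
                              const size_t unit_offsets[NR_CPUS_SKETCH],
                              int cpu)
{
    assert(cpu >= 0 && cpu < NR_CPUS_SKETCH);
    return chunk_base + unit_offsets[cpu];
}
```

With a per-cpu offset table the layout can leave gaps between groups, which is what sparse embedding relies on.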