Eric Lee / smarc-ti-linux-kernel | Embedian Git Server

03 Sep, 2014

5 commits

b539b87fe percpu: implmeent pcpu_nr_empty_pop_pages and chunk->nr_populated ... Browse Code »
2

pcpu_nr_empty_pop_pages counts the number of empty populated pages
across all chunks and chunk->nr_populated counts the number of
populated pages in a chunk. Both will be used to implement pre/async
population for atomic allocations.

pcpu_chunk_[de]populated() are added to update chunk->populated,
chunk->nr_populated and pcpu_nr_empty_pop_pages together. All
successful chunk [de]populations should be followed by the
corresponding pcpu_chunk_[de]populated() calls.

Signed-off-by: Tejun Heo

Tejun Heo
2014-09-03 02:46:05 +0800
b38d08f31 percpu: restructure locking ... Browse Code »
13

At first, the percpu allocator required a sleepable context for both
alloc and free paths and used pcpu_alloc_mutex to protect everything.
Later, pcpu_lock was introduced to protect the index data structure so
that the free path can be invoked from atomic contexts. The
conversion only updated what's necessary and left most of the
allocation path under pcpu_alloc_mutex.

The percpu allocator is planned to add support for atomic allocation
and this patch restructures locking so that the coverage of
pcpu_alloc_mutex is further reduced.

* pcpu_alloc() now grab pcpu_alloc_mutex only while creating a new
chunk and populating the allocated area. Everything else is now
protected soley by pcpu_lock.

After this change, multiple instances of pcpu_extend_area_map() may
race but the function already implements sufficient synchronization
using pcpu_lock.

This also allows multiple allocators to arrive at new chunk
creation. To avoid creating multiple empty chunks back-to-back, a
new chunk is created iff there is no other empty chunk after
grabbing pcpu_alloc_mutex.

* pcpu_lock is now held while modifying chunk->populated bitmap.
After this, all data structures are protected by pcpu_lock.

Signed-off-by: Tejun Heo

Tejun Heo
2014-09-03 02:46:02 +0800
a63d4ac4a percpu: make percpu-km set chunk->populated bitmap properly ... Browse Code »

percpu-km instantiates the whole chunk on creation and doesn't make
use of chunk->populated bitmap and leaves it as zero. While this
currently doesn't cause any problem, the inconsistency makes it
difficult to build further logic on top of chunk->populated. This
patch makes percpu-km fill chunk->populated on creation so that the
bitmap is always consistent.

Signed-off-by: Tejun Heo
Acked-by: Christoph Lameter

Tejun Heo
2014-09-03 02:46:02 +0800
a93ace487 percpu: move region iterations out of pcpu_[de]populate_chunk() ... Browse Code »

Previously, pcpu_[de]populate_chunk() were called with the range which
may contain multiple target regions in it and
pcpu_[de]populate_chunk() iterated over the regions. This has the
benefit of batching up cache flushes for all the regions; however,
we're planning to add more bookkeeping logic around [de]population to
support atomic allocations and this delegation of iterations gets in
the way.

This patch moves the region iterations out of
pcpu_[de]populate_chunk() into its callers - pcpu_alloc() and
pcpu_reclaim() - so that we can later add logic to track more states
around them. This change may make cache and tlb flushes more frequent
but multi-region [de]populations are rare anyway and if this actually
becomes a problem, it's not difficult to factor out cache flushes as
separate callbacks which are directly invoked from percpu.c.

Signed-off-by: Tejun Heo

Tejun Heo
2014-09-03 02:46:02 +0800
dca496451 percpu: move common parts out of pcpu_[de]populate_chunk() ... Browse Code »

percpu-vm and percpu-km implement separate versions of
pcpu_[de]populate_chunk() and some part which is or should be common
are currently in the specific implementations. Make the following
changes.

* Allocate area clearing is moved from the pcpu_populate_chunk()
implementations to pcpu_alloc(). This makes percpu-km's version
noop.

* Quick exit tests in pcpu_[de]populate_chunk() of percpu-vm are moved
to their respective callers so that they are applied to percpu-km
too. This doesn't make any meaningful difference as both functions
are noop for percpu-km; however, this is more consistent and will
help implementing atomic allocation support.

Signed-off-by: Tejun Heo

Tejun Heo
2014-09-03 02:46:01 +0800

10 Sep, 2010

1 commit

fc1481a95 percpu: clear memory allocated with the km allocator ... Browse Code »

Percpu allocator should clear memory before returning it but the km
allocator forgot to do it. Fix it.

Signed-off-by: Tejun Heo
Reported-by: Peter Zijlstra
Acked-by: Peter Zijlstra

Tejun Heo
2010-09-10 16:56:24 +0800

08 Sep, 2010

1 commit

bbddff054 percpu: use percpu allocator on UP too ... Browse Code »

On UP, percpu allocations were redirected to kmalloc. This has the
following problems.

* For certain amount of allocations (determined by
PERCPU_DYNAMIC_EARLY_SLOTS and PERCPU_DYNAMIC_EARLY_SIZE), percpu
allocator can be used before the usual kernel memory allocator is
brought online. On SMP, this is used to initialize the kernel
memory allocator.

* percpu allocator honors alignment upto PAGE_SIZE but kmalloc()
doesn't. For example, workqueue makes use of larger alignments for
cpu_workqueues.

Currently, users of percpu allocators need to handle UP differently,
which is somewhat fragile and ugly. Other than small amount of
memory, there isn't much to lose by enabling percpu allocator on UP.
It can simply use kernel memory based chunk allocation which was added
for SMP archs w/o MMUs.

This patch removes mm/percpu_up.c, builds mm/percpu.c on UP too and
makes UP build use percpu-km. As percpu addresses and kernel
addresses are always identity mapped and static percpu variables don't
need any special treatment, nothing is arch dependent and mm/percpu.c
implements generic setup_per_cpu_areas() for UP.

Signed-off-by: Tejun Heo
Reviewed-by: Christoph Lameter
Acked-by: Pekka Enberg

Tejun Heo
2010-09-08 17:11:23 +0800

01 May, 2010

1 commit

b0c9778b1 percpu: implement kernel memory based chunk allocation ... Browse Code »

Implement an alternate percpu chunk management based on kernel memeory
for nommu SMP architectures. Instead of mapping into vmalloc area,
chunks are allocated as a contiguous kernel memory using
alloc_pages(). As such, percpu allocator on nommu will have the
following restrictions.

* It can't fill chunks on-demand page-by-page. It has to allocate
each chunk fully upfront.

* It can't support sparse chunk for NUMA configurations. SMP w/o mmu
is crazy enough. Let's hope no one does NUMA w/o mmu. :-P

* If chunk size isn't power-of-two multiple of PAGE_SIZE, the
unaligned amount will be wasted on each chunk. So, archs which use
this better align chunk size.

For instructions on how to use this, read the comment on top of
mm/percpu-km.c.

Signed-off-by: Tejun Heo
Reviewed-by: David Howells
Cc: Graff Yang
Cc: Sonic Zhang

Tejun Heo
2010-05-01 14:30:50 +0800