13 Aug, 2020

1 commit

  • Percpu memory is becoming more and more widely used by various subsystems,
    and the total amount of memory controlled by the percpu allocator can make
    up a significant part of the total memory.

    As an example, bpf maps can consume a lot of percpu memory, and they are
    created by users. Also, some cgroup internals (e.g. memory controller
    statistics) can be quite large. On a machine with many CPUs and a large
    number of cgroups they can consume hundreds of megabytes.

    So the lack of memcg accounting creates a breach in the memory
    isolation. Similar to slab memory, percpu memory should be accounted
    by default.

    To implement the percpu accounting it's possible to take the slab memory
    accounting as a model to follow. Let's introduce two types of percpu
    chunks: root and memcg. What makes memcg chunks different is an
    additional space allocated to store memcg membership information. If
    __GFP_ACCOUNT is passed on allocation, a memcg chunk should be used.
    If it's possible to charge the corresponding size to the target memory
    cgroup, the allocation is performed and the memcg ownership data is
    recorded. System-wide allocations are performed using root chunks, so
    there is no additional memory overhead.

    To implement a fast reparenting of percpu memory on memcg removal, we
    don't store mem_cgroup pointers directly: instead we use obj_cgroup API,
    introduced for slab accounting.

    [akpm@linux-foundation.org: fix CONFIG_MEMCG_KMEM=n build errors and warning]
    [akpm@linux-foundation.org: move unreachable code, per Roman]
    [cuibixuan@huawei.com: mm/percpu: fix 'defined but not used' warning]
    Link: http://lkml.kernel.org/r/6d41b939-a741-b521-a7a2-e7296ec16219@huawei.com

    Signed-off-by: Roman Gushchin
    Signed-off-by: Bixuan Cui
    Signed-off-by: Andrew Morton
    Reviewed-by: Shakeel Butt
    Acked-by: Dennis Zhou
    Cc: Christoph Lameter
    Cc: David Rientjes
    Cc: Johannes Weiner
    Cc: Joonsoo Kim
    Cc: Mel Gorman
    Cc: Michal Hocko
    Cc: Pekka Enberg
    Cc: Tejun Heo
    Cc: Tobin C. Harding
    Cc: Vlastimil Babka
    Cc: Waiman Long
    Cc: Bixuan Cui
    Cc: Michal Koutný
    Cc: Stephen Rothwell
    Link: http://lkml.kernel.org/r/20200623184515.4132564-3-guro@fb.com
    Signed-off-by: Linus Torvalds

    Roman Gushchin
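
    A minimal sketch of what an accounted percpu allocation looks like from
    the caller's side, as described in the entry above. alloc_percpu_gfp(),
    free_percpu() and __GFP_ACCOUNT are existing kernel interfaces; the
    struct and function names are made up for illustration:

        #include <linux/percpu.h>
        #include <linux/gfp.h>

        /* hypothetical per-CPU counters owned by the allocating cgroup */
        struct my_counters {
                u64 events;
        };

        static struct my_counters __percpu *counters;

        static int my_counters_init(void)
        {
                /*
                 * __GFP_ACCOUNT requests an accounted (memcg chunk)
                 * allocation: the per-CPU size is charged to the current
                 * memory cgroup and ownership is recorded via obj_cgroup.
                 * Without the flag a root chunk is used and nothing is
                 * charged.
                 */
                counters = alloc_percpu_gfp(struct my_counters,
                                            GFP_KERNEL | __GFP_ACCOUNT);
                if (!counters)
                        return -ENOMEM;
                return 0;
        }

        static void my_counters_exit(void)
        {
                free_percpu(counters);  /* also uncharges the owning cgroup */
        }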
     

05 Jun, 2019

1 commit

  • Based on 1 normalized pattern(s):

    this file is released under the gplv2

    extracted by the scancode license scanner the SPDX license identifier

    GPL-2.0-only

    has been chosen to replace the boilerplate/reference in 68 file(s).

    Signed-off-by: Thomas Gleixner
    Reviewed-by: Armijn Hemel
    Reviewed-by: Allison Randal
    Cc: linux-spdx@vger.kernel.org
    Link: https://lkml.kernel.org/r/20190531190114.292346262@linutronix.de
    Signed-off-by: Greg Kroah-Hartman

    Thomas Gleixner
     

18 Feb, 2018

2 commits

  • The prior patch added support for passing gfp flags through to the
    underlying allocators. This patch allows users to pass along gfp flags
    (currently only __GFP_NORETRY and __GFP_NOWARN) to the underlying
    allocators. This lets callers decide whether they are ok with allocations
    failing and recover in a more graceful way.

    Additionally, gfp passing was done as additional flags in the previous
    patch. Instead, change this to caller-passed semantics. GFP_KERNEL is
    also removed as the default flag. It continues to be used for
    allocations caused internally by the percpu allocator.

    V2:
    Removed gfp_percpu_mask in favor of doing it inline.
    Removed GFP_KERNEL as a default flag for __alloc_percpu_gfp.

    Signed-off-by: Dennis Zhou
    Suggested-by: Daniel Borkmann
    Acked-by: Christoph Lameter
    Signed-off-by: Tejun Heo

    Dennis Zhou
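
    A small sketch of the caller-passed semantics described above, assuming
    an arbitrary size and alignment; __alloc_percpu_gfp() is the existing
    interface:

        #include <linux/percpu.h>
        #include <linux/gfp.h>

        static void __percpu *buf;

        static int reserve_buf(size_t size)
        {
                /*
                 * The caller opts out of retries and of allocation-failure
                 * warnings; a failure is handled gracefully instead of
                 * stalling or spamming the log.  GFP_KERNEL is spelled out
                 * because it is no longer implied by default.
                 */
                buf = __alloc_percpu_gfp(size, sizeof(u64),
                                         GFP_KERNEL | __GFP_NORETRY |
                                         __GFP_NOWARN);
                if (!buf)
                        return -ENOMEM;
                return 0;
        }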
     
  • Percpu memory using the vmalloc area based chunk allocator lazily
    populates chunks by first requesting the full virtual address space
    required for the chunk and subsequently adding pages as allocations come
    through. To ensure atomic allocations can succeed, a workqueue item is
    used to maintain a minimum number of empty pages. In certain scenarios,
    such as reported in [1], it is possible that physical memory becomes
    quite scarce, which can result in either a rather long time spent trying
    to find free pages or, worse, a kernel panic.

    This patch adds support for __GFP_NORETRY and __GFP_NOWARN, passing them
    through to the underlying allocators. This should prevent any
    unnecessary panics potentially caused by the workqueue item. The gfp
    flags are passed around as additional flags rather than a full set of
    flags. The next patch will change this to caller-passed semantics.

    V2:
    Added const modifier to gfp flags in the balance path.
    Removed an extra whitespace.

    [1] https://lkml.org/lkml/2018/2/12/551

    Signed-off-by: Dennis Zhou
    Suggested-by: Daniel Borkmann
    Reported-by: syzbot+adb03f3f0bb57ce3acda@syzkaller.appspotmail.com
    Acked-by: Christoph Lameter
    Signed-off-by: Tejun Heo

    Dennis Zhou
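
    For context, a hedged illustration of the atomic path that depends on
    those workqueue-maintained empty pages: an allocation from a context
    that cannot sleep can only be served from pages that are already
    populated, and the workqueue item then refills the reserve in the
    background. The helper name is illustrative:

        #include <linux/percpu.h>
        #include <linux/gfp.h>

        /* illustration only: a non-sleepable caller */
        static u64 __percpu *grab_counter_atomic(void)
        {
                /*
                 * Without GFP_KERNEL the allocator must not populate new
                 * pages itself, so this request is satisfied from the
                 * pre-populated reserve kept by the workqueue item.
                 */
                return __alloc_percpu_gfp(sizeof(u64), sizeof(u64),
                                          GFP_ATOMIC);
        }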
     

16 Nov, 2017

1 commit

  • As the page free path makes no distinction between cache hot and cold
    pages, there is no real useful ordering of pages in the free list that
    allocation requests can take advantage of. Judging from the users of
    __GFP_COLD, it is likely that a number of them are the result of copying
    other sites instead of actually measuring the impact. Remove the
    __GFP_COLD parameter which simplifies a number of paths in the page
    allocator.

    This is potentially controversial but bear in mind that the size of the
    per-cpu pagelists versus modern cache sizes means that the whole per-cpu
    list can often fit in the L3 cache. Hence, there is only a potential
    benefit for microbenchmarks that alloc/free pages in a tight loop. It's
    even worse when THP is taken into account which has little or no chance
    of getting a cache-hot page as the per-cpu list is bypassed and the
    zeroing of multiple pages will thrash the cache anyway.

    The truncate microbenchmarks are not shown as this patch affects the
    allocation path and not the free path. A page fault microbenchmark was
    tested but it showed no significant difference, which is not surprising
    given that the __GFP_COLD branches are a minuscule percentage of the
    fault path.

    Link: http://lkml.kernel.org/r/20171018075952.10627-9-mgorman@techsingularity.net
    Signed-off-by: Mel Gorman
    Acked-by: Vlastimil Babka
    Cc: Andi Kleen
    Cc: Dave Chinner
    Cc: Dave Hansen
    Cc: Jan Kara
    Cc: Johannes Weiner
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Mel Gorman
     

29 Jun, 2017

1 commit


21 Jun, 2017

2 commits

  • Add support for tracepoints to the following events: chunk allocation,
    chunk free, area allocation, area free, and area allocation failure.
    This should let us replay percpu memory requests and evaluate
    corresponding decisions.

    Signed-off-by: Dennis Zhou
    Signed-off-by: Tejun Heo

    Dennis Zhou
     
  • There is limited visibility into the use of percpu memory, leaving us
    unable to reason about the correctness of parameters and the overall use
    of percpu memory. These counters and statistics aim to provide insight
    into basic properties of percpu memory use, such as the number of
    allocations over the lifetime, allocation sizes, and fragmentation.

    New Config: PERCPU_STATS

    Signed-off-by: Dennis Zhou
    Signed-off-by: Tejun Heo

    Dennis Zhou
     

07 Mar, 2017

1 commit


03 Sep, 2014

4 commits

  • Previously, pcpu_[de]populate_chunk() were called with a range which
    may contain multiple target regions, and
    pcpu_[de]populate_chunk() iterated over the regions. This has the
    benefit of batching up cache flushes for all the regions; however,
    we're planning to add more bookkeeping logic around [de]population to
    support atomic allocations and this delegation of iterations gets in
    the way.

    This patch moves the region iterations out of
    pcpu_[de]populate_chunk() into its callers - pcpu_alloc() and
    pcpu_reclaim() - so that we can later add logic to track more states
    around them. This change may make cache and tlb flushes more frequent
    but multi-region [de]populations are rare anyway and if this actually
    becomes a problem, it's not difficult to factor out cache flushes as
    separate callbacks which are directly invoked from percpu.c.

    Signed-off-by: Tejun Heo

    Tejun Heo
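
    A rough, purely illustrative sketch of what the caller-side iteration
    looks like once the per-region loop lives in pcpu_alloc()/pcpu_reclaim()
    rather than in the callbacks. The bitmap helpers are real kernel
    interfaces, but the loop below is a simplification, not the actual
    percpu.c code:

        #include <linux/bitmap.h>

        /* populate every unpopulated region within [page_start, page_end) */
        static int populate_range(struct pcpu_chunk *chunk,
                                  int page_start, int page_end)
        {
                int rs = page_start, re, ret;

                while (rs < page_end) {
                        /* next hole in the populated bitmap ... */
                        rs = find_next_zero_bit(chunk->populated,
                                                page_end, rs);
                        if (rs >= page_end)
                                break;
                        /* ... and where that hole ends */
                        re = find_next_bit(chunk->populated, page_end, rs + 1);

                        /* one callback invocation per contiguous region */
                        ret = pcpu_populate_chunk(chunk, rs, re);
                        if (ret)
                                return ret;

                        bitmap_set(chunk->populated, rs, re - rs);
                        rs = re;
                }
                return 0;
        }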
     
  • percpu-vm and percpu-km implement separate versions of
    pcpu_[de]populate_chunk(), and some parts which are or should be common
    currently live in the specific implementations. Make the following
    changes.

    * Allocation area clearing is moved from the pcpu_populate_chunk()
    implementations to pcpu_alloc(). This makes percpu-km's version a
    noop.

    * Quick exit tests in pcpu_[de]populate_chunk() of percpu-vm are moved
    to their respective callers so that they are applied to percpu-km
    too. This doesn't make any meaningful difference as both functions
    are noops for percpu-km; however, this is more consistent and will
    help in implementing atomic allocation support.

    Signed-off-by: Tejun Heo

    Tejun Heo
     
  • pcpu_get_pages() creates the temp pages array if not already allocated
    and returns the pointer to it. As the function is called from both
    [de]population paths and depopulation can only happen after at least
    one successful population, the @may_alloc param doesn't make any
    difference - the allocation will always happen on the population path
    anyway.

    Remove @may_alloc from pcpu_get_pages(). Also, add a lockdep
    assertion on pcpu_alloc_mutex instead of vaguely stating that the
    exclusion is the caller's responsibility.

    Signed-off-by: Tejun Heo

    Tejun Heo
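
    A minimal sketch of the assertion described above; lockdep_assert_held()
    is the standard primitive and pcpu_alloc_mutex already exists in
    mm/percpu.c, while the function here is reduced to an illustration:

        #include <linux/lockdep.h>
        #include <linux/mutex.h>

        static DEFINE_MUTEX(pcpu_alloc_mutex);

        static void needs_alloc_mutex(void)
        {
                /*
                 * With lockdep enabled this splats if the caller does not
                 * hold the mutex, turning a "caller must provide exclusion"
                 * comment into an enforced assertion.
                 */
                lockdep_assert_held(&pcpu_alloc_mutex);

                /* ... work that relies on the exclusion ... */
        }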
     
  • percpu-vm uses pcpu_get_pages_and_bitmap() to acquire temp pages array
    and populated bitmap and uses the two during [de]population. The temp
    bitmap is used only to build the new bitmap that is copied to
    chunk->populated after the operation succeeds; however, the new bitmap
    can be trivially set after success without using the temp bitmap.

    This patch removes the temp populated bitmap usage from percpu-vm.c.

    * pcpu_get_pages_and_bitmap() is renamed to pcpu_get_pages() and no
    longer hands out the temp bitmap.

    * @populated argument is dropped from all the related functions.
    @populated updates in pcpu_[un]map_pages() are dropped.

    * Two loops in pcpu_map_pages() are merged.

    * pcpu_[de]populate_chunk() modify chunk->populated bitmap directly
    from @page_start and @page_end after success.

    Signed-off-by: Tejun Heo
    Acked-by: Christoph Lameter

    Tejun Heo
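
    As the last point notes, chunk->populated can simply be updated in place
    once the operation has succeeded. A minimal sketch, assuming
    @page_start/@page_end are page indexes within the chunk:

        #include <linux/bitmap.h>

        /* mark [page_start, page_end) populated after a successful population */
        static void mark_populated(struct pcpu_chunk *chunk,
                                   int page_start, int page_end)
        {
                bitmap_set(chunk->populated, page_start,
                           page_end - page_start);
        }

        /* and clear the same range on depopulation */
        static void mark_depopulated(struct pcpu_chunk *chunk,
                                     int page_start, int page_end)
        {
                bitmap_clear(chunk->populated, page_start,
                             page_end - page_start);
        }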
     

16 Aug, 2014

2 commits

  • If pcpu_map_pages() fails midway, it unmaps the already mapped pages.
    Currently, it doesn't flush the tlb after the partial unmapping. This
    may be okay in most cases as the established mapping hasn't been used
    at that point, but it can go wrong, and when it goes wrong it'd be
    extremely difficult to track down.

    Flush tlb after the partial unmapping.

    Signed-off-by: Tejun Heo
    Cc: stable@vger.kernel.org

    Tejun Heo
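
    A hedged sketch of the fixed error path: after unmapping the partially
    established mapping, the TLB is flushed for that range so no stale
    translations survive. flush_tlb_kernel_range() is the standard
    interface; the helper name and the use of unmap_kernel_range_noflush()
    (as it existed at the time) are illustrative:

        #include <linux/vmalloc.h>
        #include <asm/tlbflush.h>

        static void undo_partial_map(unsigned long start, unsigned long end)
        {
                /* tear down whatever part of [start, end) was mapped */
                unmap_kernel_range_noflush(start, end - start);

                /* and do not leave stale TLB entries behind */
                flush_tlb_kernel_range(start, end);
        }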
     
  • When pcpu_alloc_pages() fails midway, pcpu_free_pages() is invoked to
    free what has already been allocated. The invocation is across the
    whole requested range and pcpu_free_pages() will try to free all
    non-NULL pages; unfortunately, this is incorrect as
    pcpu_get_pages_and_bitmap(), unlike what its comment suggests, doesn't
    clear the pages array, and thus the array may have entries from
    previous invocations, making the partial failure path free incorrect
    pages.

    Fix it by open-coding the partial freeing of the already allocated
    pages.

    Signed-off-by: Tejun Heo
    Cc: stable@vger.kernel.org

    Tejun Heo
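
    A hedged sketch of the open-coded partial freeing: only the pages this
    invocation actually allocated (up to the failing cpu/page index) are
    freed, so stale pointers left in the re-used, never-cleared array are
    not touched. The flat array layout and names are illustrative:

        #include <linux/cpumask.h>
        #include <linux/gfp.h>

        static void free_partial(struct page **pages, int nr_pages,
                                 unsigned int fail_cpu, int fail_idx)
        {
                unsigned int cpu;
                int i, end;

                for_each_possible_cpu(cpu) {
                        if (cpu < fail_cpu)
                                end = nr_pages;   /* fully allocated cpu */
                        else if (cpu == fail_cpu)
                                end = fail_idx;   /* allocated up to failure */
                        else
                                break;            /* nothing allocated here */

                        for (i = 0; i < end; i++)
                                __free_page(pages[cpu * nr_pages + i]);
                }
        }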
     

21 Jun, 2012

1 commit

  • Fix kernel-doc warnings such as

    Warning(../mm/page_cgroup.c:432): No description found for parameter 'id'
    Warning(../mm/page_cgroup.c:432): Excess function parameter 'mem' description in 'swap_cgroup_record'

    Signed-off-by: Wanpeng Li
    Cc: Randy Dunlap
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Wanpeng Li
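
    The warnings point at a kernel-doc comment that still documents a
    removed parameter and omits an existing one. A hedged sketch of the
    corrected shape (the exact signature and wording in mm/page_cgroup.c
    may differ):

        /**
         * swap_cgroup_record - record mem_cgroup id for this swap entry
         * @ent: swap entry to be recorded into
         * @id: mem_cgroup id to be recorded
         *
         * Every parameter line must name an actual parameter: the stale
         * @mem line is dropped and a description for @id is added.
         */
        unsigned short swap_cgroup_record(swp_entry_t ent, unsigned short id);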
     

21 Jan, 2012

1 commit


23 Nov, 2011

2 commits

  • The percpu allocator recorded the cpus which map to the first and last
    units in pcpu_first/last_unit_cpu respectively and used them to
    determine the address range of a chunk - e.g. it assumed that the
    first unit has the lowest address in a chunk while the last unit has
    the highest address.

    This simply isn't true. Groups in a chunk can have arbitrary positive
    or negative offsets from the previous one and there is no guarantee
    that the first unit occupies the lowest offset while the last one the
    highest.

    Fix it by actually comparing unit offsets to determine cpus occupying
    the lowest and highest offsets. Also, rename pcpu_first/last_unit_cpu
    to pcpu_low/high_unit_cpu to avoid confusion.

    The chunk address range is used to flush the cache on vmalloc area
    map/unmap and to decide whether a given address is in the first chunk
    by per_cpu_ptr_to_phys(); the bug was discovered through an invalid
    per_cpu_ptr_to_phys() translation for crash_note.

    Kudos to Dave Young for tracking down the problem.

    Signed-off-by: Tejun Heo
    Reported-by: WANG Cong
    Reported-by: Dave Young
    Tested-by: Dave Young
    LKML-Reference:
    Cc: stable@kernel.org

    Tejun Heo
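
    A hedged sketch of the comparison described in the fix, as it would run
    while walking the units during first-chunk setup; the variable names
    follow the entry above, but the loop body is illustrative rather than a
    verbatim excerpt:

        #include <linux/threads.h>

        /* cpus whose units occupy the lowest and highest offsets */
        static int pcpu_low_unit_cpu = NR_CPUS;
        static int pcpu_high_unit_cpu = NR_CPUS;

        static void note_unit_offset(unsigned int cpu,
                                     const unsigned long *unit_off)
        {
                if (pcpu_low_unit_cpu == NR_CPUS ||
                    unit_off[cpu] < unit_off[pcpu_low_unit_cpu])
                        pcpu_low_unit_cpu = cpu;
                if (pcpu_high_unit_cpu == NR_CPUS ||
                    unit_off[cpu] > unit_off[pcpu_high_unit_cpu])
                        pcpu_high_unit_cpu = cpu;
        }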
     
  • Currently pcpu_mem_alloc() is implemented to always return zeroed
    memory. So rename it so that users like pcpu_get_pages_and_bitmap()
    know they don't need to reinitialize the returned memory.

    Signed-off-by: Bob Liu
    Reviewed-by: Pekka Enberg
    Reviewed-by: Michal Hocko
    Signed-off-by: Tejun Heo

    Bob Liu
     

14 Jan, 2011

1 commit


01 May, 2010

1 commit

  • Separate out and move chunk management (creation/destruction and
    [de]population) code into percpu-vm.c which is included by percpu.c
    and compiled together. The interface for chunk management is defined
    as follows.

    * pcpu_populate_chunk - populate the specified range of a chunk
    * pcpu_depopulate_chunk - depopulate the specified range of a chunk
    * pcpu_create_chunk - create a new chunk
    * pcpu_destroy_chunk - destroy a chunk, always preceded by full depop
    * pcpu_addr_to_page - translate address to physical address
    * pcpu_verify_alloc_info - check alloc_info is acceptable during init

    Other than wrapping vmalloc_to_page() inside pcpu_addr_to_page() and
    dummy pcpu_verify_alloc_info() implementation, this patch only moves
    code around. This separation is to allow alternate chunk management
    implementation.

    Signed-off-by: Tejun Heo
    Reviewed-by: David Howells
    Cc: Graff Yang
    Cc: Sonic Zhang

    Tejun Heo
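
    A rough outline of that interface written as prototypes; the exact
    parameter lists in mm/percpu.c may differ, so treat this as a sketch of
    the shape rather than the authoritative signatures:

        /*
         * Implemented by the chunk-management backend (percpu-vm.c, and
         * later alternatives such as percpu-km.c) and included into
         * percpu.c.
         */
        static int pcpu_populate_chunk(struct pcpu_chunk *chunk,
                                       int off, int size);
        static void pcpu_depopulate_chunk(struct pcpu_chunk *chunk,
                                          int off, int size);
        static struct pcpu_chunk *pcpu_create_chunk(void);
        static void pcpu_destroy_chunk(struct pcpu_chunk *chunk);
        static struct page *pcpu_addr_to_page(void *addr);
        static int __init pcpu_verify_alloc_info(const struct pcpu_alloc_info *ai);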