21 May, 2016
5 commits
-
wait_iff_congested has been used to throttle allocator before it retried
another round of direct reclaim to allow the writeback to make some
progress and prevent reclaim from looping over dirty/writeback pages
without making any progress.

We used to do congestion_wait before commit 0e093d99763e ("writeback: do
not sleep on the congestion queue if there are no congested BDIs or if
significant congestion is not being encountered in the current zone")
but that led to undesirable stalls and sleeping for the full timeout
even when the BDI wasn't congested. Hence wait_iff_congested was used
instead.

But it seems that even wait_iff_congested doesn't work as expected. We
might have a small file LRU list with all pages dirty/writeback and yet
the bdi is not congested so this is just a cond_resched in the end and
can end up triggering premature OOM.

This patch replaces the unconditional wait_iff_congested by
congestion_wait which is executed only if we _know_ that the last round
of direct reclaim didn't make any progress and dirty+writeback pages are
more than a half of the reclaimable pages on the zone which might be
usable for our target allocation. This shouldn't reintroduce stalls
fixed by 0e093d99763e because congestion_wait is called only when we are
getting hopeless when sleeping is a better choice than OOM with many
pages under IO.

We have to preserve logic introduced by commit 373ccbe59270 ("mm,
vmstat: allow WQ concurrency to discover memory reclaim doesn't make any
progress") into the __alloc_pages_slowpath now that wait_iff_congested
is not used anymore. As the only remaining user of wait_iff_congested
is shrink_inactive_list we can remove the WQ specific short sleep from
wait_iff_congested because the sleep is needed to be done only once in
the allocation retry cycle.

[mhocko@suse.com: high_zoneidx->ac_classzone_idx to evaluate memory reserves properly]
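
The condition described in this commit message can be sketched roughly as follows. This is only an illustration, not the kernel code: the struct, its fields and the function name are hypothetical, and only the rule itself (no reclaim progress, and dirty plus writeback pages above half of the reclaimable pages) is taken from the text.

```c
#include <stdbool.h>

/*
 * Hypothetical stand-in for the per-zone counters the allocator would
 * consult; the field names are illustrative, not the kernel's.
 */
struct zone_counts {
	unsigned long reclaimable;	/* pages reclaim could still free */
	unsigned long dirty;		/* dirty pages */
	unsigned long writeback;	/* pages under writeback */
};

/*
 * Sleep on congestion only when the last direct reclaim round made no
 * progress and dirty + writeback pages exceed half of the reclaimable
 * pages. In every other case the caller would merely yield the CPU.
 */
static bool should_congestion_wait(const struct zone_counts *z,
				   unsigned long did_some_progress)
{
	if (did_some_progress)
		return false;

	return 2 * (z->dirty + z->writeback) > z->reclaimable;
}
```

With 100 reclaimable pages, 60 dirty or writeback pages and no progress, the sketch chooses to sleep; with any progress it never does.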
Link: http://lkml.kernel.org/r/1463051677-29418-2-git-send-email-mhocko@kernel.org
Signed-off-by: Michal Hocko
Acked-by: Hillf Danton
Cc: David Rientjes
Cc: Johannes Weiner
Cc: Joonsoo Kim
Cc: Mel Gorman
Cc: Tetsuo Handa
Cc: Vladimir Davydov
Cc: Vlastimil Babka
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
-
__alloc_pages_slowpath has traditionally relied on the direct reclaim
and did_some_progress as an indicator that it makes sense to retry
allocation rather than declaring OOM. shrink_zones had to rely on
zone_reclaimable if shrink_zone didn't make any progress to prevent from
a premature OOM killer invocation - the LRU might be full of dirty or
writeback pages and direct reclaim cannot clean those up.

zone_reclaimable allows to rescan the reclaimable lists several times
and restart if a page is freed. This is really subtle behavior and it
might lead to a livelock when a single freed page keeps allocator
looping but the current task will not be able to allocate that single
page. OOM killer would be more appropriate than looping without any
progress for unbounded amount of time.

This patch changes OOM detection logic and pulls it out from shrink_zone
which is too low to be appropriate for any high level decisions such as
OOM which is per zonelist property. It is __alloc_pages_slowpath which
knows how many attempts have been done and what was the progress so far
therefore it is more appropriate to implement this logic.

The new heuristic is implemented in should_reclaim_retry helper called
from __alloc_pages_slowpath. It tries to be more deterministic and
easier to follow. It builds on an assumption that retrying makes sense
only if the currently reclaimable memory + free pages would allow the
current allocation request to succeed (as per __zone_watermark_ok) at
least for one zone in the usable zonelist.

This alone wouldn't be sufficient, though, because the writeback might
get stuck and reclaimable pages might be pinned for a really long time
or even depend on the current allocation context. Therefore there is a
backoff mechanism implemented which reduces the reclaim target after
each reclaim round without any progress. This means that we should
eventually converge to only NR_FREE_PAGES as the target and fail on the
wmark check and proceed to OOM. The backoff is simple and linear with
1/16 of the reclaimable pages for each round without any progress. We
are optimistic and reset counter for successful reclaim rounds.

Costly high order pages mostly preserve their semantic and those without
__GFP_REPEAT fail right away while those which have the flag set will
back off after the amount of reclaimable pages reaches equivalent of the
requested order. The only difference is that if there was no progress
during the reclaim we rely on zone watermark check. This is more
logical thing to do than previous 1<
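
The linear backoff described in this message can be sketched as below. It is a simplified model under stated assumptions, not the kernel helper: MAX_RETRIES is an assumed name for the 16 steps implied by the 1/16 decrement, the plain comparison against `watermark` stands in for the __zone_watermark_ok check, and a single zone is considered instead of a zonelist.

```c
#include <stdbool.h>

#define MAX_RETRIES 16	/* assumed name; the text only gives the 1/16 step */

/*
 * Decide whether another reclaim round is worth trying.
 *
 * Each round without progress discounts 1/16 of the reclaimable pages
 * (rounded up), so after 16 fruitless rounds only the free pages remain
 * in the estimate and the watermark comparison fails.
 */
static bool should_retry(unsigned long free_pages,
			 unsigned long reclaimable,
			 unsigned long watermark,
			 unsigned int no_progress_loops)
{
	unsigned long discount, available;

	if (no_progress_loops > MAX_RETRIES)
		return false;

	/* Round up so that small reclaimable counts also shrink. */
	discount = (no_progress_loops * reclaimable + MAX_RETRIES - 1) /
		   MAX_RETRIES;
	available = free_pages + reclaimable - discount;

	return available > watermark;
}
```

A caller would reset no_progress_loops to zero after any successful reclaim round, which matches the optimistic reset mentioned above.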
Acked-by: Hillf Danton
Cc: Vladimir Davydov
Cc: Johannes Weiner
Cc: David Rientjes
Cc: Joonsoo Kim
Cc: Mel Gorman
Cc: Tetsuo Handa
Cc: Vlastimil Babka
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
-
__alloc_pages_direct_compact communicates potential back off by two
variables:
 - deferred_compaction tells that the compaction returned
   COMPACT_DEFERRED
 - contended_compaction is set when there is a contention on
   zone->lock resp. zone->lru_lock locks

__alloc_pages_slowpath then backs off for THP allocation requests to
prevent from long stalls. This is rather messy and it would be much
cleaner to return a single compact result value and hide all the nasty
details into __alloc_pages_direct_compact.

This patch shouldn't introduce any functional changes.
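
To illustrate the idea of returning one result instead of two out-variables, here is a minimal sketch. The enum, its values and both function names are hypothetical and chosen for this example only; the real interface may differ.

```c
#include <stdbool.h>

/* Hypothetical result type; names are illustrative, not the kernel's. */
enum compact_feedback {
	FEEDBACK_NONE,		/* nothing that calls for backing off */
	FEEDBACK_DEFERRED,	/* compaction was deferred */
	FEEDBACK_CONTENDED,	/* a lock was contended */
};

/*
 * Fold the two former out-variables into a single return value so the
 * caller has only one thing to inspect.
 */
static enum compact_feedback direct_compact_feedback(bool deferred,
						     bool lock_contended)
{
	if (deferred)
		return FEEDBACK_DEFERRED;
	if (lock_contended)
		return FEEDBACK_CONTENDED;
	return FEEDBACK_NONE;
}

/* A THP-like request backs off on either signal to avoid long stalls. */
static bool thp_should_back_off(enum compact_feedback fb)
{
	return fb != FEEDBACK_NONE;
}
```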
Signed-off-by: Michal Hocko
Acked-by: Vlastimil Babka
Acked-by: Hillf Danton
Cc: David Rientjes
Cc: Johannes Weiner
Cc: Joonsoo Kim
Cc: Mel Gorman
Cc: Tetsuo Handa
Cc: Vladimir Davydov
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
-
Compaction code is doing weird dances between COMPACT_FOO -> int ->
unsigned long.

But there doesn't seem to be any reason for that. All functions which
return/use one of those constants are not expecting any other value so it
really makes sense to define an enum for them and make it clear that no
other values are expected.

This is a pure cleanup and shouldn't introduce any functional changes.
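
A small sketch of why a dedicated enum helps: the type documents the closed set of values, and a switch without a default lets the compiler flag unhandled ones. The enum and its members below are made up for the example and are not the kernel's definitions.

```c
/* Hypothetical closed set of compaction outcomes. */
enum compact_status {
	STATUS_SKIPPED,
	STATUS_DEFERRED,
	STATUS_CONTINUE,
	STATUS_COMPLETE,
};

/*
 * No default label on purpose: with warnings enabled, adding a new
 * enumerator without handling it here is reported by the compiler.
 */
static const char *compact_status_name(enum compact_status s)
{
	switch (s) {
	case STATUS_SKIPPED:
		return "skipped";
	case STATUS_DEFERRED:
		return "deferred";
	case STATUS_CONTINUE:
		return "continue";
	case STATUS_COMPLETE:
		return "complete";
	}
	return "invalid";	/* only reachable with an out-of-range value */
}
```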
Signed-off-by: Michal Hocko
Acked-by: Vlastimil Babka
Acked-by: Hillf Danton
Cc: David Rientjes
Cc: Johannes Weiner
Cc: Joonsoo Kim
Cc: Mel Gorman
Cc: Tetsuo Handa
Cc: Vladimir Davydov
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
-
The inactive file list should still be large enough to contain readahead
windows and freshly written file data, but it no longer is the only
source for detecting multiple accesses to file pages. The workingset
refault measurement code causes recently evicted file pages that get
accessed again after a shorter interval to be promoted directly to the
active list.

With that mechanism in place, we can afford to (on a larger system)
dedicate more memory to the active file list, so we can actually cache
more of the frequently used file pages in memory, and not have them
pushed out by streaming writes, once-used streaming file reads, etc.

This can help things like database workloads, where only half the page
cache can currently be used to cache the database working set. This
patch automatically increases that fraction on larger systems, using the
same ratio that has already been used for anonymous memory.

[hannes@cmpxchg.org: cgroup-awareness]
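
The message does not spell out the ratio. As an assumption for illustration, the sketch below uses the square-root rule commonly documented for the anonymous lists, ratio = sqrt(10 * size in GiB), with a ratio of 1 below 1 GiB. All function names are hypothetical.

```c
#include <stdbool.h>

/* Simple integer square root; adequate for the small inputs used here. */
static unsigned long isqrt(unsigned long x)
{
	unsigned long r = 0;

	while ((r + 1) * (r + 1) <= x)
		r++;
	return r;
}

/* Assumed rule: active:inactive target ratio grows with memory size. */
static unsigned long inactive_ratio(unsigned long gib)
{
	return gib ? isqrt(10 * gib) : 1;
}

/* The inactive list counts as low when inactive * ratio < active. */
static bool inactive_is_low(unsigned long inactive, unsigned long active,
			    unsigned long gib)
{
	return inactive * inactive_ratio(gib) < active;
}
```

Under this assumption a 1 GiB list pair tolerates an active list three times the size of the inactive one, and a 100 GiB pair tolerates a factor of 31.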
Signed-off-by: Rik van Riel
Signed-off-by: Johannes Weiner
Reported-by: Andres Freund
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
20 May, 2016
35 commits
-
The page allocator fast path uses either the requested nodemask or
cpuset_current_mems_allowed if cpusets are enabled. If the allocation
context allows watermarks to be ignored then it can also ignore memory
policies. However, on entering the allocator slowpath the nodemask may
still be cpuset_current_mems_allowed and the policies are enforced.
This patch resets the nodemask appropriately before entering the
slowpath.

Link: http://lkml.kernel.org/r/20160504143628.GU2858@techsingularity.net
Signed-off-by: Vlastimil Babka
Signed-off-by: Mel Gorman
Cc: Jesper Dangaard Brouer
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
-
Bad pages should be rare so the code handling them doesn't need to be
inline for performance reasons. Put it to separate function which
returns void. This also assumes that the initial page_expected_state()
result will match the result of the thorough check, i.e. the page
doesn't become "good" in the meanwhile. This matches the same
expectations already in place in free_pages_check().

!DEBUG_VM bloat-o-meter:

add/remove: 1/0 grow/shrink: 0/1 up/down: 134/-274 (-140)
function                                     old     new   delta
check_new_page_bad                             -     134    +134
get_page_from_freelist                      3468    3194    -274

Signed-off-by: Vlastimil Babka
Acked-by: Mel Gorman
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
-
The new free_pcp_prepare() function shares a lot of code with
free_pages_prepare(), which makes this a maintenance risk when some
future patch modifies only one of them. We should be able to achieve
the same effect (skipping free_pages_check() from !DEBUG_VM configs) by
adding a parameter to free_pages_prepare() and making it inline, so the
checks (and the order != 0 parts) are eliminated from the call from
free_pcp_prepare().

!DEBUG_VM: bloat-o-meter reports no difference, as my gcc was already
inlining free_pages_prepare() and the elimination seems to work as
expected

DEBUG_VM bloat-o-meter:

add/remove: 0/1 grow/shrink: 2/0 up/down: 1035/-778 (257)
function                                     old     new   delta
__free_pages_ok                              297    1060    +763
free_hot_cold_page                           480     752    +272
free_pages_prepare                           778       -    -778

Here inlining didn't occur before, and added some code, but it's ok for
a debug option.

[akpm@linux-foundation.org: fix build]
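
The technique of sharing one inline helper and selecting the checks through a constant parameter can be sketched as follows. Everything here is a made-up miniature (struct fake_page, the check and the wrappers are hypothetical); it only shows that a call with a constant false argument never reaches the check, which an optimising compiler can then drop from that caller.

```c
#include <stdbool.h>

/* Hypothetical miniature of a page descriptor. */
struct fake_page {
	unsigned long flags;
	int refcount;
};

static int checks_run;	/* counts how often the expensive check ran */

static bool page_looks_sane(const struct fake_page *p)
{
	checks_run++;
	return p->flags == 0 && p->refcount == 0;
}

/*
 * Shared preparation step. When check_free is a compile-time constant
 * false at the call site, the inlined copy contains no check at all.
 */
static inline bool prepare_free(struct fake_page *p, bool check_free)
{
	if (check_free && !page_looks_sane(p))
		return false;

	p->flags = 0;
	return true;
}

/* Per-cpu list path: skip the check. */
static bool prepare_free_pcp(struct fake_page *p)
{
	return prepare_free(p, false);
}

/* Full path: always check. */
static bool prepare_free_full(struct fake_page *p)
{
	return prepare_free(p, true);
}
```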
Signed-off-by: Vlastimil Babka
Signed-off-by: Mel Gorman
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
-
Every page allocated checks a number of page fields for validity. This
catches corruption bugs of pages that are already freed but it is
expensive. This patch weakens the debugging check by checking PCP pages
only when the PCP lists are being refilled. All compound pages are
checked. This potentially avoids debugging checks entirely if the PCP
lists are never emptied and refilled so some corruption issues may be
missed. Full checking requires DEBUG_VM.

With the two deferred debugging patches applied, the impact to a page
allocator microbenchmark is

4.6.0-rc3 4.6.0-rc3
inline-v3r6 deferalloc-v3r7
Min alloc-odr0-1 344.00 ( 0.00%) 317.00 ( 7.85%)
Min alloc-odr0-2 248.00 ( 0.00%) 231.00 ( 6.85%)
Min alloc-odr0-4 209.00 ( 0.00%) 192.00 ( 8.13%)
Min alloc-odr0-8 181.00 ( 0.00%) 166.00 ( 8.29%)
Min alloc-odr0-16 168.00 ( 0.00%) 154.00 ( 8.33%)
Min alloc-odr0-32 161.00 ( 0.00%) 148.00 ( 8.07%)
Min alloc-odr0-64 158.00 ( 0.00%) 145.00 ( 8.23%)
Min alloc-odr0-128 156.00 ( 0.00%) 143.00 ( 8.33%)
Min alloc-odr0-256 168.00 ( 0.00%) 154.00 ( 8.33%)
Min alloc-odr0-512 178.00 ( 0.00%) 167.00 ( 6.18%)
Min alloc-odr0-1024 186.00 ( 0.00%) 174.00 ( 6.45%)
Min alloc-odr0-2048 192.00 ( 0.00%) 180.00 ( 6.25%)
Min alloc-odr0-4096 198.00 ( 0.00%) 184.00 ( 7.07%)
Min alloc-odr0-8192 200.00 ( 0.00%) 188.00 ( 6.00%)
Min alloc-odr0-16384 201.00 ( 0.00%) 188.00 ( 6.47%)
Min free-odr0-1 189.00 ( 0.00%) 180.00 ( 4.76%)
Min free-odr0-2 132.00 ( 0.00%) 126.00 ( 4.55%)
Min free-odr0-4 104.00 ( 0.00%) 99.00 ( 4.81%)
Min free-odr0-8 90.00 ( 0.00%) 85.00 ( 5.56%)
Min free-odr0-16 84.00 ( 0.00%) 80.00 ( 4.76%)
Min free-odr0-32 80.00 ( 0.00%) 76.00 ( 5.00%)
Min free-odr0-64 78.00 ( 0.00%) 74.00 ( 5.13%)
Min free-odr0-128 77.00 ( 0.00%) 73.00 ( 5.19%)
Min free-odr0-256 94.00 ( 0.00%) 91.00 ( 3.19%)
Min free-odr0-512 108.00 ( 0.00%) 112.00 ( -3.70%)
Min free-odr0-1024 115.00 ( 0.00%) 118.00 ( -2.61%)
Min free-odr0-2048 120.00 ( 0.00%) 125.00 ( -4.17%)
Min free-odr0-4096 123.00 ( 0.00%) 129.00 ( -4.88%)
Min free-odr0-8192 126.00 ( 0.00%) 130.00 ( -3.17%)
Min free-odr0-16384 126.00 ( 0.00%) 131.00 ( -3.97%)

Note that the free paths for large numbers of pages are impacted as the
debugging cost gets shifted into that path when the page data is no
longer necessarily cache-hot.

Signed-off-by: Mel Gorman
Acked-by: Vlastimil Babka
Cc: Jesper Dangaard Brouer
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
-
Every page free checks a number of page fields for validity. This
catches premature frees and corruptions but it is also expensive. This
patch weakens the debugging check by checking PCP pages at the time they
are drained from the PCP list. This will trigger the bug but the site
that freed the corrupt page will be lost. To get the full context, a
kernel rebuild with DEBUG_VM is necessary.

[akpm@linux-foundation.org: fix build]
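
The trade-off described in this message, cheap frees with the validity check postponed until the list is drained, can be modelled with a toy per-cpu list. The types, the capacity and the "nonzero mapcount means bad" rule are assumptions made for the example.

```c
#include <stdbool.h>
#include <stddef.h>

#define PCP_CAPACITY 8	/* arbitrary size for the sketch */

/* Hypothetical page: a nonzero mapcount at free time marks it as bad. */
struct fake_page {
	int mapcount;
};

struct pcp_list {
	struct fake_page *pages[PCP_CAPACITY];
	size_t count;
};

/* Fast path: queue the page without looking at it. */
static bool pcp_free(struct pcp_list *l, struct fake_page *p)
{
	if (l->count == PCP_CAPACITY)
		return false;

	l->pages[l->count++] = p;
	return true;
}

/*
 * Drain path: the deferred check happens here, so a bad page is still
 * detected, but by now nothing records which caller freed it.
 */
static size_t pcp_drain(struct pcp_list *l)
{
	size_t bad = 0;

	while (l->count > 0) {
		struct fake_page *p = l->pages[--l->count];

		if (p->mapcount != 0)
			bad++;
	}
	return bad;
}
```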
Signed-off-by: Mel Gorman
Cc: Vlastimil Babka
Cc: Jesper Dangaard Brouer
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
-
An important function for cpusets is cpuset_node_allowed(), which
optimizes on the fact if there's a single root CPU set, it must be
trivially allowed. But the check "nr_cpusets()
Signed-off-by: Mel Gorman
Acked-by: Zefan Li
Cc: Jesper Dangaard Brouer
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
-
The function call overhead of get_pfnblock_flags_mask() is measurable in
the page free paths. This patch uses an inlined version that is faster.

Signed-off-by: Mel Gorman
Acked-by: Vlastimil Babka
Cc: Jesper Dangaard Brouer
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
-
The original count is never reused so it can be removed.
Signed-off-by: Mel Gorman
Acked-by: Vlastimil Babka
Cc: Jesper Dangaard Brouer
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
-
Check without side-effects should be easier to maintain. It also
removes the duplicated cpupid and flags reset done in !DEBUG_VM variant
of both free_pcp_prepare() and then bulkfree_pcp_prepare(). Finally, it
enables the next patch.

It shouldn't result in new branches, thanks to inlining of the check.

!DEBUG_VM bloat-o-meter:

add/remove: 0/0 grow/shrink: 0/2 up/down: 0/-27 (-27)
function                                     old     new   delta
__free_pages_ok                              748     739      -9
free_pcppages_bulk                          1403    1385     -18

DEBUG_VM:

add/remove: 0/0 grow/shrink: 0/1 up/down: 0/-28 (-28)
function                                     old     new   delta
free_pages_prepare                           806     778     -28

This is also slightly faster because cpupid information is not set on
tail pages so we can avoid resets there.

Signed-off-by: Vlastimil Babka
Signed-off-by: Mel Gorman
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
-
From: Vlastimil Babka
!DEBUG_VM size and bloat-o-meter:

add/remove: 1/0 grow/shrink: 0/2 up/down: 124/-370 (-246)
function                                     old     new   delta
free_pages_check_bad                           -     124    +124
free_pcppages_bulk                          1288    1171    -117
__free_pages_ok                              948     695    -253

DEBUG_VM:

add/remove: 1/0 grow/shrink: 0/1 up/down: 124/-214 (-90)
function                                     old     new   delta
free_pages_check_bad                           -     124    +124
free_pages_prepare                          1112     898    -214

[akpm@linux-foundation.org: fix whitespace]
Signed-off-by: Vlastimil Babka
Signed-off-by: Mel Gorman
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
-
Every page allocated or freed is checked for sanity to avoid corruptions
that are difficult to detect later. A bad page could be due to a number
of fields. Instead of using multiple branches, this patch combines
multiple fields into a single branch. A detailed check is only
necessary if that check fails.

4.6.0-rc2 4.6.0-rc2
initonce-v1r20 multcheck-v1r20
Min alloc-odr0-1 359.00 ( 0.00%) 348.00 ( 3.06%)
Min alloc-odr0-2 260.00 ( 0.00%) 254.00 ( 2.31%)
Min alloc-odr0-4 214.00 ( 0.00%) 213.00 ( 0.47%)
Min alloc-odr0-8 186.00 ( 0.00%) 186.00 ( 0.00%)
Min alloc-odr0-16 173.00 ( 0.00%) 173.00 ( 0.00%)
Min alloc-odr0-32 165.00 ( 0.00%) 166.00 ( -0.61%)
Min alloc-odr0-64 162.00 ( 0.00%) 162.00 ( 0.00%)
Min alloc-odr0-128 161.00 ( 0.00%) 160.00 ( 0.62%)
Min alloc-odr0-256 170.00 ( 0.00%) 169.00 ( 0.59%)
Min alloc-odr0-512 181.00 ( 0.00%) 180.00 ( 0.55%)
Min alloc-odr0-1024 190.00 ( 0.00%) 188.00 ( 1.05%)
Min alloc-odr0-2048 196.00 ( 0.00%) 194.00 ( 1.02%)
Min alloc-odr0-4096 202.00 ( 0.00%) 199.00 ( 1.49%)
Min alloc-odr0-8192 205.00 ( 0.00%) 202.00 ( 1.46%)
Min alloc-odr0-16384 205.00 ( 0.00%) 203.00 ( 0.98%)

Again, the benefit is marginal but avoiding excessive branches is
important. Ideally the paths would not have to check these conditions
at all but regrettably abandoning the tests would make use-after-free
bugs much harder to detect.

Signed-off-by: Mel Gorman
Acked-by: Vlastimil Babka
Cc: Jesper Dangaard Brouer
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
-
The classzone_idx can be inferred from preferred_zoneref so remove the
unnecessary field and save stack space.

Signed-off-by: Mel Gorman
Cc: Vlastimil Babka
Cc: Jesper Dangaard Brouer
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
-
The allocator fast path looks up the first usable zone in a zonelist and
then get_page_from_freelist does the same job in the zonelist iterator.
This patch preserves the necessary information.

4.6.0-rc2 4.6.0-rc2
fastmark-v1r20 initonce-v1r20
Min alloc-odr0-1 364.00 ( 0.00%) 359.00 ( 1.37%)
Min alloc-odr0-2 262.00 ( 0.00%) 260.00 ( 0.76%)
Min alloc-odr0-4 214.00 ( 0.00%) 214.00 ( 0.00%)
Min alloc-odr0-8 186.00 ( 0.00%) 186.00 ( 0.00%)
Min alloc-odr0-16 173.00 ( 0.00%) 173.00 ( 0.00%)
Min alloc-odr0-32 165.00 ( 0.00%) 165.00 ( 0.00%)
Min alloc-odr0-64 161.00 ( 0.00%) 162.00 ( -0.62%)
Min alloc-odr0-128 159.00 ( 0.00%) 161.00 ( -1.26%)
Min alloc-odr0-256 168.00 ( 0.00%) 170.00 ( -1.19%)
Min alloc-odr0-512 180.00 ( 0.00%) 181.00 ( -0.56%)
Min alloc-odr0-1024 190.00 ( 0.00%) 190.00 ( 0.00%)
Min alloc-odr0-2048 196.00 ( 0.00%) 196.00 ( 0.00%)
Min alloc-odr0-4096 202.00 ( 0.00%) 202.00 ( 0.00%)
Min alloc-odr0-8192 206.00 ( 0.00%) 205.00 ( 0.49%)
Min alloc-odr0-16384 206.00 ( 0.00%) 205.00 ( 0.49%)

The benefit is negligible and the results are within the noise but each
cycle counts.

Signed-off-by: Mel Gorman
Cc: Vlastimil Babka
Cc: Jesper Dangaard Brouer
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
-
Watermarks have to be checked on every allocation including the number
of pages being allocated and whether reserves can be accessed. The
reserves only matter if memory is limited and the free_pages adjustment
only applies to high-order pages. This patch adds a shortcut for
order-0 pages that avoids numerous calculations if there is plenty of
free memory yielding the following performance difference in a page
allocator microbenchmark;

4.6.0-rc2 4.6.0-rc2
optfair-v1r20 fastmark-v1r20
Min alloc-odr0-1 380.00 ( 0.00%) 364.00 ( 4.21%)
Min alloc-odr0-2 273.00 ( 0.00%) 262.00 ( 4.03%)
Min alloc-odr0-4 227.00 ( 0.00%) 214.00 ( 5.73%)
Min alloc-odr0-8 196.00 ( 0.00%) 186.00 ( 5.10%)
Min alloc-odr0-16 183.00 ( 0.00%) 173.00 ( 5.46%)
Min alloc-odr0-32 173.00 ( 0.00%) 165.00 ( 4.62%)
Min alloc-odr0-64 169.00 ( 0.00%) 161.00 ( 4.73%)
Min alloc-odr0-128 169.00 ( 0.00%) 159.00 ( 5.92%)
Min alloc-odr0-256 180.00 ( 0.00%) 168.00 ( 6.67%)
Min alloc-odr0-512 190.00 ( 0.00%) 180.00 ( 5.26%)
Min alloc-odr0-1024 198.00 ( 0.00%) 190.00 ( 4.04%)
Min alloc-odr0-2048 204.00 ( 0.00%) 196.00 ( 3.92%)
Min alloc-odr0-4096 209.00 ( 0.00%) 202.00 ( 3.35%)
Min alloc-odr0-8192 213.00 ( 0.00%) 206.00 ( 3.29%)
Min alloc-odr0-16384 214.00 ( 0.00%) 206.00 ( 3.74%)

Signed-off-by: Mel Gorman
Acked-by: Vlastimil Babka
Cc: Jesper Dangaard Brouer
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
-
The fair zone allocation policy is not without cost but it can be
reduced slightly. This patch removes an unnecessary local variable,
checks the likely conditions of the fair zone policy first, uses a bool
instead of a flags check and falls through when a remote node is
encountered instead of doing a full restart. The benefit is marginal
but it's there.

4.6.0-rc2 4.6.0-rc2
decstat-v1r20 optfair-v1r20
Min alloc-odr0-1 377.00 ( 0.00%) 380.00 ( -0.80%)
Min alloc-odr0-2 273.00 ( 0.00%) 273.00 ( 0.00%)
Min alloc-odr0-4 226.00 ( 0.00%) 227.00 ( -0.44%)
Min alloc-odr0-8 196.00 ( 0.00%) 196.00 ( 0.00%)
Min alloc-odr0-16 183.00 ( 0.00%) 183.00 ( 0.00%)
Min alloc-odr0-32 175.00 ( 0.00%) 173.00 ( 1.14%)
Min alloc-odr0-64 172.00 ( 0.00%) 169.00 ( 1.74%)
Min alloc-odr0-128 170.00 ( 0.00%) 169.00 ( 0.59%)
Min alloc-odr0-256 183.00 ( 0.00%) 180.00 ( 1.64%)
Min alloc-odr0-512 191.00 ( 0.00%) 190.00 ( 0.52%)
Min alloc-odr0-1024 199.00 ( 0.00%) 198.00 ( 0.50%)
Min alloc-odr0-2048 204.00 ( 0.00%) 204.00 ( 0.00%)
Min alloc-odr0-4096 210.00 ( 0.00%) 209.00 ( 0.48%)
Min alloc-odr0-8192 213.00 ( 0.00%) 213.00 ( 0.00%)
Min alloc-odr0-16384 214.00 ( 0.00%) 214.00 ( 0.00%)

The benefit is marginal at best but one of the most important benefits,
avoiding a second search when falling back to another node is not
triggered by this particular test so the benefit for some corner cases
is understated.

Signed-off-by: Mel Gorman
Acked-by: Vlastimil Babka
Cc: Jesper Dangaard Brouer
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
-
The page allocator fast path checks page multiple times unnecessarily.
This patch avoids all the slowpath checks if the first allocation
attempt succeeds.

Signed-off-by: Mel Gorman
Acked-by: Vlastimil Babka
Cc: Jesper Dangaard Brouer
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
-
When bulk freeing pages from the per-cpu lists the zone is checked for
isolated pageblocks on every release. This patch checks it once per
drain.

[mgorman@techsingularity.net: fix locking race, per Vlastimil]
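
Checking once per drain instead of once per page amounts to hoisting a loop-invariant lookup. The sketch below counts the lookups to make that visible; the zone type, the counter and the function names are hypothetical, and locking is left out entirely.

```c
#include <stdbool.h>
#include <stddef.h>

static int isolation_lookups;	/* how often the zone was queried */

/* Hypothetical zone with a count of isolated pageblocks. */
struct fake_zone {
	unsigned long nr_isolate_pageblock;
};

static bool has_isolate_pageblock(const struct fake_zone *z)
{
	isolation_lookups++;
	return z->nr_isolate_pageblock != 0;
}

/*
 * Release nr_pages pages. The isolation state is sampled once before
 * the loop and reused for every page. Returns how many pages had to
 * take the slower isolation-aware path.
 */
static size_t drain_bulk(const struct fake_zone *z, size_t nr_pages)
{
	bool isolated = has_isolate_pageblock(z);
	size_t slow_path = 0;
	size_t i;

	for (i = 0; i < nr_pages; i++) {
		if (isolated)
			slow_path++;
	}
	return slow_path;
}
```

Sampling the state once is only valid if it cannot change during the drain, which is presumably what the locking fix noted above is about.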
Signed-off-by: Mel Gorman
Signed-off-by: Vlastimil Babka
Cc: Vlastimil Babka
Cc: Jesper Dangaard Brouer
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
-
__GFP_HARDWALL only has meaning in the context of cpusets but the fast
path always applies the flag on the first attempt. Move the
manipulations into the cpuset paths where they will be masked by a
static branch in the common case.

With the other micro-optimisations in this series combined, the impact
on a page allocator microbenchmark is

4.6.0-rc2 4.6.0-rc2
decstat-v1r20 micro-v1r20
Min alloc-odr0-1 381.00 ( 0.00%) 377.00 ( 1.05%)
Min alloc-odr0-2 275.00 ( 0.00%) 273.00 ( 0.73%)
Min alloc-odr0-4 229.00 ( 0.00%) 226.00 ( 1.31%)
Min alloc-odr0-8 199.00 ( 0.00%) 196.00 ( 1.51%)
Min alloc-odr0-16 186.00 ( 0.00%) 183.00 ( 1.61%)
Min alloc-odr0-32 179.00 ( 0.00%) 175.00 ( 2.23%)
Min alloc-odr0-64 174.00 ( 0.00%) 172.00 ( 1.15%)
Min alloc-odr0-128 172.00 ( 0.00%) 170.00 ( 1.16%)
Min alloc-odr0-256 181.00 ( 0.00%) 183.00 ( -1.10%)
Min alloc-odr0-512 193.00 ( 0.00%) 191.00 ( 1.04%)
Min alloc-odr0-1024 201.00 ( 0.00%) 199.00 ( 1.00%)
Min alloc-odr0-2048 206.00 ( 0.00%) 204.00 ( 0.97%)
Min alloc-odr0-4096 212.00 ( 0.00%) 210.00 ( 0.94%)
Min alloc-odr0-8192 215.00 ( 0.00%) 213.00 ( 0.93%)
Min alloc-odr0-16384 216.00 ( 0.00%) 214.00 ( 0.93%)

Signed-off-by: Mel Gorman
Acked-by: Vlastimil Babka
Cc: Jesper Dangaard Brouer
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
-
page is guaranteed to be set before it is read with or without the
initialisation.

[akpm@linux-foundation.org: fix warning]
Signed-off-by: Mel Gorman
Acked-by: Vlastimil Babka
Cc: Jesper Dangaard Brouer
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
-
Signed-off-by: Mel Gorman
Acked-by: Vlastimil Babka
Cc: Jesper Dangaard Brouer
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
-
zonelist here is a copy of a struct field that is used once. Ditch it.
Signed-off-by: Mel Gorman
Acked-by: Vlastimil Babka
Cc: Jesper Dangaard Brouer
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
-
The number of zones skipped to a zone expiring its fair zone allocation
quota is irrelevant. Convert to bool.

Signed-off-by: Mel Gorman
Acked-by: Vlastimil Babka
Cc: Jesper Dangaard Brouer
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
-
alloc_flags is a bitmask of flags but it is signed which does not
necessarily generate the best code depending on the compiler. Even
without an impact, it makes more sense that this be unsigned.

Signed-off-by: Mel Gorman
Acked-by: Vlastimil Babka
Cc: Jesper Dangaard Brouer
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
-
Pageblocks have an associated bitmap to store migrate types and whether
the pageblock should be skipped during compaction. The bitmap may be
associated with a memory section or a zone but the zone is looked up
unconditionally. The compiler should optimise this away automatically
so this is a cosmetic patch only in many cases.

Signed-off-by: Mel Gorman
Acked-by: Vlastimil Babka
Cc: Jesper Dangaard Brouer
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
-
__dec_zone_state is cheaper to use for removing an order-0 page as it
has fewer conditions to check.

The performance difference on a page allocator microbenchmark is;
4.6.0-rc2 4.6.0-rc2
optiter-v1r20 decstat-v1r20
Min alloc-odr0-1 382.00 ( 0.00%) 381.00 ( 0.26%)
Min alloc-odr0-2 282.00 ( 0.00%) 275.00 ( 2.48%)
Min alloc-odr0-4 233.00 ( 0.00%) 229.00 ( 1.72%)
Min alloc-odr0-8 203.00 ( 0.00%) 199.00 ( 1.97%)
Min alloc-odr0-16 188.00 ( 0.00%) 186.00 ( 1.06%)
Min alloc-odr0-32 182.00 ( 0.00%) 179.00 ( 1.65%)
Min alloc-odr0-64 177.00 ( 0.00%) 174.00 ( 1.69%)
Min alloc-odr0-128 175.00 ( 0.00%) 172.00 ( 1.71%)
Min alloc-odr0-256 184.00 ( 0.00%) 181.00 ( 1.63%)
Min alloc-odr0-512 197.00 ( 0.00%) 193.00 ( 2.03%)
Min alloc-odr0-1024 203.00 ( 0.00%) 201.00 ( 0.99%)
Min alloc-odr0-2048 209.00 ( 0.00%) 206.00 ( 1.44%)
Min alloc-odr0-4096 214.00 ( 0.00%) 212.00 ( 0.93%)
Min alloc-odr0-8192 218.00 ( 0.00%) 215.00 ( 1.38%)
Min alloc-odr0-16384 219.00 ( 0.00%) 216.00 ( 1.37%)

Signed-off-by: Mel Gorman
Acked-by: Vlastimil Babka
Cc: Jesper Dangaard Brouer
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
-
The page allocator iterates through a zonelist for zones that match the
addressing limitations and nodemask of the caller but many allocations
will not be restricted. Despite this, there is always function call
overhead which builds up.

This patch inlines the optimistic basic case and only calls the iterator
function for the complex case. A hindrance was the fact that
cpuset_current_mems_allowed is used in the fastpath as the allowed
nodemask even though all nodes are allowed on most systems. The patch
handles this by only considering cpuset_current_mems_allowed if a cpuset
exists. As well as being faster in the fast-path, this removes some
junk in the slowpath.

The performance difference on a page allocator microbenchmark is;
4.6.0-rc2 4.6.0-rc2
statinline-v1r20 optiter-v1r20
Min alloc-odr0-1 412.00 ( 0.00%) 382.00 ( 7.28%)
Min alloc-odr0-2 301.00 ( 0.00%) 282.00 ( 6.31%)
Min alloc-odr0-4 247.00 ( 0.00%) 233.00 ( 5.67%)
Min alloc-odr0-8 215.00 ( 0.00%) 203.00 ( 5.58%)
Min alloc-odr0-16 199.00 ( 0.00%) 188.00 ( 5.53%)
Min alloc-odr0-32 191.00 ( 0.00%) 182.00 ( 4.71%)
Min alloc-odr0-64 187.00 ( 0.00%) 177.00 ( 5.35%)
Min alloc-odr0-128 185.00 ( 0.00%) 175.00 ( 5.41%)
Min alloc-odr0-256 193.00 ( 0.00%) 184.00 ( 4.66%)
Min alloc-odr0-512 207.00 ( 0.00%) 197.00 ( 4.83%)
Min alloc-odr0-1024 213.00 ( 0.00%) 203.00 ( 4.69%)
Min alloc-odr0-2048 220.00 ( 0.00%) 209.00 ( 5.00%)
Min alloc-odr0-4096 226.00 ( 0.00%) 214.00 ( 5.31%)
Min alloc-odr0-8192 229.00 ( 0.00%) 218.00 ( 4.80%)
Min alloc-odr0-16384 229.00 ( 0.00%) 219.00 ( 4.37%)

perf indicated that next_zones_zonelist disappeared in the profile and
__next_zones_zonelist did not appear. This is expected as the
micro-benchmark would hit the inlined fast-path every time.

Signed-off-by: Mel Gorman
Cc: Vlastimil Babka
Cc: Jesper Dangaard Brouer
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
-
zone_statistics has one call-site but it's a public function. Make it
static and inline.

The performance difference on a page allocator microbenchmark is;
4.6.0-rc2 4.6.0-rc2
statbranch-v1r20 statinline-v1r20
Min alloc-odr0-1 419.00 ( 0.00%) 412.00 ( 1.67%)
Min alloc-odr0-2 305.00 ( 0.00%) 301.00 ( 1.31%)
Min alloc-odr0-4 250.00 ( 0.00%) 247.00 ( 1.20%)
Min alloc-odr0-8 219.00 ( 0.00%) 215.00 ( 1.83%)
Min alloc-odr0-16 203.00 ( 0.00%) 199.00 ( 1.97%)
Min alloc-odr0-32 195.00 ( 0.00%) 191.00 ( 2.05%)
Min alloc-odr0-64 191.00 ( 0.00%) 187.00 ( 2.09%)
Min alloc-odr0-128 189.00 ( 0.00%) 185.00 ( 2.12%)
Min alloc-odr0-256 198.00 ( 0.00%) 193.00 ( 2.53%)
Min alloc-odr0-512 210.00 ( 0.00%) 207.00 ( 1.43%)
Min alloc-odr0-1024 216.00 ( 0.00%) 213.00 ( 1.39%)
Min alloc-odr0-2048 221.00 ( 0.00%) 220.00 ( 0.45%)
Min alloc-odr0-4096 227.00 ( 0.00%) 226.00 ( 0.44%)
Min alloc-odr0-8192 232.00 ( 0.00%) 229.00 ( 1.29%)
Min alloc-odr0-16384 232.00 ( 0.00%) 229.00 ( 1.29%)

Signed-off-by: Mel Gorman
Acked-by: Vlastimil Babka
Cc: Jesper Dangaard Brouer
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
-
The PageAnon check always checks for compound_head but this is a
relatively expensive check if the caller already knows the page is a
head page. This patch creates a helper and uses it in the page free
path which only operates on head pages.

With this patch and "Only check PageCompound for high-order pages", the
performance difference on a page allocator microbenchmark is;

4.6.0-rc2 4.6.0-rc2
vanilla nocompound-v1r20
Min alloc-odr0-1 425.00 ( 0.00%) 417.00 ( 1.88%)
Min alloc-odr0-2 313.00 ( 0.00%) 308.00 ( 1.60%)
Min alloc-odr0-4 257.00 ( 0.00%) 253.00 ( 1.56%)
Min alloc-odr0-8 224.00 ( 0.00%) 221.00 ( 1.34%)
Min alloc-odr0-16 208.00 ( 0.00%) 205.00 ( 1.44%)
Min alloc-odr0-32 199.00 ( 0.00%) 199.00 ( 0.00%)
Min alloc-odr0-64 195.00 ( 0.00%) 193.00 ( 1.03%)
Min alloc-odr0-128 192.00 ( 0.00%) 191.00 ( 0.52%)
Min alloc-odr0-256 204.00 ( 0.00%) 200.00 ( 1.96%)
Min alloc-odr0-512 213.00 ( 0.00%) 212.00 ( 0.47%)
Min alloc-odr0-1024 219.00 ( 0.00%) 219.00 ( 0.00%)
Min alloc-odr0-2048 225.00 ( 0.00%) 225.00 ( 0.00%)
Min alloc-odr0-4096 230.00 ( 0.00%) 231.00 ( -0.43%)
Min alloc-odr0-8192 235.00 ( 0.00%) 234.00 ( 0.43%)
Min alloc-odr0-16384 235.00 ( 0.00%) 234.00 ( 0.43%)
Min free-odr0-1 215.00 ( 0.00%) 191.00 ( 11.16%)
Min free-odr0-2 152.00 ( 0.00%) 136.00 ( 10.53%)
Min free-odr0-4 119.00 ( 0.00%) 107.00 ( 10.08%)
Min free-odr0-8 106.00 ( 0.00%) 96.00 ( 9.43%)
Min free-odr0-16 97.00 ( 0.00%) 87.00 ( 10.31%)
Min free-odr0-32 91.00 ( 0.00%) 83.00 ( 8.79%)
Min free-odr0-64 89.00 ( 0.00%) 81.00 ( 8.99%)
Min free-odr0-128 88.00 ( 0.00%) 80.00 ( 9.09%)
Min free-odr0-256 106.00 ( 0.00%) 95.00 ( 10.38%)
Min free-odr0-512 116.00 ( 0.00%) 111.00 ( 4.31%)
Min free-odr0-1024 125.00 ( 0.00%) 118.00 ( 5.60%)
Min free-odr0-2048 133.00 ( 0.00%) 126.00 ( 5.26%)
Min free-odr0-4096 136.00 ( 0.00%) 130.00 ( 4.41%)
Min free-odr0-8192 138.00 ( 0.00%) 130.00 ( 5.80%)
Min free-odr0-16384 137.00 ( 0.00%) 130.00 ( 5.11%)

There is a sizable boost to the free allocator performance. While there
is an apparent boost on the allocation side, it's likely a coincidence
or due to the patches slightly reducing cache footprint.

Signed-off-by: Mel Gorman
Acked-by: Vlastimil Babka
Cc: Jesper Dangaard Brouer
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
-
Another year, another round of page allocator optimisations focusing
this time on the alloc and free fast paths. This should be of help to
workloads that are allocator-intensive from kernel space where the cost
of zeroing is not necessarily incurred.

The series is motivated by the observation that page alloc
microbenchmarks on multiple machines regressed between 3.12.44 and 4.4.
Second, there were discussions before LSF/MM considering the possibility
of adding another page allocator which is potentially hazardous but a
patch series improving performance is better than whining.

After the series is applied, there are still hazards. In the free
paths, the debugging checking and page zone/pageblock lookups dominate
but there was not an obvious solution to that. In the alloc path, the
major contributors are dealing with zonelists, new page preparation, the
fair zone allocation and numerous statistic updates. The fair zone
allocator is removed by the per-node LRU series if that gets merged so
it's not a major concern at the moment.

On normal userspace benchmarks, there is little impact as the zeroing
cost is significant but it's visible

aim9
4.6.0-rc3 4.6.0-rc3
vanilla deferalloc-v3
Min page_test 828693.33 ( 0.00%) 887060.00 ( 7.04%)
Min brk_test 4847266.67 ( 0.00%) 4966266.67 ( 2.45%)
Min exec_test 1271.00 ( 0.00%) 1275.67 ( 0.37%)
Min fork_test 12371.75 ( 0.00%) 12380.00 ( 0.07%)

The overall impact on a page allocator microbenchmark for a range of orders
and number of pages allocated in a batch is

4.6.0-rc3 4.6.0-rc3
vanilla deferalloc-v3r7
Min alloc-odr0-1 428.00 ( 0.00%) 316.00 ( 26.17%)
Min alloc-odr0-2 314.00 ( 0.00%) 231.00 ( 26.43%)
Min alloc-odr0-4 256.00 ( 0.00%) 192.00 ( 25.00%)
Min alloc-odr0-8 222.00 ( 0.00%) 166.00 ( 25.23%)
Min alloc-odr0-16 207.00 ( 0.00%) 154.00 ( 25.60%)
Min alloc-odr0-32 197.00 ( 0.00%) 148.00 ( 24.87%)
Min alloc-odr0-64 193.00 ( 0.00%) 144.00 ( 25.39%)
Min alloc-odr0-128 191.00 ( 0.00%) 143.00 ( 25.13%)
Min alloc-odr0-256 203.00 ( 0.00%) 153.00 ( 24.63%)
Min alloc-odr0-512 212.00 ( 0.00%) 165.00 ( 22.17%)
Min alloc-odr0-1024 221.00 ( 0.00%) 172.00 ( 22.17%)
Min alloc-odr0-2048 225.00 ( 0.00%) 179.00 ( 20.44%)
Min alloc-odr0-4096 232.00 ( 0.00%) 185.00 ( 20.26%)
Min alloc-odr0-8192 235.00 ( 0.00%) 187.00 ( 20.43%)
Min alloc-odr0-16384 236.00 ( 0.00%) 188.00 ( 20.34%)
Min alloc-odr1-1 519.00 ( 0.00%) 450.00 ( 13.29%)
Min alloc-odr1-2 391.00 ( 0.00%) 336.00 ( 14.07%)
Min alloc-odr1-4 313.00 ( 0.00%) 268.00 ( 14.38%)
Min alloc-odr1-8 277.00 ( 0.00%) 235.00 ( 15.16%)
Min alloc-odr1-16 256.00 ( 0.00%) 218.00 ( 14.84%)
Min alloc-odr1-32 252.00 ( 0.00%) 212.00 ( 15.87%)
Min alloc-odr1-64 244.00 ( 0.00%) 206.00 ( 15.57%)
Min alloc-odr1-128 244.00 ( 0.00%) 207.00 ( 15.16%)
Min alloc-odr1-256 243.00 ( 0.00%) 207.00 ( 14.81%)
Min alloc-odr1-512 245.00 ( 0.00%) 209.00 ( 14.69%)
Min alloc-odr1-1024 248.00 ( 0.00%) 214.00 ( 13.71%)
Min alloc-odr1-2048 253.00 ( 0.00%) 220.00 ( 13.04%)
Min alloc-odr1-4096 258.00 ( 0.00%) 224.00 ( 13.18%)
Min alloc-odr1-8192 261.00 ( 0.00%) 229.00 ( 12.26%)
Min alloc-odr2-1 560.00 ( 0.00%) 753.00 (-34.46%)
Min alloc-odr2-2 424.00 ( 0.00%) 351.00 ( 17.22%)
Min alloc-odr2-4 339.00 ( 0.00%) 393.00 (-15.93%)
Min alloc-odr2-8 298.00 ( 0.00%) 246.00 ( 17.45%)
Min alloc-odr2-16 276.00 ( 0.00%) 227.00 ( 17.75%)
Min alloc-odr2-32 271.00 ( 0.00%) 221.00 ( 18.45%)
Min alloc-odr2-64 264.00 ( 0.00%) 217.00 ( 17.80%)
Min alloc-odr2-128 264.00 ( 0.00%) 217.00 ( 17.80%)
Min alloc-odr2-256 264.00 ( 0.00%) 218.00 ( 17.42%)
Min alloc-odr2-512 269.00 ( 0.00%) 223.00 ( 17.10%)
Min alloc-odr2-1024 279.00 ( 0.00%) 230.00 ( 17.56%)
Min alloc-odr2-2048 283.00 ( 0.00%) 235.00 ( 16.96%)
Min alloc-odr2-4096 285.00 ( 0.00%) 239.00 ( 16.14%)
Min alloc-odr3-1 629.00 ( 0.00%) 505.00 ( 19.71%)
Min alloc-odr3-2 472.00 ( 0.00%) 374.00 ( 20.76%)
Min alloc-odr3-4 383.00 ( 0.00%) 301.00 ( 21.41%)
Min alloc-odr3-8 341.00 ( 0.00%) 266.00 ( 21.99%)
Min alloc-odr3-16 316.00 ( 0.00%) 248.00 ( 21.52%)
Min alloc-odr3-32 308.00 ( 0.00%) 241.00 ( 21.75%)
Min alloc-odr3-64 305.00 ( 0.00%) 241.00 ( 20.98%)
Min alloc-odr3-128 308.00 ( 0.00%) 244.00 ( 20.78%)
Min alloc-odr3-256 317.00 ( 0.00%) 249.00 ( 21.45%)
Min alloc-odr3-512 327.00 ( 0.00%) 256.00 ( 21.71%)
Min alloc-odr3-1024 331.00 ( 0.00%) 261.00 ( 21.15%)
Min alloc-odr3-2048 333.00 ( 0.00%) 266.00 ( 20.12%)
Min alloc-odr4-1 767.00 ( 0.00%) 572.00 ( 25.42%)
Min alloc-odr4-2 578.00 ( 0.00%) 429.00 ( 25.78%)
Min alloc-odr4-4 474.00 ( 0.00%) 346.00 ( 27.00%)
Min alloc-odr4-8 422.00 ( 0.00%) 310.00 ( 26.54%)
Min alloc-odr4-16 399.00 ( 0.00%) 295.00 ( 26.07%)
Min alloc-odr4-32 392.00 ( 0.00%) 293.00 ( 25.26%)
Min alloc-odr4-64 394.00 ( 0.00%) 293.00 ( 25.63%)
Min alloc-odr4-128 405.00 ( 0.00%) 305.00 ( 24.69%)
Min alloc-odr4-256 417.00 ( 0.00%) 319.00 ( 23.50%)
Min alloc-odr4-512 425.00 ( 0.00%) 326.00 ( 23.29%)
Min alloc-odr4-1024 426.00 ( 0.00%) 329.00 ( 22.77%)
Min free-odr0-1 216.00 ( 0.00%) 178.00 ( 17.59%)
Min free-odr0-2 152.00 ( 0.00%) 125.00 ( 17.76%)
Min free-odr0-4 120.00 ( 0.00%) 99.00 ( 17.50%)
Min free-odr0-8 106.00 ( 0.00%) 85.00 ( 19.81%)
Min free-odr0-16 97.00 ( 0.00%) 80.00 ( 17.53%)
Min free-odr0-32 92.00 ( 0.00%) 76.00 ( 17.39%)
Min free-odr0-64 89.00 ( 0.00%) 74.00 ( 16.85%)
Min free-odr0-128 89.00 ( 0.00%) 73.00 ( 17.98%)
Min free-odr0-256 107.00 ( 0.00%) 90.00 ( 15.89%)
Min free-odr0-512 117.00 ( 0.00%) 108.00 ( 7.69%)
Min free-odr0-1024 125.00 ( 0.00%) 118.00 ( 5.60%)
Min free-odr0-2048 132.00 ( 0.00%) 125.00 ( 5.30%)
Min free-odr0-4096 135.00 ( 0.00%) 130.00 ( 3.70%)
Min free-odr0-8192 137.00 ( 0.00%) 130.00 ( 5.11%)
Min free-odr0-16384 137.00 ( 0.00%) 131.00 ( 4.38%)
Min free-odr1-1 318.00 ( 0.00%) 289.00 ( 9.12%)
Min free-odr1-2 228.00 ( 0.00%) 207.00 ( 9.21%)
Min free-odr1-4 182.00 ( 0.00%) 165.00 ( 9.34%)
Min free-odr1-8 163.00 ( 0.00%) 146.00 ( 10.43%)
Min free-odr1-16 151.00 ( 0.00%) 135.00 ( 10.60%)
Min free-odr1-32 146.00 ( 0.00%) 129.00 ( 11.64%)
Min free-odr1-64 145.00 ( 0.00%) 130.00 ( 10.34%)
Min free-odr1-128 148.00 ( 0.00%) 134.00 ( 9.46%)
Min free-odr1-256 148.00 ( 0.00%) 137.00 ( 7.43%)
Min free-odr1-512 151.00 ( 0.00%) 140.00 ( 7.28%)
Min free-odr1-1024 154.00 ( 0.00%) 143.00 ( 7.14%)
Min free-odr1-2048 156.00 ( 0.00%) 144.00 ( 7.69%)
Min free-odr1-4096 156.00 ( 0.00%) 142.00 ( 8.97%)
Min free-odr1-8192 156.00 ( 0.00%) 140.00 ( 10.26%)
Min free-odr2-1 361.00 ( 0.00%) 457.00 (-26.59%)
Min free-odr2-2 258.00 ( 0.00%) 224.00 ( 13.18%)
Min free-odr2-4 208.00 ( 0.00%) 223.00 ( -7.21%)
Min free-odr2-8 185.00 ( 0.00%) 160.00 ( 13.51%)
Min free-odr2-16 173.00 ( 0.00%) 149.00 ( 13.87%)
Min free-odr2-32 166.00 ( 0.00%) 145.00 ( 12.65%)
Min free-odr2-64 166.00 ( 0.00%) 146.00 ( 12.05%)
Min free-odr2-128 169.00 ( 0.00%) 148.00 ( 12.43%)
Min free-odr2-256 170.00 ( 0.00%) 152.00 ( 10.59%)
Min free-odr2-512 177.00 ( 0.00%) 156.00 ( 11.86%)
Min free-odr2-1024 182.00 ( 0.00%) 162.00 ( 10.99%)
Min free-odr2-2048 181.00 ( 0.00%) 160.00 ( 11.60%)
Min free-odr2-4096 180.00 ( 0.00%) 159.00 ( 11.67%)
Min free-odr3-1 431.00 ( 0.00%) 367.00 ( 14.85%)
Min free-odr3-2 306.00 ( 0.00%) 259.00 ( 15.36%)
Min free-odr3-4 249.00 ( 0.00%) 208.00 ( 16.47%)
Min free-odr3-8 224.00 ( 0.00%) 186.00 ( 16.96%)
Min free-odr3-16 208.00 ( 0.00%) 176.00 ( 15.38%)
Min free-odr3-32 206.00 ( 0.00%) 174.00 ( 15.53%)
Min free-odr3-64 210.00 ( 0.00%) 178.00 ( 15.24%)
Min free-odr3-128 215.00 ( 0.00%) 182.00 ( 15.35%)
Min free-odr3-256 224.00 ( 0.00%) 189.00 ( 15.62%)
Min free-odr3-512 232.00 ( 0.00%) 195.00 ( 15.95%)
Min free-odr3-1024 230.00 ( 0.00%) 195.00 ( 15.22%)
Min free-odr3-2048 229.00 ( 0.00%) 193.00 ( 15.72%)
Min free-odr4-1 561.00 ( 0.00%) 439.00 ( 21.75%)
Min free-odr4-2 418.00 ( 0.00%) 318.00 ( 23.92%)
Min free-odr4-4 339.00 ( 0.00%) 269.00 ( 20.65%)
Min free-odr4-8 299.00 ( 0.00%) 239.00 ( 20.07%)
Min free-odr4-16 289.00 ( 0.00%) 234.00 ( 19.03%)
Min free-odr4-32 291.00 ( 0.00%) 235.00 ( 19.24%)
Min free-odr4-64 298.00 ( 0.00%) 238.00 ( 20.13%)
Min free-odr4-128 308.00 ( 0.00%) 251.00 ( 18.51%)
Min free-odr4-256 321.00 ( 0.00%) 267.00 ( 16.82%)
Min free-odr4-512 327.00 ( 0.00%) 269.00 ( 17.74%)
Min free-odr4-1024 326.00 ( 0.00%) 271.00 ( 16.87%)
Min total-odr0-1 644.00 ( 0.00%) 494.00 ( 23.29%)
Min total-odr0-2 466.00 ( 0.00%) 356.00 ( 23.61%)
Min total-odr0-4 376.00 ( 0.00%) 291.00 ( 22.61%)
Min total-odr0-8 328.00 ( 0.00%) 251.00 ( 23.48%)
Min total-odr0-16 304.00 ( 0.00%) 234.00 ( 23.03%)
Min total-odr0-32 289.00 ( 0.00%) 224.00 ( 22.49%)
Min total-odr0-64 282.00 ( 0.00%) 218.00 ( 22.70%)
Min total-odr0-128 280.00 ( 0.00%) 216.00 ( 22.86%)
Min total-odr0-256 310.00 ( 0.00%) 243.00 ( 21.61%)
Min total-odr0-512 329.00 ( 0.00%) 273.00 ( 17.02%)
Min total-odr0-1024 346.00 ( 0.00%) 290.00 ( 16.18%)
Min total-odr0-2048 357.00 ( 0.00%) 304.00 ( 14.85%)
Min total-odr0-4096 367.00 ( 0.00%) 315.00 ( 14.17%)
Min total-odr0-8192 372.00 ( 0.00%) 317.00 ( 14.78%)
Min total-odr0-16384 373.00 ( 0.00%) 319.00 ( 14.48%)
Min total-odr1-1 838.00 ( 0.00%) 739.00 ( 11.81%)
Min total-odr1-2 619.00 ( 0.00%) 543.00 ( 12.28%)
Min total-odr1-4 495.00 ( 0.00%) 433.00 ( 12.53%)
Min total-odr1-8 440.00 ( 0.00%) 382.00 ( 13.18%)
Min total-odr1-16 407.00 ( 0.00%) 353.00 ( 13.27%)
Min total-odr1-32 398.00 ( 0.00%) 341.00 ( 14.32%)
Min total-odr1-64 389.00 ( 0.00%) 336.00 ( 13.62%)
Min total-odr1-128 392.00 ( 0.00%) 341.00 ( 13.01%)
Min total-odr1-256 391.00 ( 0.00%) 344.00 ( 12.02%)
Min total-odr1-512 396.00 ( 0.00%) 349.00 ( 11.87%)
Min total-odr1-1024 402.00 ( 0.00%) 357.00 ( 11.19%)
Min total-odr1-2048 409.00 ( 0.00%) 364.00 ( 11.00%)
Min total-odr1-4096 414.00 ( 0.00%) 366.00 ( 11.59%)
Min total-odr1-8192 417.00 ( 0.00%) 369.00 ( 11.51%)
Min total-odr2-1 921.00 ( 0.00%) 1210.00 (-31.38%)
Min total-odr2-2 682.00 ( 0.00%) 576.00 ( 15.54%)
Min total-odr2-4 547.00 ( 0.00%) 616.00 (-12.61%)
Min total-odr2-8 483.00 ( 0.00%) 406.00 ( 15.94%)
Min total-odr2-16 449.00 ( 0.00%) 376.00 ( 16.26%)
Min total-odr2-32 437.00 ( 0.00%) 366.00 ( 16.25%)
Min total-odr2-64 431.00 ( 0.00%) 363.00 ( 15.78%)
Min total-odr2-128 433.00 ( 0.00%) 365.00 ( 15.70%)
Min total-odr2-256 434.00 ( 0.00%) 371.00 ( 14.52%)
Min total-odr2-512 446.00 ( 0.00%) 379.00 ( 15.02%)
Min total-odr2-1024 461.00 ( 0.00%) 392.00 ( 14.97%)
Min total-odr2-2048 464.00 ( 0.00%) 395.00 ( 14.87%)
Min total-odr2-4096 465.00 ( 0.00%) 398.00 ( 14.41%)
Min total-odr3-1 1060.00 ( 0.00%) 872.00 ( 17.74%)
Min total-odr3-2 778.00 ( 0.00%) 633.00 ( 18.64%)
Min total-odr3-4 632.00 ( 0.00%) 510.00 ( 19.30%)
Min total-odr3-8 565.00 ( 0.00%) 452.00 ( 20.00%)
Min total-odr3-16 524.00 ( 0.00%) 424.00 ( 19.08%)
Min total-odr3-32 514.00 ( 0.00%) 415.00 ( 19.26%)
Min total-odr3-64 515.00 ( 0.00%) 419.00 ( 18.64%)
Min total-odr3-128 523.00 ( 0.00%) 426.00 ( 18.55%)
Min total-odr3-256 541.00 ( 0.00%) 438.00 ( 19.04%)
Min total-odr3-512 559.00 ( 0.00%) 451.00 ( 19.32%)
Min total-odr3-1024 561.00 ( 0.00%) 456.00 ( 18.72%)
Min total-odr3-2048 562.00 ( 0.00%) 459.00 ( 18.33%)
Min total-odr4-1 1328.00 ( 0.00%) 1011.00 ( 23.87%)
Min total-odr4-2 997.00 ( 0.00%) 747.00 ( 25.08%)
Min total-odr4-4 813.00 ( 0.00%) 615.00 ( 24.35%)
Min total-odr4-8 721.00 ( 0.00%) 550.00 ( 23.72%)
Min total-odr4-16 689.00 ( 0.00%) 529.00 ( 23.22%)
Min total-odr4-32 683.00 ( 0.00%) 528.00 ( 22.69%)
Min total-odr4-64 692.00 ( 0.00%) 531.00 ( 23.27%)
Min total-odr4-128 713.00 ( 0.00%) 556.00 ( 22.02%)
Min total-odr4-256 738.00 ( 0.00%) 586.00 ( 20.60%)
Min total-odr4-512 753.00 ( 0.00%) 595.00 ( 20.98%)
Min total-odr4-1024 752.00 ( 0.00%) 600.00 ( 20.21%)

This patch (of 27):

order-0 pages by definition cannot be compound so avoid the check in the
fast path for those pages.

[akpm@linux-foundation.org: use unlikely(order) in free_pages_prepare(), per Vlastimil]
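The idea of skipping the compound check for order-0 pages can be sketched in userspace as follows. This is only an illustration under stated assumptions: struct mock_page, compound_state_ok() and mock_free_pages_prepare() are invented names, the real free_pages_prepare() does far more work, and unlikely() is defined here via the GCC/Clang __builtin_expect builtin.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Branch hint in the style of the kernel's unlikely(); GCC/Clang only. */
#define unlikely(x) __builtin_expect(!!(x), 0)

/* Mock page descriptor; the real struct page is far richer. */
struct mock_page {
	bool compound;               /* page heads a compound page */
	unsigned int compound_order; /* order recorded for that compound page */
	int compound_checks;         /* how often the slower check ran */
};

/* Stand-in for the compound-page sanity check done when freeing. */
static bool compound_state_ok(struct mock_page *page, unsigned int order)
{
	page->compound_checks++;
	return !page->compound || page->compound_order == order;
}

/* Returns true when the page may be freed. */
bool mock_free_pages_prepare(struct mock_page *page, unsigned int order)
{
	assert(page != NULL);

	/*
	 * An order-0 request covers a single page, which cannot be
	 * compound, so the check is only reached for order > 0.
	 */
	if (unlikely(order)) {
		if (!compound_state_ok(page, order))
			return false;
	}
	return true;
}
```

Freeing an order-0 mock page leaves compound_checks at zero, while any higher order runs the check exactly once.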
Signed-off-by: Mel Gorman
Acked-by: Vlastimil Babka
Cc: Jesper Dangaard Brouer
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
-
__alloc_pages_may_oom is the central place to decide when the
out_of_memory should be invoked. This is a good approach for most
checks there because they are page allocator specific and the allocation
fails right after for all of them.

The notable exception is GFP_NOFS context which is faking
did_some_progress and keeps the page allocator looping even though there
couldn't have been any progress from the OOM killer. This patch doesn't
change this behavior because we are not ready to allow those allocation
requests to fail yet (and maybe we will face the reality that we will
never manage to safely fail these requests). Instead the __GFP_FS check is
moved down to out_of_memory and prevents OOM victim selection there.
There are two reasons for that:

- OOM notifiers might release some memory even from this context
  as none of the registered notifiers seems to be FS related
- this might help a dying thread to get access to memory
  reserves and move on which will make the behavior more
  consistent with the case when the task gets killed from a
  different context.

Keep a comment in __alloc_pages_may_oom to make sure we do not forget
how GFP_NOFS is special and that we really want to do something about
it.

Note to the current oom_notifier users:
The observable difference for you is that oom notifiers cannot depend on
any fs locks because we could deadlock. Not that this would be allowed
today because that would just lock up the machine in most cases,
ruling out the OOM killer along the way. Another difference is that
callbacks might be invoked sooner now because GFP_NOFS is a weaker
reclaim context and so there could be reclaimable memory which is just
not reachable now. That would require GFP_NOFS only loads which are
really rare and more importantly the observable result would be dropping
of reconstructible objects and a potential performance drop which is not
such a big deal when we are struggling to fulfill other important
allocation requests.

Signed-off-by: Michal Hocko
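The ordering this changelog describes (notifiers first, then the dying-task shortcut, and only then the __GFP_FS check that blocks victim selection) can be modelled roughly as below. It is a simplified userspace sketch, not the kernel function: every type, the flag value MOCK_GFP_FS and mock_out_of_memory() are hypothetical, and the real out_of_memory() handles more cases than shown.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MOCK_GFP_FS 0x1u /* made-up flag value, not the kernel's */

enum oom_outcome {
	OOM_NOTIFIERS_FREED,       /* a notifier released some memory */
	OOM_CURRENT_GETS_RESERVES, /* dying caller may use memory reserves */
	OOM_SKIPPED_NOFS,          /* no victim chosen: request lacks __GFP_FS */
	OOM_VICTIM_SELECTED,       /* normal victim selection would run */
};

struct mock_oom_control {
	unsigned int gfp_mask;        /* allocation flags of the request */
	unsigned long notifier_freed; /* pages released by OOM notifiers */
	bool current_is_dying;        /* caller is already being killed */
};

enum oom_outcome mock_out_of_memory(const struct mock_oom_control *oc)
{
	assert(oc != NULL);

	/* 1. Notifiers run even for GFP_NOFS requests. */
	if (oc->notifier_freed > 0)
		return OOM_NOTIFIERS_FREED;

	/* 2. A dying task gets access to reserves in any context. */
	if (oc->current_is_dying)
		return OOM_CURRENT_GETS_RESERVES;

	/* 3. Only now does the missing __GFP_FS stop victim selection. */
	if (!(oc->gfp_mask & MOCK_GFP_FS))
		return OOM_SKIPPED_NOFS;

	return OOM_VICTIM_SELECTED;
}
```

Placing the flag check last is what lets the first two steps help a GFP_NOFS caller, which matches the two reasons listed in the changelog.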
Cc: Raushaniya Maksudova
Cc: Michael S. Tsirkin
Cc: Paul E. McKenney
Cc: David Rientjes
Cc: Tetsuo Handa
Cc: Daniel Vetter
Cc: Oleg Nesterov
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
-
ZONE_MOVABLE could be treated as highmem so we need to consider it for
accurate statistics. And, in following patches, ZONE_CMA will be
introduced and it can be treated as highmem, too. So, instead of
manually adding the stat of ZONE_MOVABLE, loop over all zones, check
whether each zone is highmem or not, and add the stat of every zone which
can be treated as highmem.

Signed-off-by: Joonsoo Kim
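A rough userspace model of the "loop over all zones and sum the highmem-like ones" approach is given below. struct mock_zone, its treated_as_highmem flag and mock_nr_free_highpages() are illustrative assumptions; the kernel uses its own zone iterator and predicate rather than a plain array and a boolean.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Mock zone; only the fields needed for the statistic are modelled. */
struct mock_zone {
	const char *name;
	bool treated_as_highmem;     /* stand-in for a highmem predicate */
	unsigned long nr_free_pages; /* per-zone free page counter */
};

/*
 * Sum free pages over every zone that counts as highmem, instead of
 * naming individual zones. A newly introduced highmem-like zone is
 * then picked up without touching this function.
 */
unsigned long mock_nr_free_highpages(const struct mock_zone *zones,
				     size_t nr_zones)
{
	unsigned long pages = 0;
	size_t i;

	assert(zones != NULL || nr_zones == 0);

	for (i = 0; i < nr_zones; i++) {
		if (zones[i].treated_as_highmem)
			pages += zones[i].nr_free_pages;
	}
	return pages;
}
```

The design point is that the predicate, not the caller, decides which zones contribute, so the statistic stays accurate as zones are added.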
Reviewed-by: Aneesh Kumar K.V
Cc: Rik van Riel
Cc: Johannes Weiner
Cc: Mel Gorman
Cc: Laura Abbott
Cc: Minchan Kim
Cc: Marek Szyprowski
Cc: Michal Nazarewicz
Cc: Vlastimil Babka
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
-
There is a system whose nodes' pfns overlap as follows:

-----pfn-------->
N0 N1 N2 N0 N1 N2

Therefore, we need to take care of this overlap when iterating over a pfn
range. mark_free_pages() iterates over the requested zone's pfn range and
first unsets the bitmap for the whole range. It then marks the free pages
in the zone in the bitmap. If there is an overlapping zone, the unset
above could clear a previously marked bit, and a later reference to this
bitmap will cause a problem. To prevent it, this patch adds a zone check
in mark_free_pages().

Signed-off-by: Joonsoo Kim
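The effect of the added zone check can be illustrated with a small model in which pfns of several zones interleave inside one span. The arrays, NR_PFNS and mock_unset_free_bits() are assumptions made for this sketch; the real code walks struct page and a hibernation bitmap.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define NR_PFNS 6 /* size of the toy pfn space */

/*
 * Clear the "free" bit for the pfns of one zone within
 * [start_pfn, end_pfn). owner[pfn] names the zone a pfn belongs to,
 * and zones may interleave inside the span.
 */
void mock_unset_free_bits(int zone, size_t start_pfn, size_t end_pfn,
			  const int owner[NR_PFNS], bool free_bit[NR_PFNS])
{
	size_t pfn;

	assert(start_pfn <= end_pfn && end_pfn <= NR_PFNS);

	for (pfn = start_pfn; pfn < end_pfn; pfn++) {
		/*
		 * The zone check: a pfn inside our span that belongs to
		 * another zone keeps whatever bit that zone has set.
		 */
		if (owner[pfn] != zone)
			continue;
		free_bit[pfn] = false;
	}
}
```

With owners laid out as 0 1 2 0 1 2, clearing zone 0 over pfns 0 to 3 touches only pfns 0 and 3 and leaves the bits of the other zones intact.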
Acked-by: Vlastimil Babka
Cc: Rik van Riel
Cc: Johannes Weiner
Cc: Mel Gorman
Cc: Laura Abbott
Cc: Minchan Kim
Cc: Marek Szyprowski
Cc: Michal Nazarewicz
Cc: "Aneesh Kumar K.V"
Cc: "Rafael J. Wysocki"
Cc: Benjamin Herrenschmidt
Cc: Paul Mackerras
Cc: Michael Ellerman
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
-
__offline_isolated_pages() and test_pages_isolated() are used by memory
hotplug. These functions require that the range is in a single zone but
there is no code to check this because memory hotplug checks it before
calling these functions. To avoid confusing future users of these
functions, this patch adds comments to them.

Signed-off-by: Joonsoo Kim
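The kind of caller-side precondition mentioned here might look like the sketch below. It assumes, purely for illustration, that a zone is one contiguous pfn span; struct mock_zone_span and mock_range_in_single_zone() are invented names and do not reflect how memory hotplug actually performs its check.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy zone described by a half-open pfn span [start_pfn, end_pfn). */
struct mock_zone_span {
	unsigned long start_pfn; /* first pfn of the zone */
	unsigned long end_pfn;   /* one past the last pfn */
};

/* True when [start_pfn, end_pfn) lies entirely inside the zone span. */
bool mock_range_in_single_zone(const struct mock_zone_span *zone,
			       unsigned long start_pfn,
			       unsigned long end_pfn)
{
	assert(start_pfn < end_pfn);
	return start_pfn >= zone->start_pfn && end_pfn <= zone->end_pfn;
}
```

A caller would run such a check once and only then hand the range to helpers that, like the two functions named above, do not verify it again.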
Acked-by: Vlastimil Babka
Cc: Rik van Riel
Cc: Johannes Weiner
Cc: Mel Gorman
Cc: Laura Abbott
Cc: Minchan Kim
Cc: Marek Szyprowski
Cc: Michal Nazarewicz
Cc: "Aneesh Kumar K.V"
Cc: "Rafael J. Wysocki"
Cc: Benjamin Herrenschmidt
Cc: Paul Mackerras
Cc: Michael Ellerman
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
-
__free_pages_boot_core has a parameter pfn which is not used at all.
Remove it.

Signed-off-by: Li Zhang
Reviewed-by: Pan Xinhui
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
-
Many developers already know that the reference count field of
struct page is _count and that it is an atomic type. They would try to
handle it directly and this could defeat the purpose of the page reference
count tracepoint. To prevent direct _count modification, this patch renames
it to _refcount and adds a warning message to the code. After that,
developers who need to handle the reference count will find that the field
should not be accessed directly.

[akpm@linux-foundation.org: fix comments, per Vlastimil]
[akpm@linux-foundation.org: Documentation/vm/transhuge.txt too]
[sfr@canb.auug.org.au: sync ethernet driver changes]
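Why direct access defeats the tracepoint can be seen in a small userspace model that uses C11 atomics. The mock_ names are hypothetical, loosely styled after the kernel's page_ref_* wrappers, and a plain counter stands in for the tracepoint.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

struct mock_page {
	/* Reference count; meant to be touched only through the helpers. */
	atomic_int _refcount;
};

/* Counts traced modifications; a stand-in for a real tracepoint. */
static int mock_trace_events;

/* Read the current reference count. */
int mock_page_ref_count(struct mock_page *page)
{
	assert(page != NULL);
	return atomic_load(&page->_refcount);
}

/* Take one reference and record the change for tracing. */
void mock_page_ref_inc(struct mock_page *page)
{
	assert(page != NULL);
	atomic_fetch_add(&page->_refcount, 1);
	mock_trace_events++; /* the hook a direct update would bypass */
}
```

Code that incremented _refcount by hand would change the count without bumping mock_trace_events, which is the kind of silent bypass the rename is meant to discourage.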
Signed-off-by: Joonsoo Kim
Signed-off-by: Stephen Rothwell
Cc: Vlastimil Babka
Cc: Hugh Dickins
Cc: Johannes Berg
Cc: "David S. Miller"
Cc: Sunil Goutham
Cc: Chris Metcalf
Cc: Manish Chopra
Cc: Yuval Mintz
Cc: Tariq Toukan
Cc: Saeed Mahameed
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds