28 Dec, 2018
1 commit
10 Apr, 2017
1 commit
26 Jul, 2016
1 commit
10 Dec, 2015
1 commit
18 Nov, 2015
1 commit
05 Jan, 2015
1 commit
24 Dec, 2014
1 commit
10 Oct, 2014
2 commits
09 Oct, 2014
1 commit
-
…nux-stable into ti-linux-3.12.y
This is the 3.12.30 stable release
* tag 'v3.12.30' of http://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable: (97 commits)
Linux 3.12.30
mm: page_alloc: reduce cost of the fair zone allocation policy
mm: page_alloc: abort fair zone allocation policy when remotes nodes are encountered
mm: vmscan: only update per-cpu thresholds for online CPU
mm: move zone->pages_scanned into a vmstat counter
mm: rearrange zone fields into read-only, page alloc, statistics and page reclaim lines
mm: pagemap: avoid unnecessary overhead when tracepoints are deactivated
memcg, vmscan: Fix forced scan of anonymous pages
vmalloc: use rcu list iterator to reduce vmap_area_lock contention
mm: make copy_pte_range static again
mm, thp: only collapse hugepages to nodes with affinity for zone_reclaim_mode
mm/memory.c: use entry = ACCESS_ONCE(*pte) in handle_pte_fault()
shmem: fix init_page_accessed use to stop !PageLRU bug
mm: avoid unnecessary atomic operations during end_page_writeback()
mm: non-atomically mark page accessed during page cache allocation where possible
fs: buffer: do not use unnecessary atomic operations when discarding buffers
mm: do not use unnecessary atomic operations when adding pages to the LRU
mm: do not use atomic operations when releasing pages
mm: shmem: avoid atomic operation during shmem_getpage_gfp
mm: page_alloc: lookup pageblock migratetype with IRQs enabled during free
...

Signed-off-by: Dan Murphy <DMurphy@ti.com>
01 Oct, 2014
1 commit
-
…nux-stable into ti-linux-3.12.y
This is the 3.12.29 stable release
* tag 'v3.12.29' of http://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable: (143 commits)
Linux 3.12.29
arm64: flush TLS registers during exec
aio: add missing smp_rmb() in read_events_ring
ibmveth: Fix endian issues with rx_no_buffer statistic
ahci: add pcid for Marvel 0x9182 controller
ahci: Add Device IDs for Intel 9 Series PCH
pata_scc: propagate return value of scc_wait_after_reset
libata: widen Crucial M550 blacklist matching
drm/radeon/TN: only enable bapm on MSI systems
drm/radeon: enable bapm by default on desktop TN/RL boards
drm/i915: read HEAD register back in init_ring_common() to enforce ordering
drm/radeon: set VM base addr using the PFP v2
drm/radeon: load the lm63 driver for an lm64 thermal chip.
drm/ttm: Pass GFP flags in order to avoid deadlock.
drm/ttm: Fix possible stack overflow by recursive shrinker calls.
drm/ttm: Use mutex_trylock() to avoid deadlock inside shrinker functions.
drm/ttm: Choose a pool to shrink correctly in ttm_dma_pool_shrink_scan().
drm/ttm: Fix possible division by 0 in ttm_dma_pool_shrink_scan().
drm/tilcdc: fix double kfree
drm/tilcdc: fix release order on exit
...

Conflicts:
	drivers/mtd/nand/omap2.c

Signed-off-by: Dan Murphy <DMurphy@ti.com>
26 Sep, 2014
29 commits
-
commit 4ffeaf3560a52b4a69cc7909873d08c0ef5909d4 upstream.
The fair zone allocation policy round-robins allocations between zones
within a node to avoid age inversion problems during reclaim. If the
first allocation fails, the batch counts are reset and a second attempt
is made before entering the slow path.

One assumption made with this scheme is that batches expire at roughly
the same time and the resets each time are justified. This assumption
does not hold when zones reach their low watermark, as the batches will
be consumed at uneven rates. Allocation failure due to watermark
depletion results in additional zonelist scans for the reset and another
watermark check before hitting the slow path.

On UMA, the benefit is negligible -- around 0.25%. On a 4-socket NUMA
machine it's variable due to the variability of measuring overhead with
the vmstat changes. The system CPU overhead comparison looks like:

           3.16.0-rc3   3.16.0-rc3   3.16.0-rc3
           vanilla      vmstat-v5    lowercost-v5
User           746.94       774.56       802.00
System       65336.22     32847.27     40852.33
Elapsed      27553.52     27415.04     27368.46

However, it is worth noting that the overall benchmark still completed
faster, and intuitively it makes sense to take as few passes as possible
through the zonelists.

Signed-off-by: Mel Gorman
Acked-by: Johannes Weiner
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
Signed-off-by: Mel Gorman
Signed-off-by: Jiri Slaby
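
A loose, userspace-runnable model of the batching described above, useful
for seeing why the reset pass costs an extra scan over the zones; the zone
count, batch size, and names are illustrative, not the kernel's.

    #include <stdio.h>

    #define NR_ZONES 3

    struct zone { int batch; int free; };

    /* Reset the per-zone batches; in the kernel this reset-and-retry is
     * the extra pass the commit is trying to make cheaper. */
    static void reset_batches(struct zone *z, int n)
    {
            for (int i = 0; i < n; i++)
                    z[i].batch = 4;
    }

    /* Allocate from the first zone with batch and free pages remaining;
     * one reset-and-retry stands in for the second attempt before the
     * slow path. Returns the zone used, or -1 for the slow path. */
    static int alloc_page(struct zone *z, int n)
    {
            for (int attempt = 0; attempt < 2; attempt++) {
                    for (int i = 0; i < n; i++) {
                            if (z[i].batch > 0 && z[i].free > 0) {
                                    z[i].batch--;
                                    z[i].free--;
                                    return i;
                            }
                    }
                    reset_batches(z, n);
            }
            return -1;
    }

    int main(void)
    {
            struct zone zones[NR_ZONES] = { {4, 8}, {4, 8}, {4, 8} };
            for (int i = 0; i < 16; i++)
                    printf("%d ", alloc_page(zones, NR_ZONES));
            printf("\n");   /* 0 0 0 0 1 1 1 1 2 2 2 2, then a reset pass */
            return 0;
    }

-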
commit f7b5d647946aae1647bf5cd26c16b3a793c1ac49 upstream.
The purpose of numa_zonelist_order=zone is to preserve lower zones for
use with 32-bit devices. If locality is preferred then the
numa_zonelist_order=node policy should be used.

Unfortunately, the fair zone allocation policy overrides this by
skipping zones on remote nodes until the lower one is found. While this
makes sense from a page aging and performance perspective, it breaks the
expected zonelist policy. This patch restores the expected behaviour
for zone-list ordering.

Signed-off-by: Mel Gorman
Acked-by: Johannes Weiner
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
Signed-off-by: Mel Gorman
Signed-off-by: Jiri Slaby
-
commit bb0b6dffa2ccfbd9747ad0cc87c7459622896e60 upstream.
When kswapd is awake reclaiming, the per-cpu stat thresholds are lowered
to get more accurate counts to avoid breaching watermarks. This
threshold update iterates over all possible CPUs which is unnecessary.
Only online CPUs need to be updated. If a new CPU is onlined,
refresh_zone_stat_thresholds() will set the thresholds correctly.

Signed-off-by: Mel Gorman
Acked-by: Johannes Weiner
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
Signed-off-by: Mel Gorman
Signed-off-by: Jiri Slaby
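
A kernel-context fragment (not standalone-runnable) showing the shape of
the fix; the surrounding variable names are simplified stand-ins for the
vmstat code:

    /* before: the update walks every possible CPU, online or not */
    for_each_possible_cpu(cpu)
            per_cpu_ptr(zone->pageset, cpu)->stat_threshold = threshold;

    /* after: only online CPUs; a CPU onlined later gets its threshold
     * from refresh_zone_stat_thresholds() at online time */
    for_each_online_cpu(cpu)
            per_cpu_ptr(zone->pageset, cpu)->stat_threshold = threshold;

-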
commit 0d5d823ab4e608ec7b52ac4410de4cb74bbe0edd upstream.
zone->pages_scanned is a write-intensive cache line during page reclaim
and it's also updated during page free. Move the counter into vmstat to
take advantage of the per-cpu updates and do not update it in the free
paths unless necessary.

On a small UMA machine running tiobench the difference is marginal. On
a 4-node machine the overhead is more noticeable. Note that automatic
NUMA balancing was disabled for this test as otherwise the system CPU
overhead is unpredictable.

           3.16.0-rc3   3.16.0-rc3    3.16.0-rc3
           vanilla      rearrange-v5  vmstat-v5
User           746.94       759.78       774.56
System       65336.22     58350.98     32847.27
Elapsed      27553.52     27282.02     27415.04

Note that the overhead reduction will vary depending on where exactly
pages are allocated and freed.

Signed-off-by: Mel Gorman
Acked-by: Johannes Weiner
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
Signed-off-by: Mel Gorman
Signed-off-by: Jiri Slaby
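
The vmstat machinery the counter moves into keeps a per-CPU delta and
folds it into the global counter only when the delta crosses a threshold,
so the common update touches only CPU-local memory. A minimal userspace
sketch of that idea with C11 threads; the threshold and counts are
illustrative.

    #include <stdatomic.h>
    #include <stdio.h>
    #include <threads.h>

    #define THRESHOLD 32

    static _Atomic long global_count;      /* shared; written rarely */
    static thread_local int local_delta;   /* per-thread; written often */

    /* Fast path: bump the local delta. Only a threshold crossing pays
     * for an atomic update of the shared cache line. */
    static void count_event(void)
    {
            if (++local_delta >= THRESHOLD) {
                    atomic_fetch_add(&global_count, local_delta);
                    local_delta = 0;
            }
    }

    static int worker(void *arg)
    {
            (void)arg;
            for (int i = 0; i < 1000; i++)
                    count_event();
            atomic_fetch_add(&global_count, local_delta);  /* fold remainder */
            return 0;
    }

    int main(void)
    {
            thrd_t t[4];
            for (int i = 0; i < 4; i++)
                    thrd_create(&t[i], worker, NULL);
            for (int i = 0; i < 4; i++)
                    thrd_join(t[i], NULL);
            printf("%ld\n", atomic_load(&global_count));   /* 4000 */
            return 0;
    }

-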
commit 3484b2de9499df23c4604a513b36f96326ae81ad upstream.
The arrangement of struct zone has changed over time and now it has
reached the point where there is some inappropriate sharing going on.
On x86-64, for example:

o The zone->node field is shared with the zone lock and zone->node is
  accessed frequently from the page allocator due to the fair zone
  allocation policy.

o span_seqlock is almost never used but shares a line with free_area.

o Some zone statistics share a cache line with the LRU lock, so
  reclaim-intensive and allocator-intensive workloads can bounce the
  cache line on a stat update.

This patch rearranges struct zone to put read-only and read-mostly
fields together and then splits the page allocator intensive fields, the
zone statistics and the page reclaim intensive fields into their own
cache lines. Note that the type of lowmem_reserve changes due to the
watermark calculations being signed and avoiding a signed/unsigned
conversion there.

On the test configuration I used, the overall size of struct zone shrunk
by one cache line. On smaller machines, this is not likely to be
noticeable. However, on a 4-node NUMA machine running tiobench the
system CPU overhead is reduced by this patch.

           3.16.0-rc3   3.16.0-rc3
           vanilla      rearrange-v5r9
User           746.94       759.78
System       65336.22     58350.98
Elapsed      27553.52     27282.02

Signed-off-by: Mel Gorman
Acked-by: Johannes Weiner
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
Signed-off-by: Mel Gorman
Signed-off-by: Jiri Slaby
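
The same layout discipline can be shown in miniature: group fields by who
writes them and align each group to a cache-line boundary, so allocator
writes do not bounce the line that readers of read-mostly fields are
using. A sketch assuming 64-byte lines; the field names are illustrative.

    #include <stdalign.h>
    #include <stdatomic.h>
    #include <stdio.h>

    #define CACHELINE 64

    struct toy_zone {
            /* read-mostly: safe to share across CPUs */
            alignas(CACHELINE) unsigned long start_pfn;
            unsigned long watermark_low;

            /* page-allocator intensive: isolated on its own line */
            alignas(CACHELINE) _Atomic long nr_free;

            /* page-reclaim intensive: likewise isolated */
            alignas(CACHELINE) _Atomic long nr_scanned;
    };

    int main(void)
    {
            printf("sizeof(struct toy_zone) = %zu\n", sizeof(struct toy_zone));
            return 0;
    }

-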
commit 24b7e5819ad5cbef2b7c7376510862aa8319d240 upstream.
This was formerly the series "Improve sequential read throughput" which
noted some major differences in performance of tiobench since 3.0.
While there are a number of factors, two that dominated were the
introduction of the fair zone allocation policy and changes to CFQ.

The behaviour of the fair zone allocation policy makes more sense than
tiobench does as a benchmark, and the CFQ defaults were not changed due
to insufficient benchmarking.

This series is what's left. It's one functional fix to the fair zone
allocation policy when used on NUMA machines and a reduction of overhead
in general. tiobench was used for the comparison despite its flaws as
an IO benchmark, as in this case we are primarily interested in the
overhead of page allocator and page reclaim activity.

On UMA, it makes little difference to overhead:

           3.16.0-rc3   3.16.0-rc3
           vanilla      lowercost-v5
User           383.61       386.77
System         403.83       401.74
Elapsed       5411.50      5413.11

On a 4-socket NUMA machine it's a bit more noticeable:

           3.16.0-rc3   3.16.0-rc3
           vanilla      lowercost-v5
User           746.94       802.00
System       65336.22     40852.33
Elapsed      27553.52     27368.46

This patch (of 6):

The LRU insertion and activate tracepoints take a PFN as a parameter,
forcing the overhead onto the caller. Move the overhead to the
tracepoint fast-assign method to ensure the cost is only incurred when
the tracepoint is active.

Signed-off-by: Mel Gorman
Acked-by: Johannes Weiner
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
Signed-off-by: Mel Gorman
Signed-off-by: Jiri Slaby
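
A kernel-context sketch (simplified from the real TRACE_EVENT machinery,
so treat the details as approximate) of what moving the cost into
fast-assign looks like: the caller passes the page, and page_to_pfn()
runs only while the tracepoint is enabled.

    /* before: every caller computes the PFN, traced or not */
    trace_mm_lru_insertion(page, page_to_pfn(page));

    /* after: the conversion happens in fast-assign, which only runs
     * while the tracepoint is active */
    trace_mm_lru_insertion(page);

    TRACE_EVENT(mm_lru_insertion,
            TP_PROTO(struct page *page),
            TP_ARGS(page),
            TP_STRUCT__entry(
                    __field(unsigned long, pfn)
            ),
            TP_fast_assign(
                    __entry->pfn = page_to_pfn(page);
            ),
            TP_printk("pfn=%lu", __entry->pfn)
    );

-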
commit 2ab051e11bfa3cbb7b24177f3d6aaed10a0d743e upstream.
When memory cgroups are enabled, the code that decides to force a scan
of anonymous pages in get_scan_count() compares global values (free,
high_watermark) to a value that is restricted to a memory cgroup (file).
It makes the code over-eager to force an anon scan.

For instance, it will force an anon scan when scanning a memcg that is
mainly populated by anonymous pages, even when there are plenty of file
pages to get rid of in other memcgs, and even when swappiness == 0. It
breaks the user's expectation about swappiness and hurts performance.

This patch makes sure that a forced anon scan only happens when there
are not enough file pages for the whole zone, not just in one random
memcg.

[hannes@cmpxchg.org: cleanups]
Signed-off-by: Jerome Marchand
Acked-by: Michal Hocko
Acked-by: Johannes Weiner
Reviewed-by: Rik van Riel
Cc: Mel Gorman
Signed-off-by: Johannes Weiner
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
Signed-off-by: Mel Gorman
Signed-off-by: Jiri Slaby
-
commit 474750aba88817c53f39424e5567b8e4acc4b39b upstream.
Richard Yao reported a month ago that his system had trouble with
vmap_area_lock contention during performance analysis by /proc/meminfo.
Andrew asked why his analysis checks /proc/meminfo stressfully, but he
didn't answer it.

https://lkml.org/lkml/2014/4/10/416

Although I'm not sure whether this is the right usage or not, there is a
solution reducing vmap_area_lock contention with no side-effect. That
is just to use an rcu list iterator in get_vmalloc_info().

rcu can be used in this function because all RCU protocol is already
respected by writers, since Nick Piggin's commit db64fe02258f1 ("mm:
rewrite vmap layer") back in linux-2.6.28.

Specifically:
insertions use list_add_rcu(),
deletions use list_del_rcu() and kfree_rcu().

Note that the rb tree is not used from the rcu reader (it would not be
safe); only the vmap_area_list has full RCU protection.

Note that __purge_vmap_area_lazy() already uses this rcu protection.

    rcu_read_lock();
    list_for_each_entry_rcu(va, &vmap_area_list, list) {
            if (va->flags & VM_LAZY_FREE) {
                    if (va->va_start < *start)
                            *start = va->va_start;
                    if (va->va_end > *end)
                            *end = va->va_end;
                    nr += (va->va_end - va->va_start) >> PAGE_SHIFT;
                    list_add_tail(&va->purge_list, &valist);
                    va->flags |= VM_LAZY_FREEING;
                    va->flags &= ~VM_LAZY_FREE;
            }
    }
    rcu_read_unlock();

Peter:
: While rcu list traversal over the vmap_area_list is safe, this may
: arrive at different results than the spinlocked version. The rcu list
: traversal version will not be a 'snapshot' of a single, valid instant
: of the entire vmap_area_list, but rather a potential amalgam of
: different list states.

Joonsoo:
: Yes, you are right, but I don't think that we should be strict here.
: Meminfo is already not a 'snapshot' at a specific time. While we try
: to get certain stats, the other stats can change. And, although we may
: arrive at different results than the spinlocked version, the difference
: would not be large and would not make a serious side-effect.

[edumazet@google.com: add more commit description]
Signed-off-by: Joonsoo Kim
Reported-by: Richard Yao
Acked-by: Eric Dumazet
Cc: Peter Hurley
Cc: Zhang Yanfei
Cc: Johannes Weiner
Cc: Andi Kleen
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
Signed-off-by: Mel Gorman
Signed-off-by: Jiri Slaby
-
commit 21bda264f4243f61dfcc485174055f12ad0530b4 upstream.
Commit 71e3aac0724f ("thp: transparent hugepage core") adds a
copy_pte_range prototype to huge_mm.h. I'm not sure why (or whether)
this function has been used outside of memory.c, but it currently
isn't. This patch makes copy_pte_range() static again.

Signed-off-by: Jerome Marchand
Acked-by: David Rientjes
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
Signed-off-by: Mel Gorman
Signed-off-by: Jiri Slaby
-
commit 14a4e2141e24304fff2c697be6382ffb83888185 upstream.
Commit 9f1b868a13ac ("mm: thp: khugepaged: add policy for finding target
node") improved the previous khugepaged logic which allocated a
transparent hugepage from the node of the first page being collapsed.

However, it is still possible to collapse pages to remote memory which
may suffer from additional access latency. With the current policy, it
is possible that 255 pages (with PAGE_SHIFT == 12) will be collapsed
remotely if the majority are allocated from that node.

When zone_reclaim_mode is enabled, it means the VM should make every
attempt to allocate locally to prevent NUMA performance degradation. In
this case, we do not want to collapse hugepages to remote nodes that
would suffer from increased access latency. Thus, when
zone_reclaim_mode is enabled, only allow collapsing to nodes with
RECLAIM_DISTANCE or less.

There is no functional change for systems that disable
zone_reclaim_mode.

Signed-off-by: David Rientjes
Cc: Dave Hansen
Cc: Andrea Arcangeli
Acked-by: Vlastimil Babka
Acked-by: Mel Gorman
Cc: Rik van Riel
Cc: "Kirill A. Shutemov"
Cc: Bob Liu
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
Signed-off-by: Mel Gorman
Signed-off-by: Jiri Slaby
-
commit c0d73261f5c1355a35b8b40e871d31578ce0c044 upstream.
Use ACCESS_ONCE() in handle_pte_fault() when getting the entry or
orig_pte upon which all subsequent decisions and pte_same() tests will
be made.

I have no evidence that its lack is responsible for the mm/filemap.c:202
BUG_ON(page_mapped(page)) in __delete_from_page_cache() found by
trinity, and I am not optimistic that it will fix it. But I have found
no other explanation, and ACCESS_ONCE() here will surely not hurt.

If gcc does re-access the pte before passing it down, then that would be
disastrous for correct page fault handling, and certainly could explain
the page_mapped() BUGs seen (concurrent fault causing page to be mapped
in a second time on top of itself: mapcount 2 for a single pte).

Signed-off-by: Hugh Dickins
Cc: Sasha Levin
Cc: Linus Torvalds
Cc: "Kirill A. Shutemov"
Cc: Konstantin Khlebnikov
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
Signed-off-by: Mel Gorman
Signed-off-by: Jiri Slaby
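
For context, ACCESS_ONCE() in this era's <linux/compiler.h> is a volatile
cast that forces the compiler to load the object exactly once, so all
later tests work on one snapshot. A minimal userspace sketch of the idiom
(the surrounding function is illustrative):

    #include <stdio.h>

    /* The kernel definition of the time: one forced read, no re-loads. */
    #define ACCESS_ONCE(x) (*(volatile typeof(x) *)&(x))

    int shared_pte;   /* stands in for *pte, updated by other threads */

    int snapshot_and_decide(void)
    {
            /* Without ACCESS_ONCE the compiler may re-read shared_pte at
             * each use, letting a concurrent writer make the tests below
             * disagree with each other. */
            int entry = ACCESS_ONCE(shared_pte);
            return entry == 0 ? -1 : entry;
    }

    int main(void)
    {
            shared_pte = 42;
            printf("%d\n", snapshot_and_decide());
            return 0;
    }

-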
commit 66d2f4d28cd030220e7ea2a628993fcabcb956d1 upstream.
Under shmem swapping load, I sometimes hit the VM_BUG_ON_PAGE(!PageLRU)
in isolate_lru_pages() at mm/vmscan.c:1281!

Commit 2457aec63745 ("mm: non-atomically mark page accessed during page
cache allocation where possible") looks like interrupted
work-in-progress.

mm/filemap.c's call to init_page_accessed() is fine, but not
mm/shmem.c's - shmem_write_begin() is clearly wrong to use it after
shmem_getpage(), when the page is always visible in radix_tree, and
often already on LRU.

Revert the change to shmem_write_begin(), and use init_page_accessed()
or mark_page_accessed() appropriately for SGP_WRITE in
shmem_getpage_gfp().

SGP_WRITE also covers shmem_symlink(), which did not
mark_page_accessed() before; but since many other filesystems use
[__]page_symlink(), which did and does mark the page accessed, consider
this as rectifying an oversight.

Signed-off-by: Hugh Dickins
Acked-by: Mel Gorman
Cc: Johannes Weiner
Cc: Vlastimil Babka
Cc: Michal Hocko
Cc: Dave Hansen
Cc: Prabhakar Lad
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
Signed-off-by: Mel Gorman
Signed-off-by: Jiri Slaby
-
commit 888cf2db475a256fb0cda042140f73d7881f81fe upstream.
If a page is marked for immediate reclaim then it is moved to the tail
of the LRU list. This occurs when the system is under enough memory
pressure for pages under writeback to reach the end of the LRU, but we
test for this using atomic operations on every writeback. This patch
uses an optimistic non-atomic test first. It'll miss some pages in rare
cases but the consequences are not severe enough to warrant such a
penalty.

While the function does not dominate profiles during a simple dd test,
the cost of it is reduced.

  73048  0.7428  vmlinux-3.15.0-rc5-mmotm-20140513  end_page_writeback
  23740  0.2409  vmlinux-3.15.0-rc5-lessatomic      end_page_writeback

Signed-off-by: Mel Gorman
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
Signed-off-by: Mel Gorman
Signed-off-by: Jiri Slaby
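
The idiom, in a small runnable form: a plain load filters out the common
case so the atomic read-modify-write runs only when the flag is probably
set. The flag name and layout are illustrative, not the kernel's.

    #include <stdatomic.h>
    #include <stdbool.h>
    #include <stdio.h>

    #define PG_RECLAIM (1u << 0)

    /* Optimistic test first: if the bit looks clear, skip the atomic op.
     * A racing setter can be missed, which is tolerable here. */
    static bool test_and_clear_reclaim(_Atomic unsigned *flags)
    {
            if (!(atomic_load_explicit(flags, memory_order_relaxed) & PG_RECLAIM))
                    return false;   /* common case: no atomic operation */
            return atomic_fetch_and(flags, ~PG_RECLAIM) & PG_RECLAIM;
    }

    int main(void)
    {
            _Atomic unsigned flags = PG_RECLAIM;
            printf("%d %d\n", test_and_clear_reclaim(&flags),
                   test_and_clear_reclaim(&flags));   /* prints: 1 0 */
            return 0;
    }

-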
commit 2457aec63745e235bcafb7ef312b182d8682f0fc upstream.
aops->write_begin may allocate a new page and make it visible only to
have mark_page_accessed called almost immediately after. Once the page
is visible the atomic operations are necessary, which is noticeable
overhead when writing to an in-memory filesystem like tmpfs, but should
also be noticeable with fast storage. The objective of the patch is to
initialise the accessed information with non-atomic operations before
the page is visible.

The bulk of filesystems directly or indirectly use
grab_cache_page_write_begin or find_or_create_page for the initial
allocation of a page cache page. This patch adds an
init_page_accessed() helper which behaves like the first call to
mark_page_accessed() but may be called before the page is visible and
can be done non-atomically.

The primary APIs of concern in this case are the following and are used
by most filesystems:

find_get_page
find_lock_page
find_or_create_page
grab_cache_page_nowait
grab_cache_page_write_begin

All of them are very similar in detail, so the patch creates a core
helper pagecache_get_page() which takes a flags parameter that affects
its behavior, such as whether the page should be marked accessed or
not. The old API is preserved but is basically a thin wrapper around
this core function.

Each of the filesystems is then updated to avoid calling
mark_page_accessed when it is known that the VM interfaces have already
done the job. There is a slight snag in that the timing of the
mark_page_accessed() has now changed, so in rare cases it's possible a
page gets to the end of the LRU as PageReferenced where previously it
might have been repromoted. This is expected to be rare, but it's worth
the filesystem people thinking about it in case they see a problem with
the timing change. It is also the case that some filesystems may be
marking pages accessed that previously did not, but it makes sense that
filesystems have consistent behaviour in this regard.

The test case used to evaluate this is a simple dd of a large file done
multiple times with the file deleted on each iteration. The size of the
file is 1/10th of physical memory to avoid dirty page balancing. In the
async case it will be possible that the workload completes without even
hitting the disk and will have variable results, but it highlights the
impact of mark_page_accessed for async IO. The sync results are
expected to be more stable. The exception is tmpfs, where the normal
case is for the "IO" to not hit the disk.

The test machine was single socket and UMA to avoid any scheduling or
NUMA artifacts. Throughput and wall times are presented for sync IO;
only wall times are shown for async, as the granularity reported by dd
and the variability are unsuitable for comparison. As async results
were variable due to writeback timings, I'm only reporting the maximum
figures. The sync results were stable enough to make the mean and
stddev uninteresting.

The performance results are reported based on a run with no profiling.
Profile data is based on a separate run with oprofile running.

async dd
                    3.15.0-rc3           3.15.0-rc3
                    vanilla              accessed-v2
ext3  Max elapsed   13.9900 (  0.00%)    11.5900 ( 17.16%)
tmpfs Max elapsed    0.5100 (  0.00%)     0.4900 (  3.92%)
btrfs Max elapsed   12.8100 (  0.00%)    12.7800 (  0.23%)
ext4  Max elapsed   18.6000 (  0.00%)    13.3400 ( 28.28%)
xfs   Max elapsed   12.5600 (  0.00%)     2.0900 ( 83.36%)

The XFS figure is a bit strange as it managed to avoid a worst case by
sheer luck but the average figures looked reasonable.

        samples  percentage
ext3      86107  0.9783  vmlinux-3.15.0-rc4-vanilla         mark_page_accessed
ext3      23833  0.2710  vmlinux-3.15.0-rc4-accessed-v3r25  mark_page_accessed
ext3       5036  0.0573  vmlinux-3.15.0-rc4-accessed-v3r25  init_page_accessed
ext4      64566  0.8961  vmlinux-3.15.0-rc4-vanilla         mark_page_accessed
ext4       5322  0.0713  vmlinux-3.15.0-rc4-accessed-v3r25  mark_page_accessed
ext4       2869  0.0384  vmlinux-3.15.0-rc4-accessed-v3r25  init_page_accessed
xfs       62126  1.7675  vmlinux-3.15.0-rc4-vanilla         mark_page_accessed
xfs        1904  0.0554  vmlinux-3.15.0-rc4-accessed-v3r25  init_page_accessed
xfs         103  0.0030  vmlinux-3.15.0-rc4-accessed-v3r25  mark_page_accessed
btrfs     10655  0.1338  vmlinux-3.15.0-rc4-vanilla         mark_page_accessed
btrfs      2020  0.0273  vmlinux-3.15.0-rc4-accessed-v3r25  init_page_accessed
btrfs       587  0.0079  vmlinux-3.15.0-rc4-accessed-v3r25  mark_page_accessed
tmpfs     59562  3.2628  vmlinux-3.15.0-rc4-vanilla         mark_page_accessed
tmpfs      1210  0.0696  vmlinux-3.15.0-rc4-accessed-v3r25  init_page_accessed
tmpfs        94  0.0054  vmlinux-3.15.0-rc4-accessed-v3r25  mark_page_accessed

[akpm@linux-foundation.org: don't run init_page_accessed() against an uninitialised pointer]
Signed-off-by: Mel Gorman
Cc: Johannes Weiner
Cc: Vlastimil Babka
Cc: Jan Kara
Cc: Michal Hocko
Cc: Hugh Dickins
Cc: Dave Hansen
Cc: Theodore Ts'o
Cc: "Paul E. McKenney"
Cc: Oleg Nesterov
Cc: Rik van Riel
Cc: Peter Zijlstra
Tested-by: Prabhakar Lad
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
Signed-off-by: Mel Gorman
Signed-off-by: Jiri Slaby
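
A toy, runnable model of the core-helper-plus-wrappers shape described
above. The flag names echo the FGP_* flags the patch introduces, but the
body is a stand-in, not the kernel implementation.

    #include <stdio.h>

    #define FGP_ACCESSED 0x1   /* mark the page accessed */
    #define FGP_CREAT    0x2   /* allocate on lookup miss */

    struct page { int present; int referenced; };

    static struct page cache[16];   /* toy page cache: index -> page */

    /* One core helper; behaviour is selected by flags. */
    static struct page *pagecache_get(unsigned index, unsigned flags)
    {
            struct page *p = &cache[index % 16];

            if (!p->present) {
                    if (!(flags & FGP_CREAT))
                            return NULL;
                    p->present = 1;
                    /* A newly created page is not yet visible to anyone
                     * else, which is what lets the kernel set "accessed"
                     * without atomics at this point. */
            }
            if (flags & FGP_ACCESSED)
                    p->referenced = 1;
            return p;
    }

    /* The old entry points become thin wrappers. */
    static struct page *find_get(unsigned i)       { return pagecache_get(i, 0); }
    static struct page *find_or_create(unsigned i) { return pagecache_get(i, FGP_CREAT | FGP_ACCESSED); }

    int main(void)
    {
            struct page *p1 = find_get(3);         /* miss: NULL */
            struct page *p2 = find_or_create(3);   /* created and marked */
            struct page *p3 = find_get(3);         /* now found */
            printf("%p %p %p\n", (void *)p1, (void *)p2, (void *)p3);
            return 0;
    }

-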
commit e7470ee89f003634a88e7b5e5a7b65b3025987de upstream.
Discarding buffers uses a bunch of atomic operations when discarding
buffers because ...... I can't think of a reason. Use a cmpxchg loop to
clear all the necessary flags. In most (all?) cases this will be a
single atomic operation.

[akpm@linux-foundation.org: move BUFFER_FLAGS_DISCARD into the .c file]
Signed-off-by: Mel Gorman
Cc: Johannes Weiner
Cc: Vlastimil Babka
Cc: Jan Kara
Cc: Michal Hocko
Cc: Hugh Dickins
Cc: Dave Hansen
Cc: Theodore Ts'o
Cc: "Paul E. McKenney"
Cc: Oleg Nesterov
Cc: Rik van Riel
Cc: Peter Zijlstra
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
Signed-off-by: Mel Gorman
Signed-off-by: Jiri Slaby
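
What a cmpxchg loop that clears several flags at once looks like, in
runnable C11 form; the flag values stand in for the buffer_head bits the
patch batches into BUFFER_FLAGS_DISCARD.

    #include <stdatomic.h>
    #include <stdio.h>

    #define BH_DIRTY  (1u << 0)
    #define BH_MAPPED (1u << 1)
    #define BH_REQ    (1u << 2)
    #define FLAGS_DISCARD (BH_DIRTY | BH_MAPPED | BH_REQ)

    /* One CAS replaces a sequence of per-flag atomic clears; it almost
     * always succeeds on the first try, i.e. one atomic op in total. */
    static void discard_flags(_Atomic unsigned *state)
    {
            unsigned old = atomic_load_explicit(state, memory_order_relaxed);

            while (!atomic_compare_exchange_weak(state, &old,
                                                 old & ~FLAGS_DISCARD))
                    ;   /* 'old' was reloaded by the failed CAS; retry */
    }

    int main(void)
    {
            _Atomic unsigned s = BH_DIRTY | BH_REQ | (1u << 5);
            discard_flags(&s);
            printf("0x%x\n", atomic_load(&s));   /* 0x20: other bit kept */
            return 0;
    }

-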
commit 6fb81a17d21f2a138b8f424af4cf379f2b694060 upstream.
When adding pages to the LRU we clear the active bit unconditionally.
As the page could be reachable from other paths we cannot use unlocked
operations without risk of corruption, such as a parallel
mark_page_accessed. This patch tests if it is necessary to clear the
active flag before using an atomic operation. This potentially opens a
tiny race when PageActive is checked, as mark_page_accessed could be
called after PageActive was checked. The race already exists, but this
patch changes it slightly. The consequence is that a page that might
have been left on the inactive list before the patch may now be
promoted to the active list. It's too tiny a race and too marginal a
consequence to always use atomic operations for.

Signed-off-by: Mel Gorman
Acked-by: Johannes Weiner
Cc: Vlastimil Babka
Cc: Jan Kara
Cc: Michal Hocko
Cc: Hugh Dickins
Cc: Dave Hansen
Cc: Theodore Ts'o
Cc: "Paul E. McKenney"
Cc: Oleg Nesterov
Cc: Rik van Riel
Cc: Peter Zijlstra
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
Signed-off-by: Mel Gorman
Signed-off-by: Jiri Slaby
-
commit e3741b506c5088fa8c911bb5884c430f770fb49d upstream.
There should be no references to it any more and a parallel mark should
not be reordered against us. Use the non-locked variant to clear the
page active flag.

Signed-off-by: Mel Gorman
Acked-by: Rik van Riel
Cc: Johannes Weiner
Cc: Vlastimil Babka
Cc: Jan Kara
Cc: Michal Hocko
Cc: Hugh Dickins
Cc: Dave Hansen
Cc: Theodore Ts'o
Cc: "Paul E. McKenney"
Cc: Oleg Nesterov
Cc: Peter Zijlstra
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
Signed-off-by: Mel Gorman
Signed-off-by: Jiri Slaby
-
commit 07a427884348d38a6fd56fa4d78249c407196650 upstream.
shmem_getpage_gfp uses an atomic operation to set the SwapBacked field
before the page is even added to the LRU or visible. This is
unnecessary, as what could it possibly race against? Use an unlocked
variant.

Signed-off-by: Mel Gorman
Acked-by: Johannes Weiner
Acked-by: Rik van Riel
Cc: Vlastimil Babka
Cc: Jan Kara
Cc: Michal Hocko
Cc: Hugh Dickins
Cc: Dave Hansen
Cc: Theodore Ts'o
Cc: "Paul E. McKenney"
Cc: Oleg Nesterov
Cc: Peter Zijlstra
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
Signed-off-by: Mel Gorman
Signed-off-by: Jiri Slaby
-
commit cfc47a2803db42140167b92d991ef04018e162c7 upstream.
get_pageblock_migratetype() is called during free with IRQs disabled.
This is unnecessary and disables IRQs for longer than necessary.

Signed-off-by: Mel Gorman
Acked-by: Rik van Riel
Cc: Johannes Weiner
Acked-by: Vlastimil Babka
Cc: Jan Kara
Cc: Michal Hocko
Cc: Hugh Dickins
Cc: Dave Hansen
Cc: Theodore Ts'o
Cc: "Paul E. McKenney"
Cc: Oleg Nesterov
Cc: Peter Zijlstra
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
Signed-off-by: Mel Gorman
Signed-off-by: Jiri Slaby
-
commit b745bc85f21ea707e4ea1a91948055fa3e72c77b upstream.
cold is a bool, make it one. Make the likely case the "if" part of the
block instead of the else, as according to the optimisation manual this
is preferred.

Signed-off-by: Mel Gorman
Acked-by: Rik van Riel
Cc: Johannes Weiner
Cc: Vlastimil Babka
Cc: Jan Kara
Cc: Michal Hocko
Cc: Hugh Dickins
Cc: Dave Hansen
Cc: Theodore Ts'o
Cc: "Paul E. McKenney"
Cc: Oleg Nesterov
Cc: Peter Zijlstra
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
Signed-off-by: Mel Gorman
Signed-off-by: Jiri Slaby
-
commit dc4b0caff24d9b2918e9f27bc65499ee63187eba upstream.
In the free path we calculate page_to_pfn multiple times. Reduce that.
Signed-off-by: Mel Gorman
Acked-by: Rik van Riel
Cc: Johannes Weiner
Acked-by: Vlastimil Babka
Cc: Jan Kara
Cc: Michal Hocko
Cc: Hugh Dickins
Cc: Dave Hansen
Cc: Theodore Ts'o
Cc: "Paul E. McKenney"
Cc: Oleg Nesterov
Cc: Peter Zijlstra
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
Signed-off-by: Mel Gorman
Signed-off-by: Jiri Slaby
-
commit 7aeb09f9104b760fc53c98cb7d20d06640baf9e6 upstream.
X86 prefers the use of unsigned types for iterators and there is a
tendency to mix whether a signed or unsigned type is used for page
order. This converts a number of sites in mm/page_alloc.c to use
unsigned int for order where possible.

Signed-off-by: Mel Gorman
Acked-by: Rik van Riel
Cc: Johannes Weiner
Cc: Vlastimil Babka
Cc: Jan Kara
Cc: Michal Hocko
Cc: Hugh Dickins
Cc: Dave Hansen
Cc: Theodore Ts'o
Cc: "Paul E. McKenney"
Cc: Oleg Nesterov
Cc: Peter Zijlstra
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
Signed-off-by: Mel Gorman
Signed-off-by: Jiri Slaby
-
commit 5dab29113ca56335c78be3f98bf5ddf2ef8eb6a6 upstream.
ALLOC_NO_WATERMARK is set in a few cases: always by kswapd, always for
__GFP_MEMALLOC, sometimes for swap-over-nfs, tasks etc. Each of these
cases is relatively rare, but the ALLOC_NO_WATERMARK check is an
unlikely branch in the fast path. This patch moves the check out of the
fast path to a point after it has been determined that the watermarks
have not been met. This helps the common fast path at the cost of
making the slow path slower and hitting kswapd with a performance cost.
It's a reasonable tradeoff.

Signed-off-by: Mel Gorman
Acked-by: Johannes Weiner
Reviewed-by: Rik van Riel
Cc: Vlastimil Babka
Cc: Jan Kara
Cc: Michal Hocko
Cc: Hugh Dickins
Cc: Dave Hansen
Cc: Theodore Ts'o
Cc: "Paul E. McKenney"
Cc: Oleg Nesterov
Cc: Peter Zijlstra
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
Signed-off-by: Mel Gorman
Signed-off-by: Jiri Slaby
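
A shape-only pseudocode fragment (condensed; the real allocator code
differs) of where the check moves:

    /* before: the rare flag is tested on every fast-path attempt */
    if ((alloc_flags & ALLOC_NO_WATERMARKS) || zone_watermark_ok(...))
            goto try_this_zone;

    /* after: the fast path tests only the watermark; the rare flag is
     * consulted once the watermark check has already failed */
    if (zone_watermark_ok(...))
            goto try_this_zone;
    if (alloc_flags & ALLOC_NO_WATERMARKS)
            goto try_this_zone;

-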
commit a6e21b14f22041382e832d30deda6f26f37b1097 upstream.
Currently it's calculated once per zone in the zonelist.
Signed-off-by: Mel Gorman
Acked-by: Johannes Weiner
Reviewed-by: Rik van Riel
Cc: Vlastimil Babka
Cc: Jan Kara
Cc: Michal Hocko
Cc: Hugh Dickins
Cc: Dave Hansen
Cc: Theodore Ts'o
Cc: "Paul E. McKenney"
Cc: Oleg Nesterov
Cc: Peter Zijlstra
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
Signed-off-by: Mel Gorman
Signed-off-by: Jiri Slaby
-
commit d34c5fa06fade08a689fc171bf756fba2858ae73 upstream.
A node/zone index is used to check if pages are compatible for merging,
but this happens unconditionally even if the buddy page is not free.
Defer the calculation as long as possible. Ideally we would check the
zone boundary, but nodes can overlap.

Signed-off-by: Mel Gorman
Acked-by: Johannes Weiner
Acked-by: Rik van Riel
Cc: Vlastimil Babka
Cc: Jan Kara
Cc: Michal Hocko
Cc: Hugh Dickins
Cc: Dave Hansen
Cc: Theodore Ts'o
Cc: "Paul E. McKenney"
Cc: Oleg Nesterov
Cc: Peter Zijlstra
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
Signed-off-by: Mel Gorman
Signed-off-by: Jiri Slaby
-
commit d8846374a85f4290a473a4e2a64c1ba046c4a0e1 upstream.
There is no need to calculate zone_idx(preferred_zone) multiple times
or use the pgdat to figure it out.

Signed-off-by: Mel Gorman
Acked-by: Rik van Riel
Acked-by: David Rientjes
Cc: Johannes Weiner
Cc: Vlastimil Babka
Cc: Jan Kara
Cc: Michal Hocko
Cc: Hugh Dickins
Cc: Dave Hansen
Cc: Theodore Ts'o
Cc: "Paul E. McKenney"
Cc: Oleg Nesterov
Cc: Peter Zijlstra
Cc: Dan Carpenter
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
Signed-off-by: Mel Gorman
Signed-off-by: Jiri Slaby
-
commit 664eeddeef6539247691197c1ac124d4aa872ab6 upstream.
If cpusets are not in use then we still check a global variable on every
page allocation. Use jump labels to avoid the overhead.

Signed-off-by: Mel Gorman
Reviewed-by: Rik van Riel
Cc: Johannes Weiner
Cc: Vlastimil Babka
Cc: Jan Kara
Cc: Michal Hocko
Cc: Hugh Dickins
Cc: Dave Hansen
Cc: Theodore Ts'o
Cc: "Paul E. McKenney"
Cc: Oleg Nesterov
Cc: Peter Zijlstra
Cc: Stephen Rothwell
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
Signed-off-by: Mel Gorman
Signed-off-by: Jiri Slaby
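
A kernel-context sketch in the spirit of the change, using the static-key
API of this era (<linux/jump_label.h>); the key and helper names are
illustrative. The disabled case compiles to a patched no-op, so the
per-allocation check costs nothing until cpusets gain a user.

    #include <linux/jump_label.h>

    static struct static_key example_key = STATIC_KEY_INIT_FALSE;

    static inline bool example_enabled(void)
    {
            /* Patched-out branch until the key is enabled below. */
            return static_key_false(&example_key);
    }

    /* Reference-counted enable/disable, mirroring how cpuset mount and
     * unmount would bump the key. */
    void example_get(void) { static_key_slow_inc(&example_key); }
    void example_put(void) { static_key_slow_dec(&example_key); }

-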
commit ea5e9539abf1258f23e725cb9cb25aa74efa29eb upstream.
This patch exposes the jump_label reference count in preparation for the
next patch. cpusets cares about both the jump_label being enabled and
how many users of the cpusets there currently are.

Signed-off-by: Peter Zijlstra
Signed-off-by: Mel Gorman
Cc: Johannes Weiner
Cc: Vlastimil Babka
Cc: Jan Kara
Cc: Michal Hocko
Cc: Hugh Dickins
Cc: Dave Hansen
Cc: Theodore Ts'o
Cc: "Paul E. McKenney"
Cc: Oleg Nesterov
Cc: Rik van Riel
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
Signed-off-by: Mel Gorman
Signed-off-by: Jiri Slaby