30 Dec, 2020
2 commits
-
commit dcf5aedb24f899d537e21c18ea552c780598d352 upstream.
Use temporary slots in reclaim function to avoid possible race when
freeing those.While at it, make sure we check CLAIMED flag under page lock in the
reclaim function to make sure we are not racing with z3fold_alloc().Link: https://lkml.kernel.org/r/20201209145151.18994-4-vitaly.wool@konsulko.com
Signed-off-by: Vitaly Wool
Cc:
Cc: Mike Galbraith
Cc: Sebastian Andrzej Siewior
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
Signed-off-by: Greg Kroah-Hartman -
commit fc5488651c7d840c9cad9b0f273f2f31bd03413a upstream.
Patch series "z3fold: stability / rt fixes".
Address z3fold stability issues under stress load, primarily in the
reclaim and free aspects. Besides, it fixes the locking problems that
were only seen in real-time kernel configuration.This patch (of 3):
There used to be two places in the code where slots could be freed, namely
when freeing the last allocated handle from the slots and when releasing
the z3fold header these slots aree linked to. The logic to decide on
whether to free certain slots was complicated and error prone in both
functions and it led to failures in RT case.To fix that, make free_handle() the single point of freeing slots.
Link: https://lkml.kernel.org/r/20201209145151.18994-1-vitaly.wool@konsulko.com
Link: https://lkml.kernel.org/r/20201209145151.18994-2-vitaly.wool@konsulko.com
Signed-off-by: Vitaly Wool
Tested-by: Mike Galbraith
Cc: Sebastian Andrzej Siewior
Cc:
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
Signed-off-by: Greg Kroah-Hartman
14 Oct, 2020
1 commit
-
alloc_slots() allocates memory for slots using kmem_cache_alloc(), then
memsets it. We can just use kmem_cache_zalloc().Signed-off-by: Hui Su
Signed-off-by: Andrew Morton
Reviewed-by: Andrew Morton
Link: https://lkml.kernel.org/r/20200926100834.GA184671@rlk
Signed-off-by: Linus Torvalds
29 May, 2020
1 commit
-
Kmemleak reported many leaks while under memory pressue in,
slots = alloc_slots(pool, gfp);
which is referenced by "zhdr" in init_z3fold_page(),
zhdr->slots = slots;
However, "zhdr" could be gone without freeing slots as the later will be
freed separately when the last "handle" off of "handles" array is freed.
It will be within "slots" which is always aligned.unreferenced object 0xc000000fdadc1040 (size 104):
comm "oom04", pid 140476, jiffies 4295359280 (age 3454.970s)
hex dump (first 32 bytes):
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
backtrace:
z3fold_zpool_malloc+0x7b0/0xe10
alloc_slots at mm/z3fold.c:214
(inlined by) init_z3fold_page at mm/z3fold.c:412
(inlined by) z3fold_alloc at mm/z3fold.c:1161
(inlined by) z3fold_zpool_malloc at mm/z3fold.c:1735
zpool_malloc+0x34/0x50
zswap_frontswap_store+0x60c/0xda0
zswap_frontswap_store at mm/zswap.c:1093
__frontswap_store+0x128/0x330
swap_writepage+0x58/0x110
pageout+0x16c/0xa40
shrink_page_list+0x1ac8/0x25c0
shrink_inactive_list+0x270/0x730
shrink_lruvec+0x444/0xf30
shrink_node+0x2a4/0x9c0
do_try_to_free_pages+0x158/0x640
try_to_free_pages+0x1bc/0x5f0
__alloc_pages_slowpath.constprop.60+0x4dc/0x15a0
__alloc_pages_nodemask+0x520/0x650
alloc_pages_vma+0xc0/0x420
handle_mm_fault+0x1174/0x1bf0Signed-off-by: Qian Cai
Signed-off-by: Andrew Morton
Acked-by: Vitaly Wool
Acked-by: Catalin Marinas
Link: http://lkml.kernel.org/r/20200522220052.2225-1-cai@lca.pw
Signed-off-by: Linus Torvalds
24 May, 2020
1 commit
-
free_handle() for a foreign handle may race with inter-page compaction,
what can lead to memory corruption.To avoid that, take write lock not read lock in free_handle to be
synchronized with __release_z3fold_page().For example KASAN can detect it:
==================================================================
BUG: KASAN: use-after-free in LZ4_decompress_safe+0x2c4/0x3b8
Read of size 1 at addr ffffffc976695ca3 by task GoogleApiHandle/4121CPU: 0 PID: 4121 Comm: GoogleApiHandle Tainted: P S OE 4.19.81-perf+ #162
Hardware name: Sony Mobile Communications. PDX-203(KONA) (DT)
Call trace:
LZ4_decompress_safe+0x2c4/0x3b8
lz4_decompress_crypto+0x3c/0x70
crypto_decompress+0x58/0x70
zcomp_decompress+0xd4/0x120
...Apart from that, initialize zhdr->mapped_count in init_z3fold_page() and
remove "newpage" variable because it is not used anywhere.Signed-off-by: Uladzislau Rezki
Signed-off-by: Vitaly Wool
Signed-off-by: Andrew Morton
Cc: Qian Cai
Cc: Raymond Jennings
Cc:
Link: http://lkml.kernel.org/r/20200520082100.28876-1-vitaly.wool@konsulko.com
Signed-off-by: Linus Torvalds
06 Mar, 2020
1 commit
-
rwlock.h should not be included directly. Instead linux/splinlock.h
should be included. One thing it does is to break the RT build.Signed-off-by: Andrew Morton
Signed-off-by: Sebastian Andrzej Siewior
Cc: Peter Zijlstra
Cc: Vitaly Wool
Cc: Thomas Gleixner
Link: http://lkml.kernel.org/r/20200224133631.1510569-1-bigeasy@linutronix.de
Signed-off-by: Linus Torvalds
02 Dec, 2019
1 commit
-
For each page scheduled for compaction (e. g. by z3fold_free()), try to
apply inter-page compaction before running the traditional/ existing
intra-page compaction. That means, if the page has only one buddy, we
treat that buddy as a new object that we aim to place into an existing
z3fold page. If such a page is found, that object is transferred and the
old page is freed completely. The transferred object is named "foreign"
and treated slightly differently thereafter.Namely, we increase "foreign handle" counter for the new page. Pages with
non-zero "foreign handle" count become unmovable. This patch implements
"foreign handle" detection when a handle is freed to decrement the foreign
handle counter accordingly, so a page may as well become movable again as
the time goes by.As a result, we almost always have exactly 3 objects per page and
significantly better average compression ratio.[cai@lca.pw: fix -Wunused-but-set-variable warnings]
Link: http://lkml.kernel.org/r/1570542062-29144-1-git-send-email-cai@lca.pw
[vitalywool@gmail.com: avoid subtle race when freeing slots]
Link: http://lkml.kernel.org/r/20191127152118.6314b99074b0626d4c5a8835@gmail.com
[vitalywool@gmail.com: compact objects more accurately]
Link: http://lkml.kernel.org/r/20191127152216.6ad33745a21ba71c53606acb@gmail.com
[vitalywool@gmail.com: protect handle reads]
Link: http://lkml.kernel.org/r/20191127152345.8059852f60947686674d726d@gmail.com
Link: http://lkml.kernel.org/r/20191006041457.24113-1-vitalywool@gmail.com
Signed-off-by: Vitaly Wool
Cc: Dan Streetman
Cc: Henry Burns
Cc: Shakeel Butt
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
08 Oct, 2019
1 commit
-
There's a really hard to reproduce race in z3fold between z3fold_free()
and z3fold_reclaim_page(). z3fold_reclaim_page() can claim the page
after z3fold_free() has checked if the page was claimed and
z3fold_free() will then schedule this page for compaction which may in
turn lead to random page faults (since that page would have been
reclaimed by then).Fix that by claiming page in the beginning of z3fold_free() and not
forgetting to clear the claim in the end.[vitalywool@gmail.com: v2]
Link: http://lkml.kernel.org/r/20190928113456.152742cf@bigdell
Link: http://lkml.kernel.org/r/20190926104844.4f0c6efa1366b8f5741eaba9@gmail.com
Signed-off-by: Vitaly Wool
Reported-by: Markus Linnala
Cc: Dan Streetman
Cc: Vlastimil Babka
Cc: Henry Burns
Cc: Shakeel Butt
Cc: Markus Linnala
Cc:
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
25 Sep, 2019
3 commits
-
Currently there is a leak in init_z3fold_page() -- it allocates handles
from kmem cache even for headless pages, but then they are never used and
never freed, so eventually kmem cache may get exhausted. This patch
provides a fix for that.Link: http://lkml.kernel.org/r/20190917185352.44cf285d3ebd9e64548de5de@gmail.com
Signed-off-by: Vitaly Wool
Reported-by: Markus Linnala
Tested-by: Markus Linnala
Cc: Dan Streetman
Cc: Henry Burns
Cc: Shakeel Butt
Cc: Vlastimil Babka
Cc:
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
z3fold_page_reclaim()'s retry mechanism is broken: on a second iteration
it will have zhdr from the first one so that zhdr is no longer in line
with struct page. That leads to crashes when the system is stressed.Fix that by moving zhdr assignment up.
While at it, protect against using already freed handles by using own
local slots structure in z3fold_page_reclaim().Link: http://lkml.kernel.org/r/20190908162919.830388dc7404d1e2c80f4095@gmail.com
Signed-off-by: Vitaly Wool
Reported-by: Markus Linnala
Reported-by: Chris Murphy
Reported-by: Agustin Dall'Alba
Cc: "Maciej S. Szmigiero"
Cc: Shakeel Butt
Cc: Henry Burns
Cc:
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
With the original commit applied, z3fold_zpool_destroy() may get blocked
on wait_event() for indefinite time. Revert this commit for the time
being to get rid of this problem since the issue the original commit
addresses is less severe.Link: http://lkml.kernel.org/r/20190910123142.7a9c8d2de4d0acbc0977c602@gmail.com
Fixes: d776aaa9895eb6eb77 ("mm/z3fold.c: fix race between migration and destruction")
Reported-by: Agustín Dall'Alba
Signed-off-by: Vitaly Wool
Cc: Vlastimil Babka
Cc: Vitaly Wool
Cc: Shakeel Butt
Cc: Jonathan Adams
Cc: Henry Burns
Cc:
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
31 Aug, 2019
1 commit
-
Fix lock/unlock imbalance by unlocking *zhdr* before return.
Addresses Coverity ID 1452811 ("Missing unlock")
Link: http://lkml.kernel.org/r/20190826030634.GA4379@embeddedor
Fixes: d776aaa9895e ("mm/z3fold.c: fix race between migration and destruction")
Signed-off-by: Gustavo A. R. Silva
Reviewed-by: Andrew Morton
Cc: Henry Burns
Cc: Vitaly Wool
Cc: Shakeel Butt
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
25 Aug, 2019
1 commit
-
In z3fold_destroy_pool() we call destroy_workqueue(&pool->compact_wq).
However, we have no guarantee that migration isn't happening in the
background at that time.Migration directly calls queue_work_on(pool->compact_wq), if destruction
wins that race we are using a destroyed workqueue.Link: http://lkml.kernel.org/r/20190809213828.202833-1-henryburns@google.com
Signed-off-by: Henry Burns
Cc: Vitaly Wool
Cc: Shakeel Butt
Cc: Jonathan Adams
Cc: Henry Burns
Cc:
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
14 Aug, 2019
2 commits
-
The constraint from the zpool use of z3fold_destroy_pool() is there are
no outstanding handles to memory (so no active allocations), but it is
possible for there to be outstanding work on either of the two wqs in
the pool.Calling z3fold_deregister_migration() before the workqueues are drained
means that there can be allocated pages referencing a freed inode,
causing any thread in compaction to be able to trip over the bad pointer
in PageMovable().Link: http://lkml.kernel.org/r/20190726224810.79660-2-henryburns@google.com
Fixes: 1f862989b04a ("mm/z3fold.c: support page migration")
Signed-off-by: Henry Burns
Reviewed-by: Shakeel Butt
Reviewed-by: Jonathan Adams
Cc: Vitaly Vul
Cc: Vitaly Wool
Cc: David Howells
Cc: Thomas Gleixner
Cc: Al Viro
Cc: Henry Burns
Cc:
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
The constraint from the zpool use of z3fold_destroy_pool() is there are
no outstanding handles to memory (so no active allocations), but it is
possible for there to be outstanding work on either of the two wqs in
the pool.If there is work queued on pool->compact_workqueue when it is called,
z3fold_destroy_pool() will do:z3fold_destroy_pool()
destroy_workqueue(pool->release_wq)
destroy_workqueue(pool->compact_wq)
drain_workqueue(pool->compact_wq)
do_compact_page(zhdr)
kref_put(&zhdr->refcount)
__release_z3fold_page(zhdr, ...)
queue_work_on(pool->release_wq, &pool->work) *BOOM*So compact_wq needs to be destroyed before release_wq.
Link: http://lkml.kernel.org/r/20190726224810.79660-1-henryburns@google.com
Fixes: 5d03a6613957 ("mm/z3fold.c: use kref to prevent page free/compact race")
Signed-off-by: Henry Burns
Reviewed-by: Shakeel Butt
Reviewed-by: Jonathan Adams
Cc: Vitaly Vul
Cc: Vitaly Wool
Cc: David Howells
Cc: Thomas Gleixner
Cc: Al Viro
Cc:
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
20 Jul, 2019
1 commit
-
Pull vfs mount updates from Al Viro:
"The first part of mount updates.Convert filesystems to use the new mount API"
* 'work.mount0' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (63 commits)
mnt_init(): call shmem_init() unconditionally
constify ksys_mount() string arguments
don't bother with registering rootfs
init_rootfs(): don't bother with init_ramfs_fs()
vfs: Convert smackfs to use the new mount API
vfs: Convert selinuxfs to use the new mount API
vfs: Convert securityfs to use the new mount API
vfs: Convert apparmorfs to use the new mount API
vfs: Convert openpromfs to use the new mount API
vfs: Convert xenfs to use the new mount API
vfs: Convert gadgetfs to use the new mount API
vfs: Convert oprofilefs to use the new mount API
vfs: Convert ibmasmfs to use the new mount API
vfs: Convert qib_fs/ipathfs to use the new mount API
vfs: Convert efivarfs to use the new mount API
vfs: Convert configfs to use the new mount API
vfs: Convert binfmt_misc to use the new mount API
convenience helper: get_tree_single()
convenience helper get_tree_nodev()
vfs: Kill sget_userns()
...
17 Jul, 2019
4 commits
-
z3fold_page_migration() calls memcpy(new_zhdr, zhdr, PAGE_SIZE).
However, zhdr contains fields that can't be directly coppied over (ex:
list_head, a circular linked list). We only need to initialize the
linked lists in new_zhdr, as z3fold_isolate_page() already ensures that
these lists are emptyAdditionally it is possible that zhdr->work has been placed in a
workqueue. In this case we shouldn't migrate the page, as zhdr->work
references zhdr as opposed to new_zhdr.Link: http://lkml.kernel.org/r/20190716000520.230595-1-henryburns@google.com
Fixes: 1f862989b04ade61d3 ("mm/z3fold.c: support page migration")
Signed-off-by: Henry Burns
Reviewed-by: Shakeel Butt
Cc: Vitaly Vul
Cc: Vitaly Wool
Cc: Jonathan Adams
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
z3fold_page_migrate() will never succeed because it attempts to acquire
a lock that has already been taken by migrate.c in __unmap_and_move().__unmap_and_move() migrate.c
trylock_page(oldpage)
move_to_new_page(oldpage_newpage)
a_ops->migrate_page(oldpage, newpage)
z3fold_page_migrate(oldpage, newpage)
trylock_page(oldpage)Link: http://lkml.kernel.org/r/20190710213238.91835-1-henryburns@google.com
Fixes: 1f862989b04a ("mm/z3fold.c: support page migration")
Signed-off-by: Henry Burns
Reviewed-by: Shakeel Butt
Cc: Vitaly Wool
Cc: Vitaly Vul
Cc: Jonathan Adams
Cc: Greg Kroah-Hartman
Cc: Snild Dolkow
Cc: Thomas Gleixner
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
One of the gfp flags used to show that a page is movable is
__GFP_HIGHMEM. Currently z3fold_alloc() fails when __GFP_HIGHMEM is
passed. Now that z3fold pages are movable, we allow __GFP_HIGHMEM. We
strip the movability related flags from the call to kmem_cache_alloc()
for our slots since it is a kernel allocation.[akpm@linux-foundation.org: coding-style fixes]
Link: http://lkml.kernel.org/r/20190712222118.108192-1-henryburns@google.com
Signed-off-by: Henry Burns
Acked-by: Vitaly Wool
Reviewed-by: Shakeel Butt
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
As reported by Henry Burns:
Running z3fold stress testing with address sanitization showed zhdr->slots
was being used after it was freed.z3fold_free(z3fold_pool, handle)
free_handle(handle)
kmem_cache_free(pool->c_handle, zhdr->slots)
release_z3fold_page_locked_list(kref)
__release_z3fold_page(zhdr, true)
zhdr_to_pool(zhdr)
slots_to_pool(zhdr->slots) *BOOM*To fix this, add pointer to the pool back to z3fold_header and modify
zhdr_to_pool to return zhdr->pool.Link: http://lkml.kernel.org/r/20190708134808.e89f3bfadd9f6ffd7eff9ba9@gmail.com
Fixes: 7c2b8baa61fe ("mm/z3fold.c: add structure for buddy handles")
Signed-off-by: Vitaly Wool
Reported-by: Henry Burns
Reviewed-by: Shakeel Butt
Cc: Jonathan Adams
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
13 Jul, 2019
1 commit
-
Following zsmalloc.c's example we call trylock_page() and unlock_page().
Also make z3fold_page_migrate() assert that newpage is passed in locked,
as per the documentation.[akpm@linux-foundation.org: fix trylock_page return value test, per Shakeel]
Link: http://lkml.kernel.org/r/20190702005122.41036-1-henryburns@google.com
Link: http://lkml.kernel.org/r/20190702233538.52793-1-henryburns@google.com
Signed-off-by: Henry Burns
Suggested-by: Vitaly Wool
Acked-by: Vitaly Wool
Acked-by: David Rientjes
Reviewed-by: Shakeel Butt
Cc: Vitaly Vul
Cc: Mike Rapoport
Cc: Xidong Wang
Cc: Jonathan Adams
Cc:
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
02 Jun, 2019
1 commit
-
kmem_cache_alloc() may be called from z3fold_alloc() in atomic context, so
we need to pass correct gfp flags to avoid "scheduling while atomic" bug.Link: http://lkml.kernel.org/r/20190523153245.119dfeed55927e8755250ddd@gmail.com
Fixes: 7c2b8baa61fe5 ("mm/z3fold.c: add structure for buddy handles")
Signed-off-by: Vitaly Wool
Reviewed-by: Andrew Morton
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
26 May, 2019
2 commits
-
Convert the zsfold filesystem to the new internal mount API as the old one
will be obsoleted and removed. This allows greater flexibility in
communication of mount parameters between userspace, the VFS and the
filesystem.See Documentation/filesystems/mount_api.txt for more information.
Signed-off-by: David Howells
-
Once upon a time we used to set ->d_name of e.g. pipefs root
so that d_path() on pipes would work. These days it's
completely pointless - dentries of pipes are not even connected
to pipefs root. However, mount_pseudo() had set the root
dentry name (passed as the second argument) and callers
kept inventing names to pass to it. Including those that
didn't *have* any non-root dentries to start with...All of that had been pointless for about 8 years now; it's
time to get rid of that cargo-culting...Signed-off-by: Al Viro
21 May, 2019
2 commits
-
Add SPDX license identifiers to all files which:
- Have no license information of any form
- Have MODULE_LICENCE("GPL*") inside which was used in the initial
scan/conversion to ignore the fileThese files fall under the project license, GPL v2 only. The resulting SPDX
license identifier is:GPL-2.0-only
Signed-off-by: Thomas Gleixner
Signed-off-by: Greg Kroah-Hartman -
Don't bother with dentry_operations as no dentry is ever allocated.
Signed-off-by: David Howells
15 May, 2019
4 commits
-
Now that we are not using page address in handles directly, we can make
z3fold pages movable to decrease the memory fragmentation z3fold may
create over time.This patch starts advertising non-headless z3fold pages as movable and
uses the existing kernel infrastructure to implement moving of such pages
per memory management subsystem's request. It thus implements 3 required
callbacks for page migration:* isolation callback: z3fold_page_isolate(): try to isolate the page by
removing it from all lists. Pages scheduled for some activity and
mapped pages will not be isolated. Return true if isolation was
successful or false otherwise* migration callback: z3fold_page_migrate(): re-check critical
conditions and migrate page contents to the new page provided by the
memory subsystem. Returns 0 on success or negative error code otherwise* putback callback: z3fold_page_putback(): put back the page if
z3fold_page_migrate() for it failed permanently (i. e. not with
-EAGAIN code).[lkp@intel.com: z3fold_page_isolate() can be static]
Link: http://lkml.kernel.org/r/20190419130924.GA161478@ivb42
Link: http://lkml.kernel.org/r/20190417103922.31253da5c366c4ebe0419cfc@gmail.com
Signed-off-by: Vitaly Wool
Signed-off-by: kbuild test robot
Cc: Bartlomiej Zolnierkiewicz
Cc: Dan Streetman
Cc: Krzysztof Kozlowski
Cc: Oleksiy Avramchenko
Cc: Uladzislau Rezki
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
For z3fold to be able to move its pages per request of the memory
subsystem, it should not use direct object addresses in handles. Instead,
it will create abstract handles (3 per page) which will contain pointers
to z3fold objects. Thus, it will be possible to change these pointers
when z3fold page is moved.Link: http://lkml.kernel.org/r/20190417103826.484eaf18c1294d682769880f@gmail.com
Signed-off-by: Vitaly Wool
Cc: Bartlomiej Zolnierkiewicz
Cc: Dan Streetman
Cc: Krzysztof Kozlowski
Cc: Oleksiy Avramchenko
Cc: Uladzislau Rezki
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
The current z3fold implementation only searches this CPU's page lists for
a fitting page to put a new object into. This patch adds quick search for
very well fitting pages (i. e. those having exactly the required number
of free space) on other CPUs too, before allocating a new page for that
object.Link: http://lkml.kernel.org/r/20190417103733.72ae81abe1552397c95a008e@gmail.com
Signed-off-by: Vitaly Wool
Cc: Bartlomiej Zolnierkiewicz
Cc: Dan Streetman
Cc: Krzysztof Kozlowski
Cc: Oleksiy Avramchenko
Cc: Uladzislau Rezki
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
Patch series "z3fold: support page migration", v2.
This patchset implements page migration support and slightly better buddy
search. To implement page migration support, z3fold has to move away from
the current scheme of handle encoding. i. e. stop encoding page address
in handles. Instead, a small per-page structure is created which will
contain actual addresses for z3fold objects, while pointers to fields of
that structure will be used as handles.Thus, it will be possible to change the underlying addresses to reflect
page migration.To support migration itself, 3 callbacks will be implemented:
1: isolation callback: z3fold_page_isolate(): try to isolate the page
by removing it from all lists. Pages scheduled for some activity and
mapped pages will not be isolated. Return true if isolation was
successful or false otherwise2: migration callback: z3fold_page_migrate(): re-check critical
conditions and migrate page contents to the new page provided by the
system. Returns 0 on success or negative error code otherwise3: putback callback: z3fold_page_putback(): put back the page if
z3fold_page_migrate() for it failed permanently (i. e. not with
-EAGAIN code).To make sure an isolated page doesn't get freed, its kref is incremented
in z3fold_page_isolate() and decremented during post-migration compaction,
if migration was successful, or by z3fold_page_putback() in the other
case.Since the new handle encoding scheme implies slight memory consumption
increase, better buddy search (which decreases memory consumption) is
included in this patchset.This patch (of 4):
Introduce a separate helper function for object allocation, as well as 2
smaller helpers to add a buddy to the list and to get a pointer to the
pool from the z3fold header. No functional changes here.Link: http://lkml.kernel.org/r/20190417103633.a4bb770b5bf0fb7e43ce1666@gmail.com
Signed-off-by: Vitaly Wool
Cc: Dan Streetman
Cc: Bartlomiej Zolnierkiewicz
Cc: Krzysztof Kozlowski
Cc: Oleksiy Avramchenko
Cc: Uladzislau Rezki
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
19 Nov, 2018
1 commit
-
Reclaim and free can race on an object which is basically fine but in
order for reclaim to be able to map "freed" object we need to encode
object length in the handle. handle_to_chunks() is then introduced to
extract object length from a handle and use it during mapping.Moreover, to avoid racing on a z3fold "headless" page release, we should
not try to free that page in z3fold_free() if the reclaim bit is set.
Also, in the unlikely case of trying to reclaim a page being freed, we
should not proceed with that page.While at it, fix the page accounting in reclaim function.
This patch supersedes "[PATCH] z3fold: fix reclaim lock-ups".
Link: http://lkml.kernel.org/r/20181105162225.74e8837d03583a9b707cf559@gmail.com
Signed-off-by: Vitaly Wool
Signed-off-by: Jongseok Kim
Reported-by-by: Jongseok Kim
Reviewed-by: Snild Dolkow
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
12 May, 2018
1 commit
-
Do not try to optimize in-page object layout while the page is under
reclaim. This fixes lock-ups on reclaim and improves reclaim
performance at the same time.[akpm@linux-foundation.org: coding-style fixes]
Link: http://lkml.kernel.org/r/20180430125800.444cae9706489f412ad12621@gmail.com
Signed-off-by: Vitaly Wool
Reported-by: Guenter Roeck
Tested-by: Guenter Roeck
Cc:
Cc: Matthew Wilcox
Cc:
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
12 Apr, 2018
2 commits
-
We have a perfectly good macro to determine whether the gfp flags allow
you to sleep or not; use it instead of trying to infer it.Link: http://lkml.kernel.org/r/20180408062206.GC16007@bombadil.infradead.org
Signed-off-by: Matthew Wilcox
Reviewed-by: Andrew Morton
Cc: Vitaly Wool
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
In z3fold_create_pool(), the memory allocated by __alloc_percpu() is not
released on the error path that pool->compact_wq , which holds the
return value of create_singlethread_workqueue(), is NULL. This will
result in a memory leak bug.[akpm@linux-foundation.org: fix oops on kzalloc() failure, check __alloc_percpu() retval]
Link: http://lkml.kernel.org/r/1522803111-29209-1-git-send-email-wangxidong_97@163.com
Signed-off-by: Xidong Wang
Reviewed-by: Andrew Morton
Cc: Vitaly Wool
Cc: Mike Rapoport
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
06 Apr, 2018
1 commit
-
Currently if z3fold couldn't find an unbuddied page it would first try
to pull a page off the stale list. The problem with this approach is
that we can't 100% guarantee that the page is not processed by the
workqueue thread at the same time unless we run cancel_work_sync() on
it, which we can't do if we're in an atomic context. So let's just
limit stale list usage to non-atomic contexts only.Link: http://lkml.kernel.org/r/47ab51e7-e9c1-d30e-ab17-f734dbc3abce@gmail.com
Signed-off-by: Vitaly Vul
Reviewed-by: Andrew Morton
Cc:
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
07 Feb, 2018
1 commit
-
There are several places where parameter descriptions do no match the
actual code. Fix it.Link: http://lkml.kernel.org/r/1516700871-22279-3-git-send-email-rppt@linux.vnet.ibm.com
Signed-off-by: Mike Rapoport
Cc: Jonathan Corbet
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
18 Nov, 2017
1 commit
-
There is a race in the current z3fold implementation between
do_compact() called in a work queue context and the page release
procedure when page's kref goes to 0.do_compact() may be waiting for page lock, which is released by
release_z3fold_page_locked right before putting the page onto the
"stale" list, and then the page may be freed as do_compact() modifies
its contents.The mechanism currently implemented to handle that (checking the
PAGE_STALE flag) is not reliable enough. Instead, we'll use page's kref
counter to guarantee that the page is not released if its compaction is
scheduled. It then becomes compaction function's responsibility to
decrease the counter and quit immediately if the page was actually
freed.Link: http://lkml.kernel.org/r/20171117092032.00ea56f42affbed19f4fcc6c@gmail.com
Signed-off-by: Vitaly Wool
Cc:
Cc:
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
04 Oct, 2017
2 commits
-
Fix the situation when clear_bit() is called for page->private before
the page pointer is actually assigned. While at it, remove work_busy()
check because it is costly and does not give 100% guarantee anyway.Signed-off-by: Vitaly Wool
Cc: Dan Streetman
Cc:
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
It is possible that on a (partially) unsuccessful page reclaim,
kref_put() called in z3fold_reclaim_page() does not yield page release,
but the page is released shortly afterwards by another thread. Then
z3fold_reclaim_page() would try to list_add() that (released) page again
which is obviously a bug.To avoid that, spin_lock() has to be taken earlier, before the
kref_put() call mentioned earlier.Link: http://lkml.kernel.org/r/20170913162937.bfff21c7d12b12a5f47639fd@gmail.com
Signed-off-by: Vitaly Wool
Cc: Dan Streetman
Cc:
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
07 Sep, 2017
1 commit
-
It's been noted that z3fold doesn't scale well when it's run in a large
number of threads on many cores, which can be easily reproduced with fio
'randrw' test with --numjobs=32. E.g. the result for 1 cluster (4 cores)
is:Run status group 0 (all jobs):
READ: io=244785MB, aggrb=496883KB/s, minb=15527KB/s, ...
WRITE: io=246735MB, aggrb=500841KB/s, minb=15651KB/s, ...While for 8 cores (2 clusters) the result is:
Run status group 0 (all jobs):
READ: io=244785MB, aggrb=265942KB/s, minb=8310KB/s, ...
WRITE: io=246735MB, aggrb=268060KB/s, minb=8376KB/s, ...The bottleneck here is the pool lock which many threads become waiting
upon. To reduce that spin lock contention, z3fold can operate only on
the lists local to the current CPU whenever possible. Due to the nature
of z3fold unbuddied list handling (it only takes the first entry off the
list on a hot path), if the z3fold pool is big enough and balanced well
enough, limiting search to only local unbuddied list doesn't lead to a
significant compression ratio degrade (2.57x vs 2.65x in our
measurements).This patch also introduces two worker threads: one for async in-page
object layout optimization and one for releasing freed pages. This is
done to speed up z3fold_free() which is often on a hot path.The fio results for 8-core case are now the following:
Run status group 0 (all jobs):
READ: io=244785MB, aggrb=1568.3MB/s, minb=50182KB/s, ...
WRITE: io=246735MB, aggrb=1580.8MB/s, minb=50582KB/s, ...So we're in for almost 6x performance increase.
Link: http://lkml.kernel.org/r/20170806181443.f9b65018f8bde25ef990f9e8@gmail.com
Signed-off-by: Vitaly Wool
Cc: Dan Streetman
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds