01 Dec, 2018

1 commit

  • [ Upstream commit ca0246bb97c23da9d267c2107c07fb77e38205c9 ]

    Reclaim and free can race on an object, which is basically fine, but
    in order for reclaim to be able to map a "freed" object we need to
    encode the object length in the handle. handle_to_chunks() is then
    introduced to extract the object length from a handle and use it
    during mapping.
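
    A minimal sketch of the encoding idea (the bit layout and helper
    names here are illustrative, not necessarily the exact upstream
    code): the low bits of a handle are free due to alignment, so the
    object length in chunks can be stored next to the buddy index.

        #define BUDDY_MASK   0x3UL   /* two bits for the buddy index */
        #define BUDDY_SHIFT  2

        static unsigned long encode_handle(unsigned long base,
                                           unsigned long buddy,
                                           unsigned short chunks)
        {
                /* handle = page-aligned base | chunks | buddy index */
                return base | (chunks << BUDDY_SHIFT) | (buddy & BUDDY_MASK);
        }

        static unsigned short handle_to_chunks(unsigned long handle)
        {
                /* recover the object length stored at encode time */
                return (handle & ~PAGE_MASK) >> BUDDY_SHIFT;
        }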

    Moreover, to avoid racing on a z3fold "headless" page release, we should
    not try to free that page in z3fold_free() if the reclaim bit is set.
    Also, in the unlikely case of trying to reclaim a page being freed, we
    should not proceed with that page.

    While at it, fix the page accounting in the reclaim function.

    This patch supersedes "[PATCH] z3fold: fix reclaim lock-ups".

    Link: http://lkml.kernel.org/r/20181105162225.74e8837d03583a9b707cf559@gmail.com
    Signed-off-by: Vitaly Wool
    Signed-off-by: Jongseok Kim
    Reported-by: Jongseok Kim
    Reviewed-by: Snild Dolkow
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds
    Signed-off-by: Sasha Levin

    Vitaly Wool
     

30 May, 2018

1 commit

  • [ Upstream commit 1ec6995d1290bfb87cc3a51f0836c889e857cef9 ]

    In z3fold_create_pool(), the memory allocated by __alloc_percpu() is
    not released on the error path where pool->compact_wq, which holds
    the return value of create_singlethread_workqueue(), is NULL. This
    results in a memory leak.
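
    A simplified sketch of the fixed error handling (field and constant
    names follow mm/z3fold.c, but the function is abbreviated and not
    the verbatim fix):

        static struct z3fold_pool *z3fold_create_pool(gfp_t gfp)
        {
                struct z3fold_pool *pool;

                pool = kzalloc(sizeof(struct z3fold_pool), gfp);
                if (!pool)
                        return NULL;    /* kzalloc() failure is checked */
                pool->unbuddied = __alloc_percpu(sizeof(struct list_head) * NCHUNKS,
                                                 __alignof__(struct list_head));
                if (!pool->unbuddied)   /* __alloc_percpu() retval checked */
                        goto out_pool;
                pool->compact_wq = create_singlethread_workqueue("z3fold");
                if (!pool->compact_wq)
                        goto out_unbuddied;
                return pool;

        out_unbuddied:
                free_percpu(pool->unbuddied);   /* the previously missing release */
        out_pool:
                kfree(pool);
                return NULL;
        }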

    [akpm@linux-foundation.org: fix oops on kzalloc() failure, check __alloc_percpu() retval]
    Link: http://lkml.kernel.org/r/1522803111-29209-1-git-send-email-wangxidong_97@163.com
    Signed-off-by: Xidong Wang
    Reviewed-by: Andrew Morton
    Cc: Vitaly Wool
    Cc: Mike Rapoport
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds
    Signed-off-by: Sasha Levin
    Signed-off-by: Greg Kroah-Hartman

    Xidong Wang
     

16 May, 2018

1 commit

  • commit 6098d7e136692f9c6e23ae362c62ec822343e4d5 upstream.

    Do not try to optimize in-page object layout while the page is under
    reclaim. This fixes lock-ups on reclaim and improves reclaim
    performance at the same time.
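
    A sketch of the guard, assuming a per-page UNDER_RECLAIM flag bit
    kept in page->private (the flag name and wrapper function are
    illustrative):

        static void maybe_compact(struct z3fold_header *zhdr,
                                  struct page *page)
        {
                /* reclaim owns the page: leave its layout alone */
                if (test_bit(UNDER_RECLAIM, &page->private))
                        return;
                z3fold_compact_page(zhdr);
        }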

    [akpm@linux-foundation.org: coding-style fixes]
    Link: http://lkml.kernel.org/r/20180430125800.444cae9706489f412ad12621@gmail.com
    Signed-off-by: Vitaly Wool
    Reported-by: Guenter Roeck
    Tested-by: Guenter Roeck
    Cc: Matthew Wilcox
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds
    Signed-off-by: Greg Kroah-Hartman

    Vitaly Wool
     

30 Nov, 2017

1 commit

  • commit 5d03a6613957785e94af7a4a6212ad4af66aa5c2 upstream.

    There is a race in the current z3fold implementation between
    do_compact() called in a work queue context and the page release
    procedure when the page's kref goes to 0.

    do_compact() may be waiting for the page lock, which is released by
    release_z3fold_page_locked right before putting the page onto the
    "stale" list, and then the page may be freed while do_compact() is
    still modifying its contents.

    The mechanism currently implemented to handle that (checking the
    PAGE_STALE flag) is not reliable enough. Instead, we'll use the
    page's kref counter to guarantee that the page is not released while
    its compaction is scheduled. It then becomes the compaction
    function's responsibility to decrease the counter and quit
    immediately if the page was actually freed.
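
    A simplified sketch of the scheme (function names follow
    mm/z3fold.c, but the bodies are illustrative):

        /* pin the page with an extra reference while work is pending */
        static void queue_compaction(struct z3fold_pool *pool,
                                     struct z3fold_header *zhdr)
        {
                kref_get(&zhdr->refcount);
                queue_work(pool->compact_wq, &zhdr->work);
        }

        static void do_compact(struct work_struct *w)
        {
                struct z3fold_header *zhdr =
                        container_of(w, struct z3fold_header, work);

                z3fold_page_lock(zhdr);
                /* drop the pinning reference; if it was the last one,
                 * the page was freed meanwhile: quit immediately
                 * (release_z3fold_page_locked() is assumed to unlock
                 * and free the page) */
                if (kref_put(&zhdr->refcount, release_z3fold_page_locked))
                        return;
                z3fold_compact_page(zhdr);
                z3fold_page_unlock(zhdr);
        }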

    Link: http://lkml.kernel.org/r/20171117092032.00ea56f42affbed19f4fcc6c@gmail.com
    Signed-off-by: Vitaly Wool
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds
    Signed-off-by: Greg Kroah-Hartman

    Vitaly Wool
     

04 Oct, 2017

2 commits

  • Fix the situation where clear_bit() is called for page->private
    before the page pointer is actually assigned. While at it, remove
    the work_busy() check because it is costly and does not give a 100%
    guarantee anyway.

    Signed-off-by: Vitaly Wool
    Cc: Dan Streetman
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Vitaly Wool
     
  • It is possible that on a (partially) unsuccessful page reclaim, the
    kref_put() called in z3fold_reclaim_page() does not yield a page
    release, but the page is released shortly afterwards by another
    thread. z3fold_reclaim_page() would then try to list_add() that
    (released) page again, which is obviously a bug.

    To avoid that, the spin_lock() has to be taken earlier, before the
    kref_put() call mentioned above.
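
    A sketch of the corrected ordering (the wrapper function is
    hypothetical; the list handling is abbreviated):

        static int z3fold_reclaim_one(struct z3fold_pool *pool,
                                      struct z3fold_header *zhdr,
                                      struct page *page)
        {
                /* take pool->lock BEFORE the kref_put() ... */
                spin_lock(&pool->lock);
                if (kref_put(&zhdr->refcount, release_z3fold_page)) {
                        /* page released: never touch it again */
                        spin_unlock(&pool->lock);
                        return -EAGAIN;
                }
                /* ... so a still-alive page can safely be re-added */
                list_add(&page->lru, &pool->lru);
                spin_unlock(&pool->lock);
                return 0;
        }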

    Link: http://lkml.kernel.org/r/20170913162937.bfff21c7d12b12a5f47639fd@gmail.com
    Signed-off-by: Vitaly Wool
    Cc: Dan Streetman
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Vitaly Wool
     

07 Sep, 2017

1 commit

  • It's been noted that z3fold doesn't scale well when run in a large
    number of threads on many cores, which can easily be reproduced with
    the fio 'randrw' test with --numjobs=32. E.g. the result for 1
    cluster (4 cores) is:

    Run status group 0 (all jobs):
    READ: io=244785MB, aggrb=496883KB/s, minb=15527KB/s, ...
    WRITE: io=246735MB, aggrb=500841KB/s, minb=15651KB/s, ...

    While for 8 cores (2 clusters) the result is:

    Run status group 0 (all jobs):
    READ: io=244785MB, aggrb=265942KB/s, minb=8310KB/s, ...
    WRITE: io=246735MB, aggrb=268060KB/s, minb=8376KB/s, ...

    The bottleneck here is the pool lock, which many threads end up
    waiting on. To reduce that spinlock contention, z3fold can operate
    only on the lists local to the current CPU whenever possible. Due to
    the nature of z3fold unbuddied list handling (it only takes the
    first entry off the list on a hot path), if the z3fold pool is big
    enough and balanced well enough, limiting the search to only the
    local unbuddied list doesn't lead to a significant compression ratio
    degradation (2.57x vs 2.65x in our measurements).

    This patch also introduces two worker threads: one for async in-page
    object layout optimization and one for releasing freed pages. This
    is done to speed up z3fold_free(), which is often on a hot path.

    The fio results for the 8-core case are now the following:

    Run status group 0 (all jobs):
    READ: io=244785MB, aggrb=1568.3MB/s, minb=50182KB/s, ...
    WRITE: io=246735MB, aggrb=1580.8MB/s, minb=50582KB/s, ...

    So we're in for an almost 6x performance increase.
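
    A sketch of the per-CPU fast path described above (simplified: the
    per-page locking and the fallback to other CPUs' lists are elided):

        static struct z3fold_header *find_unbuddied(struct z3fold_pool *pool,
                                                    int chunks)
        {
                /* NCHUNKS free-space-bucketed lists per CPU */
                struct list_head *unbuddied = get_cpu_ptr(pool->unbuddied);
                struct z3fold_header *zhdr = NULL;
                int i;

                for (i = chunks; i < NCHUNKS; i++) {
                        if (!list_empty(&unbuddied[i])) {
                                zhdr = list_first_entry(&unbuddied[i],
                                                struct z3fold_header, buddy);
                                list_del_init(&zhdr->buddy);
                                break;
                        }
                }
                put_cpu_ptr(pool->unbuddied);
                return zhdr;
        }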

    Link: http://lkml.kernel.org/r/20170806181443.f9b65018f8bde25ef990f9e8@gmail.com
    Signed-off-by: Vitaly Wool
    Cc: Dan Streetman
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Vitaly Wool
     

14 Apr, 2017

1 commit

  • Stress testing of the current z3fold implementation on an 8-core
    system revealed it was possible that a z3fold page deleted from its
    unbuddied list in z3fold_alloc() would be put on another unbuddied
    list by z3fold_free() while z3fold_alloc() was still processing it.
    This was introduced by commit 5a27aa822 ("z3fold: add kref
    refcounting") due to the removal of special handling of a z3fold
    page not on any list in z3fold_free().

    To fix this, the z3fold page lock should be taken in z3fold_alloc()
    before the pool lock is released. To avoid deadlocking, we just try
    to lock the page as soon as we get hold of it, and if the trylock
    fails, we drop this page and take the next one.
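
    A simplified sketch of the trylock-or-skip logic (the helper is
    hypothetical; in the real code this sits inline in z3fold_alloc()):

        /* called with pool->lock held */
        static struct z3fold_header *grab_unbuddied(struct list_head *list)
        {
                struct z3fold_header *zhdr;

                list_for_each_entry(zhdr, list, buddy) {
                        /* take the page lock before pool->lock is dropped */
                        if (spin_trylock(&zhdr->page_lock)) {
                                list_del_init(&zhdr->buddy);
                                return zhdr;
                        }
                        /* trylock failed: skip this page, try the next */
                }
                return NULL;
        }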

    Signed-off-by: Vitaly Wool
    Cc: Dan Streetman
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Vitaly Wool
     

17 Mar, 2017

1 commit

  • Commit 5a27aa822029 ("z3fold: add kref refcounting") introduced a
    bug in z3fold_reclaim_page(): a function exit path may leave the
    pool->lock spinlock held. Here comes the trivial fix.
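
    A sketch of the kind of exit path involved (illustrative, not the
    exact upstream diff):

        if (!zhdr) {
                spin_unlock(&pool->lock);   /* was missing on this path */
                return -EINVAL;
        }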

    Fixes: 5a27aa822029 ("z3fold: add kref refcounting")
    Link: http://lkml.kernel.org/r/20170311222239.7b83d8e7ef1914e05497649f@gmail.com
    Reported-by: Alexey Khoroshilov
    Signed-off-by: Vitaly Wool
    Cc: Dan Streetman
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Vitaly Wool
     

25 Feb, 2017

5 commits

  • With both the upcoming and the already present locking
    optimizations, introducing a kref to reference-count z3fold objects
    is the right thing to do. Moreover, it makes the buddied list
    unnecessary and allows for simpler handling of headless pages.
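
    A sketch of the refcounting pattern (the release helper is
    abbreviated; free_z3fold_page() is assumed to free the backing
    page):

        static void release_z3fold_page(struct kref *ref)
        {
                struct z3fold_header *zhdr =
                        container_of(ref, struct z3fold_header, refcount);

                free_z3fold_page(zhdr);
        }

        /* usage: kref_init(&zhdr->refcount) at page creation,
         * kref_get() for each new object placed in the page,
         * kref_put(&zhdr->refcount, release_z3fold_page) when an
         * object goes away; the last put frees the page.
         */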

    [akpm@linux-foundation.org: coding-style fixes]
    Link: http://lkml.kernel.org/r/20170131214650.8ea78033d91ded233f552bc0@gmail.com
    Signed-off-by: Vitaly Wool
    Reviewed-by: Dan Streetman
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Vitaly Wool
     
  • Most z3fold operations are in-page, such as modifying the z3fold
    page header or moving z3fold objects within a page. Taking the
    per-pool spinlock to protect per-page objects is therefore
    suboptimal, and the idea of having a per-page spinlock (or rwlock)
    has been around for some time.

    This patch implements a spinlock-based per-page locking mechanism
    which is lightweight enough to normally fit into the z3fold header.
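
    A sketch of the header with the embedded lock and its helpers
    (field set abbreviated):

        struct z3fold_header {
                struct list_head buddy;
                spinlock_t page_lock;    /* protects this page's layout */
                unsigned short first_chunks;
                unsigned short middle_chunks;
                unsigned short last_chunks;
                unsigned short start_middle;
                unsigned short first_num:2;
        };

        static inline void z3fold_page_lock(struct z3fold_header *zhdr)
        {
                spin_lock(&zhdr->page_lock);
        }

        static inline void z3fold_page_unlock(struct z3fold_header *zhdr)
        {
                spin_unlock(&zhdr->page_lock);
        }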

    Link: http://lkml.kernel.org/r/20170131214438.433e0a5fda908337b63206d3@gmail.com
    Signed-off-by: Vitaly Wool
    Reviewed-by: Dan Streetman
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Vitaly Wool
     
  • z3fold_compact_page() currently only handles the situation where
    there is a single middle chunk within the z3fold page. However, it
    may be worth moving the middle chunk closer to either the first or
    the last chunk, whichever is present, if the gap between them is big
    enough.

    This patch adds the relevant code, using the BIG_CHUNK_GAP define as
    a threshold for when the middle chunk is worth moving.
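
    A sketch of the new case (the move helper is hypothetical and the
    threshold value illustrative):

        #define BIG_CHUNK_GAP   3

        /* first chunk present, middle chunk far away: pull it down */
        if (zhdr->first_chunks != 0 && zhdr->last_chunks == 0 &&
            zhdr->start_middle > zhdr->first_chunks + BIG_CHUNK_GAP) {
                unsigned short dest = zhdr->first_chunks + ZHDR_CHUNKS;

                mchunk_memmove(zhdr, dest);     /* move the middle chunk */
                zhdr->start_middle = dest;
        }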

    Link: http://lkml.kernel.org/r/20170131214334.c4f3eac9a477af0fa9a22c46@gmail.com
    Signed-off-by: Vitaly Wool
    Reviewed-by: Dan Streetman
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Vitaly Wool
     
  • Currently the whole kernel build is stopped if the size of struct
    z3fold_header is greater than the size of one chunk, which is 64
    bytes by default. This patch instead defines the offset for z3fold
    objects as the size of the z3fold header in chunks.

    Also fixed are the calculation in num_free_chunks() and the address
    to move the middle chunk to in case of in-page compaction in
    z3fold_compact_page().
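
    A sketch of the chunk-based offset definitions (mirroring the
    mm/z3fold.c constants; NCHUNKS_ORDER is 6 by default, giving
    64-byte chunks on 4K pages):

        #define NCHUNKS_ORDER       6
        #define CHUNK_SHIFT         (PAGE_SHIFT - NCHUNKS_ORDER)
        #define CHUNK_SIZE          (1 << CHUNK_SHIFT)
        /* header size rounded up to a whole number of chunks */
        #define ZHDR_SIZE_ALIGNED   round_up(sizeof(struct z3fold_header), \
                                             CHUNK_SIZE)
        /* objects start this many chunks into the page */
        #define ZHDR_CHUNKS         (ZHDR_SIZE_ALIGNED >> CHUNK_SHIFT)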

    Link: http://lkml.kernel.org/r/20170131214057.d98677032bc7b1c6c59a80c9@gmail.com
    Signed-off-by: Vitaly Wool
    Reviewed-by: Dan Streetman
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Vitaly Wool
     
  • Convert pages_nr per-pool counter to atomic64_t.
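
    A sketch of the result: reads of the counter no longer need
    pool->lock.

        static u64 z3fold_get_pool_size(struct z3fold_pool *pool)
        {
                return atomic64_read(&pool->pages_nr);
        }

        /* atomic64_inc(&pool->pages_nr) on page allocation,
         * atomic64_dec(&pool->pages_nr) on page release */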

    Link: http://lkml.kernel.org/r/20170131213946.b828676ab17bbea42022c213@gmail.com
    Signed-off-by: Vitaly Wool
    Reviewed-by: Dan Streetman
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Vitaly Wool
     

23 Feb, 2017

1 commit

  • At present, tying the first_num size to NCHUNKS_ORDER is confusing;
    the number of chunks is completely unrelated to the number of
    buddies.

    This patch limits first_num to the actual range of possible buddy
    indexes, which is more reasonable and obvious, with no functional
    change.
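
    A sketch of the change (three buddy positions need only two bits):

        #define BUDDY_MASK      (0x3)   /* was ((1 << NCHUNKS_ORDER) - 1) */

        struct z3fold_header {
                /* ... */
                unsigned short first_num:2;   /* bounded by buddy indexes */
        };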

    Link: http://lkml.kernel.org/r/1476776569-29504-1-git-send-email-zhongjiang@huawei.com
    Signed-off-by: zhong jiang
    Suggested-by: Dan Streetman
    Acked-by: Dan Streetman
    Acked-by: Vitaly Wool
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    zhong jiang
     

04 Jun, 2016

1 commit

  • Fix erroneous z3fold header access for a HEADLESS page in the
    reclaim function, and change one remaining direct handle-to-buddy
    conversion to use the appropriate helper.

    Link: http://lkml.kernel.org/r/5748706F.9020208@gmail.com
    Signed-off-by: Vitaly Wool
    Reviewed-by: Dan Streetman
    Cc: Seth Jennings
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Vitaly Wool
     

21 May, 2016

1 commit

  • This patch introduces z3fold, a special purpose allocator for
    storing compressed pages. It is designed to store up to three
    compressed pages per physical page. It is a zbud derivative which
    allows for a higher compression ratio, keeping the simplicity and
    determinism of its predecessor.

    This patch comes as a follow-up to the discussions at the Embedded
    Linux Conference in San Diego related to the talk [1]. The outcome
    of these discussions was that it would be good to have a compressed
    page allocator as stable and deterministic as zbud but with a higher
    compression ratio.

    To keep the determinism and simplicity, z3fold, just like zbud,
    always stores an integral number of compressed pages per page, but
    it can store up to 3 pages, unlike zbud, which can store at most 2.
    Therefore the compression ratio goes up to around 2.6x while zbud's
    is around 1.7x.
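
    A sketch of the in-page layout this implies (illustrative):

        /*
         *  +------------------+ <- page start
         *  | z3fold header    |   chunk counts, list linkage, etc.
         *  +------------------+
         *  | first object     |   grows from the front
         *  +------------------+
         *  | middle object    |   floats; may be moved by compaction
         *  +------------------+
         *  | last object      |   grows from the back
         *  +------------------+ <- page end
         *
         * The page is divided into equal chunks (64 bytes by default);
         * each object occupies an integral number of chunks, and at
         * most three objects fit in one page.
         */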

    The patch is based on the latest linux.git tree.

    This version has been updated after testing on various simulators
    (e.g. ARM Versatile Express, MIPS Malta, x86_64/Haswell) and based
    on comments from Dan Streetman [3].

    [1] https://openiotelc2016.sched.org/event/6DAC/swapping-and-embedded-compression-relieves-the-pressure-vitaly-wool-softprise-consulting-ou
    [2] https://lkml.org/lkml/2016/4/21/799
    [3] https://lkml.org/lkml/2016/5/4/852

    Link: http://lkml.kernel.org/r/20160509151753.ec3f9fda3c9898d31ff52a32@gmail.com
    Signed-off-by: Vitaly Wool
    Cc: Seth Jennings
    Cc: Dan Streetman
    Cc: Vlastimil Babka
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Vitaly Wool