30 Dec, 2020

2 commits

  • commit dcf5aedb24f899d537e21c18ea552c780598d352 upstream.

    Use temporary slots in the reclaim function to avoid a possible race
    when freeing those slots.

    While at it, make sure we check the CLAIMED flag under the page lock
    in the reclaim function, so that we are not racing with
    z3fold_alloc().

    Link: https://lkml.kernel.org/r/20201209145151.18994-4-vitaly.wool@konsulko.com
    Signed-off-by: Vitaly Wool
    Cc:
    Cc: Mike Galbraith
    Cc: Sebastian Andrzej Siewior
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds
    Signed-off-by: Greg Kroah-Hartman

    Vitaly Wool
     
  • commit fc5488651c7d840c9cad9b0f273f2f31bd03413a upstream.

    Patch series "z3fold: stability / rt fixes".

    Address z3fold stability issues under stress load, primarily in the
    reclaim and free paths. The series also fixes locking problems that
    were only seen in real-time kernel configurations.

    This patch (of 3):

    There used to be two places in the code where slots could be freed:
    when freeing the last allocated handle from the slots, and when
    releasing the z3fold header these slots are linked to. The logic to
    decide whether to free certain slots was complicated and error-prone
    in both functions, and it led to failures in the RT case.

    To fix that, make free_handle() the single point of freeing slots.

    Link: https://lkml.kernel.org/r/20201209145151.18994-1-vitaly.wool@konsulko.com
    Link: https://lkml.kernel.org/r/20201209145151.18994-2-vitaly.wool@konsulko.com
    Signed-off-by: Vitaly Wool
    Tested-by: Mike Galbraith
    Cc: Sebastian Andrzej Siewior
    Cc:
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds
    Signed-off-by: Greg Kroah-Hartman

    Vitaly Wool
     

14 Oct, 2020

1 commit

  • alloc_slots() allocates memory for slots using kmem_cache_alloc(),
    then memsets it to zero. We can simply use kmem_cache_zalloc()
    instead.
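
    A minimal userspace sketch of the same simplification, using
    malloc()/calloc() as stand-ins for kmem_cache_alloc() and
    kmem_cache_zalloc(); the struct layout is illustrative only:

    ```c
    #include <assert.h>
    #include <stdlib.h>
    #include <string.h>

    /* Illustrative stand-in for the kernel's slots object. */
    struct slots { unsigned long slot[3]; };

    /* Before: allocate, then zero explicitly (kmem_cache_alloc + memset). */
    static struct slots *alloc_slots_old(void)
    {
        struct slots *s = malloc(sizeof(*s));
        if (s)
            memset(s, 0, sizeof(*s));
        return s;
    }

    /* After: one zeroing allocation (the kmem_cache_zalloc equivalent). */
    static struct slots *alloc_slots_new(void)
    {
        return calloc(1, sizeof(struct slots));
    }

    int main(void)
    {
        struct slots *a = alloc_slots_old();
        struct slots *b = alloc_slots_new();

        assert(a && b);
        for (int i = 0; i < 3; i++) {
            assert(a->slot[i] == 0);    /* both variants return zeroed memory */
            assert(b->slot[i] == 0);
        }
        free(a);
        free(b);
        return 0;
    }
    ```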

    Signed-off-by: Hui Su
    Signed-off-by: Andrew Morton
    Reviewed-by: Andrew Morton
    Link: https://lkml.kernel.org/r/20200926100834.GA184671@rlk
    Signed-off-by: Linus Torvalds

    Hui Su
     

29 May, 2020

1 commit

  • Kmemleak reported many leaks while under memory pressure in:

    slots = alloc_slots(pool, gfp);

    which is referenced by "zhdr" in init_z3fold_page(),

    zhdr->slots = slots;

    However, "zhdr" could be gone without "slots" being freed, as the
    latter is freed separately when the last "handle" in the "handles"
    array is freed. That handle lives within "slots", which is always
    aligned.

    unreferenced object 0xc000000fdadc1040 (size 104):
    comm "oom04", pid 140476, jiffies 4295359280 (age 3454.970s)
    hex dump (first 32 bytes):
    00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
    00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
    backtrace:
    z3fold_zpool_malloc+0x7b0/0xe10
    alloc_slots at mm/z3fold.c:214
    (inlined by) init_z3fold_page at mm/z3fold.c:412
    (inlined by) z3fold_alloc at mm/z3fold.c:1161
    (inlined by) z3fold_zpool_malloc at mm/z3fold.c:1735
    zpool_malloc+0x34/0x50
    zswap_frontswap_store+0x60c/0xda0
    zswap_frontswap_store at mm/zswap.c:1093
    __frontswap_store+0x128/0x330
    swap_writepage+0x58/0x110
    pageout+0x16c/0xa40
    shrink_page_list+0x1ac8/0x25c0
    shrink_inactive_list+0x270/0x730
    shrink_lruvec+0x444/0xf30
    shrink_node+0x2a4/0x9c0
    do_try_to_free_pages+0x158/0x640
    try_to_free_pages+0x1bc/0x5f0
    __alloc_pages_slowpath.constprop.60+0x4dc/0x15a0
    __alloc_pages_nodemask+0x520/0x650
    alloc_pages_vma+0xc0/0x420
    handle_mm_fault+0x1174/0x1bf0

    Signed-off-by: Qian Cai
    Signed-off-by: Andrew Morton
    Acked-by: Vitaly Wool
    Acked-by: Catalin Marinas
    Link: http://lkml.kernel.org/r/20200522220052.2225-1-cai@lca.pw
    Signed-off-by: Linus Torvalds

    Qian Cai
     

24 May, 2020

1 commit

  • free_handle() for a foreign handle may race with inter-page
    compaction, which can lead to memory corruption.

    To avoid that, take the write lock rather than the read lock in
    free_handle() so that it is synchronized with
    __release_z3fold_page().

    For example KASAN can detect it:

    ==================================================================
    BUG: KASAN: use-after-free in LZ4_decompress_safe+0x2c4/0x3b8
    Read of size 1 at addr ffffffc976695ca3 by task GoogleApiHandle/4121

    CPU: 0 PID: 4121 Comm: GoogleApiHandle Tainted: P S OE 4.19.81-perf+ #162
    Hardware name: Sony Mobile Communications. PDX-203(KONA) (DT)
    Call trace:
    LZ4_decompress_safe+0x2c4/0x3b8
    lz4_decompress_crypto+0x3c/0x70
    crypto_decompress+0x58/0x70
    zcomp_decompress+0xd4/0x120
    ...

    Apart from that, initialize zhdr->mapped_count in init_z3fold_page()
    and remove the "newpage" variable because it is not used anywhere.
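
    Why the read lock is not enough can be illustrated with a userspace
    rwlock sketch (pthread rwlocks are only an analog here; the kernel
    code itself is not shown): read locks admit concurrent readers, so
    only the write lock gives free_handle() mutual exclusion against the
    release path.

    ```c
    #include <assert.h>
    #include <pthread.h>

    static pthread_rwlock_t lock = PTHREAD_RWLOCK_INITIALIZER;

    int main(void)
    {
        /* A reader takes the lock... */
        assert(pthread_rwlock_rdlock(&lock) == 0);
        /* ...and a second reader is admitted immediately: read locks do
         * not exclude each other, so two paths holding the read lock can
         * still race on the same slots. */
        assert(pthread_rwlock_tryrdlock(&lock) == 0);
        /* A writer, however, is refused while any reader holds the lock:
         * taking the write lock is what actually serializes free_handle()
         * against __release_z3fold_page(). */
        assert(pthread_rwlock_trywrlock(&lock) != 0);

        pthread_rwlock_unlock(&lock);
        pthread_rwlock_unlock(&lock);
        return 0;
    }
    ```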

    Signed-off-by: Uladzislau Rezki
    Signed-off-by: Vitaly Wool
    Signed-off-by: Andrew Morton
    Cc: Qian Cai
    Cc: Raymond Jennings
    Cc:
    Link: http://lkml.kernel.org/r/20200520082100.28876-1-vitaly.wool@konsulko.com
    Signed-off-by: Linus Torvalds

    Uladzislau Rezki
     

06 Mar, 2020

1 commit

  • rwlock.h should not be included directly; linux/spinlock.h should be
    included instead. Among other things, including rwlock.h directly
    breaks the RT build.

    Signed-off-by: Andrew Morton
    Signed-off-by: Sebastian Andrzej Siewior
    Cc: Peter Zijlstra
    Cc: Vitaly Wool
    Cc: Thomas Gleixner
    Link: http://lkml.kernel.org/r/20200224133631.1510569-1-bigeasy@linutronix.de
    Signed-off-by: Linus Torvalds

    Sebastian Andrzej Siewior
     

02 Dec, 2019

1 commit

  • For each page scheduled for compaction (e.g. by z3fold_free()), try
    to apply inter-page compaction before running the
    traditional/existing intra-page compaction. That means that if the
    page has only one buddy, we treat that buddy as a new object that we
    aim to place into an existing z3fold page. If such a page is found,
    that object is transferred and the old page is freed completely. The
    transferred object is named "foreign" and treated slightly
    differently thereafter.

    Namely, we increase the "foreign handle" counter for the new page.
    Pages with a non-zero "foreign handle" count become unmovable. This
    patch implements "foreign handle" detection when a handle is freed,
    decrementing the foreign handle counter accordingly, so a page may
    become movable again as time goes by.

    As a result, we almost always have exactly 3 objects per page and
    significantly better average compression ratio.

    [cai@lca.pw: fix -Wunused-but-set-variable warnings]
    Link: http://lkml.kernel.org/r/1570542062-29144-1-git-send-email-cai@lca.pw
    [vitalywool@gmail.com: avoid subtle race when freeing slots]
    Link: http://lkml.kernel.org/r/20191127152118.6314b99074b0626d4c5a8835@gmail.com
    [vitalywool@gmail.com: compact objects more accurately]
    Link: http://lkml.kernel.org/r/20191127152216.6ad33745a21ba71c53606acb@gmail.com
    [vitalywool@gmail.com: protect handle reads]
    Link: http://lkml.kernel.org/r/20191127152345.8059852f60947686674d726d@gmail.com
    Link: http://lkml.kernel.org/r/20191006041457.24113-1-vitalywool@gmail.com
    Signed-off-by: Vitaly Wool
    Cc: Dan Streetman
    Cc: Henry Burns
    Cc: Shakeel Butt
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Vitaly Wool
     

08 Oct, 2019

1 commit

  • There's a really hard-to-reproduce race in z3fold between
    z3fold_free() and z3fold_reclaim_page(). z3fold_reclaim_page() can
    claim the page after z3fold_free() has checked whether the page was
    claimed, and z3fold_free() will then schedule this page for
    compaction, which may in turn lead to random page faults (since that
    page would have been reclaimed by then).

    Fix that by claiming the page at the beginning of z3fold_free() and
    clearing the claim at the end.
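
    The claim/clear pattern can be sketched in userspace with C11
    atomics, as an analog of the kernel's test_and_set_bit() on the
    page's claim bit (names here are illustrative):

    ```c
    #include <assert.h>
    #include <stdatomic.h>
    #include <stdbool.h>

    /* Illustrative stand-in for a per-page PAGE_CLAIMED bit. */
    static atomic_flag page_claimed = ATOMIC_FLAG_INIT;

    /* Returns true only for the first claimant; everyone else backs off. */
    static bool claim_page(void)
    {
        return !atomic_flag_test_and_set(&page_claimed);
    }

    static void unclaim_page(void)
    {
        atomic_flag_clear(&page_claimed);
    }

    int main(void)
    {
        assert(claim_page());     /* z3fold_free() claims the page up front */
        assert(!claim_page());    /* a concurrent reclaimer must back off */
        unclaim_page();           /* ...and the claim is cleared at the end */
        assert(claim_page());     /* the page is claimable again */
        unclaim_page();
        return 0;
    }
    ```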

    [vitalywool@gmail.com: v2]
    Link: http://lkml.kernel.org/r/20190928113456.152742cf@bigdell
    Link: http://lkml.kernel.org/r/20190926104844.4f0c6efa1366b8f5741eaba9@gmail.com
    Signed-off-by: Vitaly Wool
    Reported-by: Markus Linnala
    Cc: Dan Streetman
    Cc: Vlastimil Babka
    Cc: Henry Burns
    Cc: Shakeel Butt
    Cc: Markus Linnala
    Cc:
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Vitaly Wool
     

25 Sep, 2019

3 commits

  • Currently there is a leak in init_z3fold_page(): it allocates
    handles from the kmem cache even for headless pages, but they are
    then never used and never freed, so eventually the kmem cache may be
    exhausted. This patch fixes that.

    Link: http://lkml.kernel.org/r/20190917185352.44cf285d3ebd9e64548de5de@gmail.com
    Signed-off-by: Vitaly Wool
    Reported-by: Markus Linnala
    Tested-by: Markus Linnala
    Cc: Dan Streetman
    Cc: Henry Burns
    Cc: Shakeel Butt
    Cc: Vlastimil Babka
    Cc:
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Vitaly Wool
     
  • z3fold_reclaim_page()'s retry mechanism is broken: on a second
    iteration it still has the zhdr from the first one, so zhdr is no
    longer in sync with the struct page. That leads to crashes when the
    system is stressed.

    Fix that by moving the zhdr assignment up.

    While at it, protect against using already freed handles by using a
    local slots structure of its own in z3fold_reclaim_page().

    Link: http://lkml.kernel.org/r/20190908162919.830388dc7404d1e2c80f4095@gmail.com
    Signed-off-by: Vitaly Wool
    Reported-by: Markus Linnala
    Reported-by: Chris Murphy
    Reported-by: Agustin Dall'Alba
    Cc: "Maciej S. Szmigiero"
    Cc: Shakeel Butt
    Cc: Henry Burns
    Cc:
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Vitaly Wool
     
  • With the original commit applied, z3fold_zpool_destroy() may get
    blocked on wait_event() for an indefinite time. Revert that commit
    for the time being, since the issue it addresses is less severe than
    the problem it introduces.

    Link: http://lkml.kernel.org/r/20190910123142.7a9c8d2de4d0acbc0977c602@gmail.com
    Fixes: d776aaa9895eb6eb77 ("mm/z3fold.c: fix race between migration and destruction")
    Reported-by: Agustín Dall'Alba
    Signed-off-by: Vitaly Wool
    Cc: Vlastimil Babka
    Cc: Vitaly Wool
    Cc: Shakeel Butt
    Cc: Jonathan Adams
    Cc: Henry Burns
    Cc:
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Vitaly Wool
     

31 Aug, 2019

1 commit

  • Fix lock/unlock imbalance by unlocking *zhdr* before return.

    Addresses Coverity ID 1452811 ("Missing unlock")

    Link: http://lkml.kernel.org/r/20190826030634.GA4379@embeddedor
    Fixes: d776aaa9895e ("mm/z3fold.c: fix race between migration and destruction")
    Signed-off-by: Gustavo A. R. Silva
    Reviewed-by: Andrew Morton
    Cc: Henry Burns
    Cc: Vitaly Wool
    Cc: Shakeel Butt
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Gustavo A. R. Silva
     

25 Aug, 2019

1 commit

  • In z3fold_destroy_pool() we call destroy_workqueue(pool->compact_wq).
    However, we have no guarantee that migration isn't happening in the
    background at that time.

    Migration directly calls queue_work_on(pool->compact_wq); if
    destruction wins that race, we are using a destroyed workqueue.

    Link: http://lkml.kernel.org/r/20190809213828.202833-1-henryburns@google.com
    Signed-off-by: Henry Burns
    Cc: Vitaly Wool
    Cc: Shakeel Butt
    Cc: Jonathan Adams
    Cc: Henry Burns
    Cc:
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Henry Burns
     

14 Aug, 2019

2 commits

  • The constraint from the zpool use of z3fold_destroy_pool() is that
    there are no outstanding handles to memory (so no active
    allocations), but it is possible for there to be outstanding work on
    either of the two workqueues in the pool.

    Calling z3fold_deregister_migration() before the workqueues are drained
    means that there can be allocated pages referencing a freed inode,
    causing any thread in compaction to be able to trip over the bad pointer
    in PageMovable().

    Link: http://lkml.kernel.org/r/20190726224810.79660-2-henryburns@google.com
    Fixes: 1f862989b04a ("mm/z3fold.c: support page migration")
    Signed-off-by: Henry Burns
    Reviewed-by: Shakeel Butt
    Reviewed-by: Jonathan Adams
    Cc: Vitaly Vul
    Cc: Vitaly Wool
    Cc: David Howells
    Cc: Thomas Gleixner
    Cc: Al Viro
    Cc: Henry Burns
    Cc:
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Henry Burns
     
  • The constraint from the zpool use of z3fold_destroy_pool() is that
    there are no outstanding handles to memory (so no active
    allocations), but it is possible for there to be outstanding work on
    either of the two workqueues in the pool.

    If there is work queued on pool->compact_workqueue when it is called,
    z3fold_destroy_pool() will do:

    z3fold_destroy_pool()
      destroy_workqueue(pool->release_wq)
      destroy_workqueue(pool->compact_wq)
        drain_workqueue(pool->compact_wq)
          do_compact_page(zhdr)
            kref_put(&zhdr->refcount)
              __release_z3fold_page(zhdr, ...)
                queue_work_on(pool->release_wq, &pool->work) *BOOM*

    So compact_wq needs to be destroyed before release_wq.

    Link: http://lkml.kernel.org/r/20190726224810.79660-1-henryburns@google.com
    Fixes: 5d03a6613957 ("mm/z3fold.c: use kref to prevent page free/compact race")
    Signed-off-by: Henry Burns
    Reviewed-by: Shakeel Butt
    Reviewed-by: Jonathan Adams
    Cc: Vitaly Vul
    Cc: Vitaly Wool
    Cc: David Howells
    Cc: Thomas Gleixner
    Cc: Al Viro
    Cc:
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Henry Burns
     

20 Jul, 2019

1 commit

  • Pull vfs mount updates from Al Viro:
    "The first part of mount updates.

    Convert filesystems to use the new mount API"

    * 'work.mount0' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (63 commits)
    mnt_init(): call shmem_init() unconditionally
    constify ksys_mount() string arguments
    don't bother with registering rootfs
    init_rootfs(): don't bother with init_ramfs_fs()
    vfs: Convert smackfs to use the new mount API
    vfs: Convert selinuxfs to use the new mount API
    vfs: Convert securityfs to use the new mount API
    vfs: Convert apparmorfs to use the new mount API
    vfs: Convert openpromfs to use the new mount API
    vfs: Convert xenfs to use the new mount API
    vfs: Convert gadgetfs to use the new mount API
    vfs: Convert oprofilefs to use the new mount API
    vfs: Convert ibmasmfs to use the new mount API
    vfs: Convert qib_fs/ipathfs to use the new mount API
    vfs: Convert efivarfs to use the new mount API
    vfs: Convert configfs to use the new mount API
    vfs: Convert binfmt_misc to use the new mount API
    convenience helper: get_tree_single()
    convenience helper get_tree_nodev()
    vfs: Kill sget_userns()
    ...

    Linus Torvalds
     

17 Jul, 2019

4 commits

  • z3fold_page_migrate() calls memcpy(new_zhdr, zhdr, PAGE_SIZE).
    However, zhdr contains fields that can't be directly copied over
    (e.g. list_head, a circular linked list). We only need to initialize
    the linked lists in new_zhdr, as z3fold_isolate_page() already
    ensures that these lists are empty.

    Additionally it is possible that zhdr->work has been placed in a
    workqueue. In this case we shouldn't migrate the page, as zhdr->work
    references zhdr as opposed to new_zhdr.

    Link: http://lkml.kernel.org/r/20190716000520.230595-1-henryburns@google.com
    Fixes: 1f862989b04ade61d3 ("mm/z3fold.c: support page migration")
    Signed-off-by: Henry Burns
    Reviewed-by: Shakeel Butt
    Cc: Vitaly Vul
    Cc: Vitaly Wool
    Cc: Jonathan Adams
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Henry Burns
     
  • z3fold_page_migrate() will never succeed because it attempts to acquire
    a lock that has already been taken by migrate.c in __unmap_and_move().

    __unmap_and_move() (mm/migrate.c)
      trylock_page(oldpage)
      move_to_new_page(oldpage, newpage)
        a_ops->migrate_page(oldpage, newpage)
          z3fold_page_migrate(oldpage, newpage)
            trylock_page(oldpage)

    Link: http://lkml.kernel.org/r/20190710213238.91835-1-henryburns@google.com
    Fixes: 1f862989b04a ("mm/z3fold.c: support page migration")
    Signed-off-by: Henry Burns
    Reviewed-by: Shakeel Butt
    Cc: Vitaly Wool
    Cc: Vitaly Vul
    Cc: Jonathan Adams
    Cc: Greg Kroah-Hartman
    Cc: Snild Dolkow
    Cc: Thomas Gleixner
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Henry Burns
     
  • One of the gfp flags used to indicate that a page is movable is
    __GFP_HIGHMEM. Currently z3fold_alloc() fails when __GFP_HIGHMEM is
    passed. Now that z3fold pages are movable, we allow __GFP_HIGHMEM.
    We strip the movability-related flags from the call to
    kmem_cache_alloc() for our slots, since that is a kernel allocation.
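
    The flag-stripping can be sketched as a simple mask, with made-up
    bit values (the kernel's actual gfp constants differ):

    ```c
    #include <assert.h>

    /* Illustrative flag bits; not the kernel's real gfp values. */
    #define GFP_BIT_HIGHMEM 0x02u
    #define GFP_BIT_MOVABLE 0x08u
    #define GFP_BIT_WAIT    0x01u

    /* Slots are a kernel slab allocation, so movability-related flags
     * must not reach the slab allocator. */
    static unsigned int slots_gfp(unsigned int gfp)
    {
        return gfp & ~(GFP_BIT_HIGHMEM | GFP_BIT_MOVABLE);
    }

    int main(void)
    {
        unsigned int user_gfp = GFP_BIT_WAIT | GFP_BIT_HIGHMEM | GFP_BIT_MOVABLE;

        /* Only the movability bits are stripped; the rest pass through. */
        assert(slots_gfp(user_gfp) == GFP_BIT_WAIT);
        return 0;
    }
    ```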

    [akpm@linux-foundation.org: coding-style fixes]
    Link: http://lkml.kernel.org/r/20190712222118.108192-1-henryburns@google.com
    Signed-off-by: Henry Burns
    Acked-by: Vitaly Wool
    Reviewed-by: Shakeel Butt
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Henry Burns
     
  • As reported by Henry Burns:

    Running z3fold stress testing with address sanitization showed zhdr->slots
    was being used after it was freed.

    z3fold_free(z3fold_pool, handle)
      free_handle(handle)
        kmem_cache_free(pool->c_handle, zhdr->slots)
      release_z3fold_page_locked_list(kref)
        __release_z3fold_page(zhdr, true)
          zhdr_to_pool(zhdr)
            slots_to_pool(zhdr->slots) *BOOM*

    To fix this, add pointer to the pool back to z3fold_header and modify
    zhdr_to_pool to return zhdr->pool.
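
    The shape of the fix can be sketched as follows (a deliberately
    simplified layout; the real z3fold_header has many more fields):

    ```c
    #include <assert.h>
    #include <stddef.h>

    struct z3fold_pool { int id; };

    struct z3fold_header {
        struct z3fold_pool *pool;   /* back-pointer added by the fix */
        /* ... other fields elided ... */
    };

    /* The pool is now found via the header itself, without
     * dereferencing zhdr->slots (which may already be freed). */
    static struct z3fold_pool *zhdr_to_pool(struct z3fold_header *zhdr)
    {
        return zhdr->pool;
    }

    int main(void)
    {
        struct z3fold_pool pool = { .id = 42 };
        struct z3fold_header zhdr = { .pool = &pool };

        assert(zhdr_to_pool(&zhdr) == &pool);
        return 0;
    }
    ```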

    Link: http://lkml.kernel.org/r/20190708134808.e89f3bfadd9f6ffd7eff9ba9@gmail.com
    Fixes: 7c2b8baa61fe ("mm/z3fold.c: add structure for buddy handles")
    Signed-off-by: Vitaly Wool
    Reported-by: Henry Burns
    Reviewed-by: Shakeel Butt
    Cc: Jonathan Adams
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Vitaly Wool
     

13 Jul, 2019

1 commit

  • Following zsmalloc.c's example, we call trylock_page() and
    unlock_page(). Also make z3fold_page_migrate() assert that newpage
    is passed in locked, as per the documentation.

    [akpm@linux-foundation.org: fix trylock_page return value test, per Shakeel]
    Link: http://lkml.kernel.org/r/20190702005122.41036-1-henryburns@google.com
    Link: http://lkml.kernel.org/r/20190702233538.52793-1-henryburns@google.com
    Signed-off-by: Henry Burns
    Suggested-by: Vitaly Wool
    Acked-by: Vitaly Wool
    Acked-by: David Rientjes
    Reviewed-by: Shakeel Butt
    Cc: Vitaly Vul
    Cc: Mike Rapoport
    Cc: Xidong Wang
    Cc: Jonathan Adams
    Cc:
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Henry Burns
     

02 Jun, 2019

1 commit

  • kmem_cache_alloc() may be called from z3fold_alloc() in an atomic
    context, so we need to pass the correct gfp flags to avoid a
    "scheduling while atomic" bug.

    Link: http://lkml.kernel.org/r/20190523153245.119dfeed55927e8755250ddd@gmail.com
    Fixes: 7c2b8baa61fe5 ("mm/z3fold.c: add structure for buddy handles")
    Signed-off-by: Vitaly Wool
    Reviewed-by: Andrew Morton
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Vitaly Wool
     

26 May, 2019

2 commits

  • Convert the z3fold filesystem to the new internal mount API as the
    old one will be obsoleted and removed. This allows greater
    flexibility in communication of mount parameters between userspace,
    the VFS and the filesystem.

    See Documentation/filesystems/mount_api.txt for more information.

    Signed-off-by: David Howells

    David Howells
     
  • Once upon a time we used to set ->d_name of e.g. pipefs root
    so that d_path() on pipes would work. These days it's
    completely pointless - dentries of pipes are not even connected
    to pipefs root. However, mount_pseudo() had set the root
    dentry name (passed as the second argument) and callers
    kept inventing names to pass to it. Including those that
    didn't *have* any non-root dentries to start with...

    All of that had been pointless for about 8 years now; it's
    time to get rid of that cargo-culting...

    Signed-off-by: Al Viro

    Al Viro
     

21 May, 2019

2 commits


15 May, 2019

4 commits

  • Now that we are not using page address in handles directly, we can make
    z3fold pages movable to decrease the memory fragmentation z3fold may
    create over time.

    This patch starts advertising non-headless z3fold pages as movable
    and uses the existing kernel infrastructure to implement moving of
    such pages at the memory management subsystem's request. It thus
    implements the 3 required callbacks for page migration:

    * isolation callback: z3fold_page_isolate(): try to isolate the page by
    removing it from all lists. Pages scheduled for some activity and
    mapped pages will not be isolated. Return true if isolation was
    successful or false otherwise

    * migration callback: z3fold_page_migrate(): re-check critical
    conditions and migrate page contents to the new page provided by the
    memory subsystem. Returns 0 on success or negative error code otherwise

    * putback callback: z3fold_page_putback(): put back the page if
    z3fold_page_migrate() for it failed permanently (i.e. not with the
    -EAGAIN code).

    [lkp@intel.com: z3fold_page_isolate() can be static]
    Link: http://lkml.kernel.org/r/20190419130924.GA161478@ivb42
    Link: http://lkml.kernel.org/r/20190417103922.31253da5c366c4ebe0419cfc@gmail.com
    Signed-off-by: Vitaly Wool
    Signed-off-by: kbuild test robot
    Cc: Bartlomiej Zolnierkiewicz
    Cc: Dan Streetman
    Cc: Krzysztof Kozlowski
    Cc: Oleksiy Avramchenko
    Cc: Uladzislau Rezki
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Vitaly Wool
     
  • For z3fold to be able to move its pages at the request of the memory
    subsystem, it should not use direct object addresses in handles.
    Instead, it will create abstract handles (3 per page) which will
    contain pointers to z3fold objects. Thus, it will be possible to
    change these pointers when a z3fold page is moved.
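
    The indirection can be sketched in a few lines of userspace C
    (layout and names are illustrative, not the real z3fold
    structures): a handle points into a small per-page slots structure
    rather than at the object itself, so migration only has to rewrite
    the slot entries while every handed-out handle stays valid.

    ```c
    #include <assert.h>

    /* Illustrative per-page structure: 3 slots, one per buddy. */
    struct slots { unsigned long slot[3]; };

    int main(void)
    {
        char old_page[4096], new_page[4096];
        struct slots s = { .slot = { (unsigned long)&old_page[64], 0, 0 } };

        /* The handle handed out is a pointer to a slot, not the object. */
        unsigned long *handle = &s.slot[0];
        assert(*handle == (unsigned long)&old_page[64]);

        /* "Migrate" the object: only the slot changes; the handle does not. */
        s.slot[0] = (unsigned long)&new_page[64];
        assert(*handle == (unsigned long)&new_page[64]);
        return 0;
    }
    ```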

    Link: http://lkml.kernel.org/r/20190417103826.484eaf18c1294d682769880f@gmail.com
    Signed-off-by: Vitaly Wool
    Cc: Bartlomiej Zolnierkiewicz
    Cc: Dan Streetman
    Cc: Krzysztof Kozlowski
    Cc: Oleksiy Avramchenko
    Cc: Uladzislau Rezki
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Vitaly Wool
     
  • The current z3fold implementation only searches this CPU's page
    lists for a fitting page to put a new object into. This patch adds a
    quick search for very well fitting pages (i.e. those having exactly
    the required amount of free space) on other CPUs too, before
    allocating a new page for that object.

    Link: http://lkml.kernel.org/r/20190417103733.72ae81abe1552397c95a008e@gmail.com
    Signed-off-by: Vitaly Wool
    Cc: Bartlomiej Zolnierkiewicz
    Cc: Dan Streetman
    Cc: Krzysztof Kozlowski
    Cc: Oleksiy Avramchenko
    Cc: Uladzislau Rezki
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Vitaly Wool
     
  • Patch series "z3fold: support page migration", v2.

    This patchset implements page migration support and slightly better
    buddy search. To implement page migration support, z3fold has to
    move away from the current scheme of handle encoding, i.e. stop
    encoding the page address in handles. Instead, a small per-page
    structure is created which will contain actual addresses for z3fold
    objects, while pointers to fields of that structure will be used as
    handles.

    Thus, it will be possible to change the underlying addresses to reflect
    page migration.

    To support migration itself, 3 callbacks will be implemented:

    1: isolation callback: z3fold_page_isolate(): try to isolate the page
    by removing it from all lists. Pages scheduled for some activity and
    mapped pages will not be isolated. Return true if isolation was
    successful or false otherwise

    2: migration callback: z3fold_page_migrate(): re-check critical
    conditions and migrate page contents to the new page provided by the
    system. Returns 0 on success or negative error code otherwise

    3: putback callback: z3fold_page_putback(): put back the page if
    z3fold_page_migrate() for it failed permanently (i.e. not with the
    -EAGAIN code).

    To make sure an isolated page doesn't get freed, its kref is incremented
    in z3fold_page_isolate() and decremented during post-migration compaction,
    if migration was successful, or by z3fold_page_putback() in the other
    case.

    Since the new handle encoding scheme implies slight memory consumption
    increase, better buddy search (which decreases memory consumption) is
    included in this patchset.

    This patch (of 4):

    Introduce a separate helper function for object allocation, as well as 2
    smaller helpers to add a buddy to the list and to get a pointer to the
    pool from the z3fold header. No functional changes here.

    Link: http://lkml.kernel.org/r/20190417103633.a4bb770b5bf0fb7e43ce1666@gmail.com
    Signed-off-by: Vitaly Wool
    Cc: Dan Streetman
    Cc: Bartlomiej Zolnierkiewicz
    Cc: Krzysztof Kozlowski
    Cc: Oleksiy Avramchenko
    Cc: Uladzislau Rezki
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Vitaly Wool
     

19 Nov, 2018

1 commit

  • Reclaim and free can race on an object, which is basically fine, but
    in order for reclaim to be able to map a "freed" object we need to
    encode the object length in the handle. handle_to_chunks() is then
    introduced to extract the object length from a handle and use it
    during mapping.
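
    A generic sketch of packing a length into a handle (the constants
    and encoding here are illustrative, not z3fold's actual scheme):
    because allocations are chunk-aligned, the low bits of the address
    are zero and can carry the length in chunks.

    ```c
    #include <assert.h>
    #include <stdlib.h>

    #define CHUNK_SIZE 64UL
    #define CHUNK_MASK (CHUNK_SIZE - 1)

    static unsigned long encode_handle(void *addr, unsigned long chunks)
    {
        /* The address must be chunk-aligned, leaving the low bits free. */
        assert(((unsigned long)addr & CHUNK_MASK) == 0);
        assert(chunks <= CHUNK_MASK);
        return (unsigned long)addr | chunks;
    }

    static unsigned long handle_to_chunks(unsigned long handle)
    {
        return handle & CHUNK_MASK;
    }

    static void *handle_to_addr(unsigned long handle)
    {
        return (void *)(handle & ~CHUNK_MASK);
    }

    int main(void)
    {
        void *obj = aligned_alloc(CHUNK_SIZE, 4 * CHUNK_SIZE);
        unsigned long h = encode_handle(obj, 4);

        assert(handle_to_chunks(h) == 4);   /* length recovered from handle */
        assert(handle_to_addr(h) == obj);   /* address recovered from handle */
        free(obj);
        return 0;
    }
    ```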

    Moreover, to avoid racing on a z3fold "headless" page release, we should
    not try to free that page in z3fold_free() if the reclaim bit is set.
    Also, in the unlikely case of trying to reclaim a page being freed, we
    should not proceed with that page.

    While at it, fix the page accounting in reclaim function.

    This patch supersedes "[PATCH] z3fold: fix reclaim lock-ups".

    Link: http://lkml.kernel.org/r/20181105162225.74e8837d03583a9b707cf559@gmail.com
    Signed-off-by: Vitaly Wool
    Signed-off-by: Jongseok Kim
    Reported-by: Jongseok Kim
    Reviewed-by: Snild Dolkow
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Vitaly Wool
     

12 May, 2018

1 commit

  • Do not try to optimize in-page object layout while the page is under
    reclaim. This fixes lock-ups on reclaim and improves reclaim
    performance at the same time.

    [akpm@linux-foundation.org: coding-style fixes]
    Link: http://lkml.kernel.org/r/20180430125800.444cae9706489f412ad12621@gmail.com
    Signed-off-by: Vitaly Wool
    Reported-by: Guenter Roeck
    Tested-by: Guenter Roeck
    Cc:
    Cc: Matthew Wilcox
    Cc:
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Vitaly Wool
     

12 Apr, 2018

2 commits

  • We have a perfectly good macro to determine whether the gfp flags allow
    you to sleep or not; use it instead of trying to infer it.
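
    The macro in question is gfpflags_allow_blocking(); a userspace
    sketch of the idea (the bit value here is made up; in the kernel
    the test is against __GFP_DIRECT_RECLAIM):

    ```c
    #include <assert.h>
    #include <stdbool.h>

    /* Illustrative bit; not the kernel's real __GFP_DIRECT_RECLAIM value. */
    #define GFP_BIT_DIRECT_RECLAIM 0x400u

    /* Whether an allocation may sleep is exactly whether direct reclaim
     * is permitted, so test the flag instead of inferring sleepability. */
    static bool gfpflags_allow_blocking(unsigned int gfp_flags)
    {
        return gfp_flags & GFP_BIT_DIRECT_RECLAIM;
    }

    int main(void)
    {
        assert(gfpflags_allow_blocking(GFP_BIT_DIRECT_RECLAIM)); /* GFP_KERNEL-like */
        assert(!gfpflags_allow_blocking(0u));                    /* GFP_ATOMIC-like */
        return 0;
    }
    ```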

    Link: http://lkml.kernel.org/r/20180408062206.GC16007@bombadil.infradead.org
    Signed-off-by: Matthew Wilcox
    Reviewed-by: Andrew Morton
    Cc: Vitaly Wool
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Matthew Wilcox
     
  • In z3fold_create_pool(), the memory allocated by __alloc_percpu() is
    not released on the error path where pool->compact_wq, which holds
    the return value of create_singlethread_workqueue(), is NULL. This
    results in a memory leak.

    [akpm@linux-foundation.org: fix oops on kzalloc() failure, check __alloc_percpu() retval]
    Link: http://lkml.kernel.org/r/1522803111-29209-1-git-send-email-wangxidong_97@163.com
    Signed-off-by: Xidong Wang
    Reviewed-by: Andrew Morton
    Cc: Vitaly Wool
    Cc: Mike Rapoport
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Xidong Wang
     

06 Apr, 2018

1 commit

  • Currently, if z3fold couldn't find an unbuddied page, it would first
    try to pull a page off the stale list. The problem with this
    approach is that we can't 100% guarantee that the page is not being
    processed by the workqueue thread at the same time unless we run
    cancel_work_sync() on it, which we can't do if we're in an atomic
    context. So let's limit stale list usage to non-atomic contexts
    only.

    Link: http://lkml.kernel.org/r/47ab51e7-e9c1-d30e-ab17-f734dbc3abce@gmail.com
    Signed-off-by: Vitaly Vul
    Reviewed-by: Andrew Morton
    Cc:
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Vitaly Wool
     

07 Feb, 2018

1 commit

  • There are several places where parameter descriptions do not match
    the actual code. Fix it.

    Link: http://lkml.kernel.org/r/1516700871-22279-3-git-send-email-rppt@linux.vnet.ibm.com
    Signed-off-by: Mike Rapoport
    Cc: Jonathan Corbet
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Mike Rapoport
     

18 Nov, 2017

1 commit

  • There is a race in the current z3fold implementation between
    do_compact() called in a work queue context and the page release
    procedure when page's kref goes to 0.

    do_compact() may be waiting for the page lock, which is released by
    release_z3fold_page_locked() right before putting the page onto the
    "stale" list, and then the page may be freed while do_compact() is
    still modifying its contents.

    The mechanism currently implemented to handle that (checking the
    PAGE_STALE flag) is not reliable enough. Instead, we'll use the
    page's kref counter to guarantee that the page is not released if
    its compaction is scheduled. It then becomes the compaction
    function's responsibility to decrease the counter and quit
    immediately if the page was actually freed.

    Link: http://lkml.kernel.org/r/20171117092032.00ea56f42affbed19f4fcc6c@gmail.com
    Signed-off-by: Vitaly Wool
    Cc:
    Cc:
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Vitaly Wool
     

04 Oct, 2017

2 commits

  • Fix the situation where clear_bit() is called for page->private
    before the page pointer is actually assigned. While at it, remove
    the work_busy() check because it is costly and does not give a 100%
    guarantee anyway.

    Signed-off-by: Vitaly Wool
    Cc: Dan Streetman
    Cc:
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Vitaly Wool
     
  • It is possible that on a (partially) unsuccessful page reclaim, the
    kref_put() called in z3fold_reclaim_page() does not release the
    page, but the page is released shortly afterwards by another thread.
    z3fold_reclaim_page() would then try to list_add() that (released)
    page again, which is obviously a bug.

    To avoid that, the spin_lock() has to be taken earlier, before the
    aforementioned kref_put() call.

    Link: http://lkml.kernel.org/r/20170913162937.bfff21c7d12b12a5f47639fd@gmail.com
    Signed-off-by: Vitaly Wool
    Cc: Dan Streetman
    Cc:
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Vitaly Wool
     

07 Sep, 2017

1 commit

  • It's been noted that z3fold doesn't scale well when run in a large
    number of threads on many cores, which can easily be reproduced with
    the fio 'randrw' test with --numjobs=32. E.g. the result for 1
    cluster (4 cores) is:

    Run status group 0 (all jobs):
    READ: io=244785MB, aggrb=496883KB/s, minb=15527KB/s, ...
    WRITE: io=246735MB, aggrb=500841KB/s, minb=15651KB/s, ...

    While for 8 cores (2 clusters) the result is:

    Run status group 0 (all jobs):
    READ: io=244785MB, aggrb=265942KB/s, minb=8310KB/s, ...
    WRITE: io=246735MB, aggrb=268060KB/s, minb=8376KB/s, ...

    The bottleneck here is the pool lock, which many threads end up
    waiting on. To reduce that spin lock contention, z3fold can operate
    only on the lists local to the current CPU whenever possible. Due to
    the nature of z3fold unbuddied list handling (it only takes the
    first entry off the list on a hot path), if the z3fold pool is big
    enough and balanced well enough, limiting the search to the local
    unbuddied list doesn't lead to a significant compression ratio
    degradation (2.57x vs 2.65x in our measurements).

    This patch also introduces two worker threads: one for async in-page
    object layout optimization and one for releasing freed pages. This is
    done to speed up z3fold_free() which is often on a hot path.

    The fio results for 8-core case are now the following:

    Run status group 0 (all jobs):
    READ: io=244785MB, aggrb=1568.3MB/s, minb=50182KB/s, ...
    WRITE: io=246735MB, aggrb=1580.8MB/s, minb=50582KB/s, ...

    So we're in for almost a 6x performance increase.

    Link: http://lkml.kernel.org/r/20170806181443.f9b65018f8bde25ef990f9e8@gmail.com
    Signed-off-by: Vitaly Wool
    Cc: Dan Streetman
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Vitaly Wool