08 Oct, 2019

1 commit

  • There's a hard-to-reproduce race in z3fold between z3fold_free()
    and z3fold_reclaim_page(): z3fold_reclaim_page() can claim the page
    after z3fold_free() has checked whether the page was claimed, and
    z3fold_free() will then schedule this page for compaction, which may
    in turn lead to random page faults (since the page will have been
    reclaimed by then).

    Fix that by claiming the page at the beginning of z3fold_free() and
    not forgetting to clear the claim at the end.
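
    A minimal sketch of the claim-first approach (illustrative only;
    helper names and the exact error handling are simplified from the
    actual patch):

    /* Sketch, not the actual patch: claim the page up front so that
     * z3fold_reclaim_page() cannot grab it concurrently. */
    static void z3fold_free(struct z3fold_pool *pool, unsigned long handle)
    {
            struct z3fold_header *zhdr = handle_to_z3fold_header(handle);
            struct page *page = virt_to_page(zhdr);

            if (test_and_set_bit(PAGE_CLAIMED, &page->private))
                    return;  /* reclaim owns the page; let it finish */

            /* ... free the object, possibly schedule compaction ... */

            clear_bit(PAGE_CLAIMED, &page->private);  /* drop the claim */
    }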

    [vitalywool@gmail.com: v2]
    Link: http://lkml.kernel.org/r/20190928113456.152742cf@bigdell
    Link: http://lkml.kernel.org/r/20190926104844.4f0c6efa1366b8f5741eaba9@gmail.com
    Signed-off-by: Vitaly Wool
    Reported-by: Markus Linnala
    Cc: Dan Streetman
    Cc: Vlastimil Babka
    Cc: Henry Burns
    Cc: Shakeel Butt
    Cc: Markus Linnala
    Cc:
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Vitaly Wool
     

25 Sep, 2019

3 commits

  • Currently there is a leak in init_z3fold_page(): it allocates handles
    from the kmem cache even for headless pages, but they are then never
    used and never freed, so eventually the kmem cache may get exhausted.
    This patch provides a fix for that.
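
    The shape of the fix, sketched (the real function signature may
    differ; alloc_slots() is the existing slots allocator):

    /* Sketch: allocate handle slots only for pages that will carry a
     * z3fold header, so pool->c_handle is not drained by never-freed
     * slots of headless pages. */
    static struct z3fold_header *init_z3fold_page(struct page *page,
                    bool headless, struct z3fold_pool *pool, gfp_t gfp)
    {
            struct z3fold_header *zhdr = page_address(page);
            struct z3fold_buddy_slots *slots = NULL;

            if (!headless) {
                    slots = alloc_slots(pool, gfp);
                    if (!slots)
                            return NULL;
            }
            /* ... initialize zhdr fields and attach slots ... */
            return zhdr;
    }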

    Link: http://lkml.kernel.org/r/20190917185352.44cf285d3ebd9e64548de5de@gmail.com
    Signed-off-by: Vitaly Wool
    Reported-by: Markus Linnala
    Tested-by: Markus Linnala
    Cc: Dan Streetman
    Cc: Henry Burns
    Cc: Shakeel Butt
    Cc: Vlastimil Babka
    Cc:
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Vitaly Wool
     
  • z3fold_reclaim_page()'s retry mechanism is broken: on a second
    iteration it will still have zhdr from the first one, so zhdr is no
    longer in line with struct page. That leads to crashes when the
    system is stressed.

    Fix that by moving the zhdr assignment up.

    While at it, protect against using already freed handles by using a
    local slots structure in z3fold_reclaim_page().
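
    Roughly, the corrected loop looks like this (sketch; most of the
    real state handling is elided):

    /* Sketch: re-derive zhdr inside the retry loop so it always
     * matches the page picked on this iteration, and work on a local
     * copy of the handle slots so handles freed by another thread are
     * never dereferenced. */
    for (i = 0; i < retries; i++) {
            page = list_last_entry(&pool->lru, struct page, lru);
            zhdr = page_address(page);  /* refreshed on every pass */
            /* ... try to reclaim; on failure, retry with a new page ... */
    }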

    Link: http://lkml.kernel.org/r/20190908162919.830388dc7404d1e2c80f4095@gmail.com
    Signed-off-by: Vitaly Wool
    Reported-by: Markus Linnala
    Reported-by: Chris Murphy
    Reported-by: Agustin Dall'Alba
    Cc: "Maciej S. Szmigiero"
    Cc: Shakeel Butt
    Cc: Henry Burns
    Cc:
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Vitaly Wool
     
  • With the original commit applied, z3fold_zpool_destroy() may get
    blocked in wait_event() for an indefinite time. Revert that commit
    for the time being to get rid of this problem, since the issue the
    original commit addresses is less severe.

    Link: http://lkml.kernel.org/r/20190910123142.7a9c8d2de4d0acbc0977c602@gmail.com
    Fixes: d776aaa9895eb6eb77 ("mm/z3fold.c: fix race between migration and destruction")
    Reported-by: Agustín Dall'Alba
    Signed-off-by: Vitaly Wool
    Cc: Vlastimil Babka
    Cc: Vitaly Wool
    Cc: Shakeel Butt
    Cc: Jonathan Adams
    Cc: Henry Burns
    Cc:
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Vitaly Wool
     

31 Aug, 2019

1 commit

  • Fix lock/unlock imbalance by unlocking *zhdr* before return.

    Addresses Coverity ID 1452811 ("Missing unlock")

    Link: http://lkml.kernel.org/r/20190826030634.GA4379@embeddedor
    Fixes: d776aaa9895e ("mm/z3fold.c: fix race between migration and destruction")
    Signed-off-by: Gustavo A. R. Silva
    Reviewed-by: Andrew Morton
    Cc: Henry Burns
    Cc: Vitaly Wool
    Cc: Shakeel Butt
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Gustavo A. R. Silva
     

25 Aug, 2019

1 commit

  • In z3fold_destroy_pool() we call destroy_workqueue(pool->compact_wq),
    but we have no guarantee that migration isn't happening in the
    background at that time.

    Migration directly calls queue_work_on(pool->compact_wq); if
    destruction wins that race, we end up using a destroyed workqueue.

    Link: http://lkml.kernel.org/r/20190809213828.202833-1-henryburns@google.com
    Signed-off-by: Henry Burns
    Cc: Vitaly Wool
    Cc: Shakeel Butt
    Cc: Jonathan Adams
    Cc: Henry Burns
    Cc:
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Henry Burns
     

14 Aug, 2019

2 commits

  • The constraint from the zpool use of z3fold_destroy_pool() is that
    there are no outstanding handles to memory (so no active allocations),
    but it is possible for there to be outstanding work on either of the
    two workqueues in the pool.

    Calling z3fold_unregister_migration() before the workqueues are
    drained means that there can be allocated pages referencing a freed
    inode, and any thread in compaction can then trip over the bad
    pointer in PageMovable().
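
    The resulting teardown order, sketched (simplified from the actual
    function):

    /* Sketch: drain both workqueues before tearing down the inode that
     * PageMovable() dereferences through page->mapping. */
    static void z3fold_destroy_pool(struct z3fold_pool *pool)
    {
            kmem_cache_destroy(pool->c_handle);
            destroy_workqueue(pool->compact_wq);  /* flushes pending work */
            destroy_workqueue(pool->release_wq);
            z3fold_unregister_migration(pool);    /* only after the wqs */
            kfree(pool);
    }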

    Link: http://lkml.kernel.org/r/20190726224810.79660-2-henryburns@google.com
    Fixes: 1f862989b04a ("mm/z3fold.c: support page migration")
    Signed-off-by: Henry Burns
    Reviewed-by: Shakeel Butt
    Reviewed-by: Jonathan Adams
    Cc: Vitaly Vul
    Cc: Vitaly Wool
    Cc: David Howells
    Cc: Thomas Gleixner
    Cc: Al Viro
    Cc: Henry Burns
    Cc:
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Henry Burns
     
  • The constraint from the zpool use of z3fold_destroy_pool() is that
    there are no outstanding handles to memory (so no active allocations),
    but it is possible for there to be outstanding work on either of the
    two workqueues in the pool.

    If there is work queued on pool->compact_wq when z3fold_destroy_pool()
    is called, it will do:

    z3fold_destroy_pool()
      destroy_workqueue(pool->release_wq)
      destroy_workqueue(pool->compact_wq)
        drain_workqueue(pool->compact_wq)
          do_compact_page(zhdr)
            kref_put(&zhdr->refcount)
              __release_z3fold_page(zhdr, ...)
                queue_work_on(pool->release_wq, &pool->work) *BOOM*

    So compact_wq needs to be destroyed before release_wq.
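
    In sketch form, the safe order is simply:

    destroy_workqueue(pool->compact_wq);  /* may still queue on release_wq */
    destroy_workqueue(pool->release_wq);  /* so this one goes second */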

    Link: http://lkml.kernel.org/r/20190726224810.79660-1-henryburns@google.com
    Fixes: 5d03a6613957 ("mm/z3fold.c: use kref to prevent page free/compact race")
    Signed-off-by: Henry Burns
    Reviewed-by: Shakeel Butt
    Reviewed-by: Jonathan Adams
    Cc: Vitaly Vul
    Cc: Vitaly Wool
    Cc: David Howells
    Cc: Thomas Gleixner
    Cc: Al Viro
    Cc:
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Henry Burns
     

20 Jul, 2019

1 commit

  • Pull vfs mount updates from Al Viro:
    "The first part of mount updates.

    Convert filesystems to use the new mount API"

    * 'work.mount0' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (63 commits)
    mnt_init(): call shmem_init() unconditionally
    constify ksys_mount() string arguments
    don't bother with registering rootfs
    init_rootfs(): don't bother with init_ramfs_fs()
    vfs: Convert smackfs to use the new mount API
    vfs: Convert selinuxfs to use the new mount API
    vfs: Convert securityfs to use the new mount API
    vfs: Convert apparmorfs to use the new mount API
    vfs: Convert openpromfs to use the new mount API
    vfs: Convert xenfs to use the new mount API
    vfs: Convert gadgetfs to use the new mount API
    vfs: Convert oprofilefs to use the new mount API
    vfs: Convert ibmasmfs to use the new mount API
    vfs: Convert qib_fs/ipathfs to use the new mount API
    vfs: Convert efivarfs to use the new mount API
    vfs: Convert configfs to use the new mount API
    vfs: Convert binfmt_misc to use the new mount API
    convenience helper: get_tree_single()
    convenience helper get_tree_nodev()
    vfs: Kill sget_userns()
    ...

    Linus Torvalds
     

17 Jul, 2019

4 commits

  • z3fold_page_migrate() calls memcpy(new_zhdr, zhdr, PAGE_SIZE).
    However, zhdr contains fields that can't be directly copied over
    (e.g. list_head, a circular linked list). We only need to initialize
    the linked lists in new_zhdr, as z3fold_page_isolate() already
    ensures that these lists are empty.

    Additionally, it is possible that zhdr->work has been placed in a
    workqueue. In this case we shouldn't migrate the page, as zhdr->work
    references zhdr as opposed to new_zhdr.
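
    A sketch of the corrected migration step (simplified; only the
    buddy list is shown):

    /* Sketch: refuse to migrate while zhdr->work is still queued,
     * since the queued work item points at the old header; after the
     * raw copy, reinitialize the list heads that memcpy() corrupted. */
    if (work_pending(&zhdr->work))
            return -EAGAIN;                /* let migration retry later */

    memcpy(new_zhdr, zhdr, PAGE_SIZE);
    INIT_LIST_HEAD(&new_zhdr->buddy);      /* list_heads can't be memcpy'd */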

    Link: http://lkml.kernel.org/r/20190716000520.230595-1-henryburns@google.com
    Fixes: 1f862989b04ade61d3 ("mm/z3fold.c: support page migration")
    Signed-off-by: Henry Burns
    Reviewed-by: Shakeel Butt
    Cc: Vitaly Vul
    Cc: Vitaly Wool
    Cc: Jonathan Adams
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Henry Burns
     
  • z3fold_page_migrate() will never succeed because it attempts to acquire
    a lock that has already been taken by migrate.c in __unmap_and_move().

    __unmap_and_move()                          [migrate.c]
      trylock_page(oldpage)
      move_to_new_page(oldpage, newpage)
        a_ops->migrate_page(oldpage, newpage)
          z3fold_page_migrate(oldpage, newpage)
            trylock_page(oldpage)
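
    The fix, sketched: the old page arrives locked from migrate.c, so
    assert that instead of taking the lock again (illustrative):

    /* Sketch: __unmap_and_move() already holds the old page's lock. */
    VM_BUG_ON_PAGE(!PageLocked(page), page);  /* don't trylock again */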

    Link: http://lkml.kernel.org/r/20190710213238.91835-1-henryburns@google.com
    Fixes: 1f862989b04a ("mm/z3fold.c: support page migration")
    Signed-off-by: Henry Burns
    Reviewed-by: Shakeel Butt
    Cc: Vitaly Wool
    Cc: Vitaly Vul
    Cc: Jonathan Adams
    Cc: Greg Kroah-Hartman
    Cc: Snild Dolkow
    Cc: Thomas Gleixner
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Henry Burns
     
  • One of the gfp flags used to show that a page is movable is
    __GFP_HIGHMEM. Currently z3fold_alloc() fails when __GFP_HIGHMEM is
    passed. Now that z3fold pages are movable, we allow __GFP_HIGHMEM. We
    strip the movability-related flags from the call to kmem_cache_alloc()
    for our slots, since that is a kernel allocation.
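
    The flag handling, sketched (the exact mask used by the patch may
    differ):

    /* Sketch: movability flags make sense for the z3fold page itself,
     * but not for the slots object coming from a kernel slab cache. */
    slots = kmem_cache_alloc(pool->c_handle,
                             gfp & ~(__GFP_HIGHMEM | __GFP_MOVABLE));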

    [akpm@linux-foundation.org: coding-style fixes]
    Link: http://lkml.kernel.org/r/20190712222118.108192-1-henryburns@google.com
    Signed-off-by: Henry Burns
    Acked-by: Vitaly Wool
    Reviewed-by: Shakeel Butt
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Henry Burns
     
  • As reported by Henry Burns:

    Running z3fold stress testing with address sanitization showed zhdr->slots
    was being used after it was freed.

    z3fold_free(z3fold_pool, handle)
      free_handle(handle)
        kmem_cache_free(pool->c_handle, zhdr->slots)
      release_z3fold_page_locked_list(kref)
        __release_z3fold_page(zhdr, true)
          zhdr_to_pool(zhdr)
            slots_to_pool(zhdr->slots) *BOOM*

    To fix this, add a pointer to the pool back to z3fold_header and
    modify zhdr_to_pool() to return zhdr->pool.
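
    Sketched, the fix looks like this (simplified):

    /* Sketch: a pool backpointer in the header means resolving the
     * pool no longer touches zhdr->slots, which may already be freed. */
    struct z3fold_header {
            /* ... existing fields ... */
            struct z3fold_pool *pool;  /* set when the page is inited */
    };

    static inline struct z3fold_pool *zhdr_to_pool(struct z3fold_header *zhdr)
    {
            return zhdr->pool;
    }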

    Link: http://lkml.kernel.org/r/20190708134808.e89f3bfadd9f6ffd7eff9ba9@gmail.com
    Fixes: 7c2b8baa61fe ("mm/z3fold.c: add structure for buddy handles")
    Signed-off-by: Vitaly Wool
    Reported-by: Henry Burns
    Reviewed-by: Shakeel Butt
    Cc: Jonathan Adams
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Vitaly Wool
     

13 Jul, 2019

1 commit

  • Following zsmalloc.c's example we call trylock_page() and unlock_page().
    Also make z3fold_page_migrate() assert that newpage is passed in locked,
    as per the documentation.
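
    Sketched (simplified; whether the real patch can sleep here depends
    on the gfp flags):

    /* Sketch: lock the page around __SetPageMovable(), as zsmalloc
     * does, falling back to trylock in atomic context. */
    if (can_sleep) {
            lock_page(page);
            __SetPageMovable(page, pool->inode->i_mapping);
            unlock_page(page);
    } else if (trylock_page(page)) {
            __SetPageMovable(page, pool->inode->i_mapping);
            unlock_page(page);
    }

    /* ... and in z3fold_page_migrate(): */
    VM_BUG_ON_PAGE(!PageLocked(newpage), newpage);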

    [akpm@linux-foundation.org: fix trylock_page return value test, per Shakeel]
    Link: http://lkml.kernel.org/r/20190702005122.41036-1-henryburns@google.com
    Link: http://lkml.kernel.org/r/20190702233538.52793-1-henryburns@google.com
    Signed-off-by: Henry Burns
    Suggested-by: Vitaly Wool
    Acked-by: Vitaly Wool
    Acked-by: David Rientjes
    Reviewed-by: Shakeel Butt
    Cc: Vitaly Vul
    Cc: Mike Rapoport
    Cc: Xidong Wang
    Cc: Jonathan Adams
    Cc:
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Henry Burns
     

02 Jun, 2019

1 commit

  • kmem_cache_alloc() may be called from z3fold_alloc() in atomic
    context, so we need to pass the correct gfp flags to avoid a
    "scheduling while atomic" bug.
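
    One way to sketch it (the patch's exact flag choice may differ):

    /* Sketch: never hand sleep-permitting flags to the slab allocator
     * when the caller may be in atomic context. */
    gfp_t slot_gfp = gfpflags_allow_blocking(gfp) ? GFP_KERNEL : GFP_ATOMIC;

    slots = kmem_cache_alloc(pool->c_handle, slot_gfp);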

    Link: http://lkml.kernel.org/r/20190523153245.119dfeed55927e8755250ddd@gmail.com
    Fixes: 7c2b8baa61fe5 ("mm/z3fold.c: add structure for buddy handles")
    Signed-off-by: Vitaly Wool
    Reviewed-by: Andrew Morton
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Vitaly Wool
     

26 May, 2019

2 commits

  • Convert the z3fold filesystem to the new internal mount API as the old one
    will be obsoleted and removed. This allows greater flexibility in
    communication of mount parameters between userspace, the VFS and the
    filesystem.

    See Documentation/filesystems/mount_api.txt for more information.

    Signed-off-by: David Howells

    David Howells
     
  • Once upon a time we used to set ->d_name of e.g. pipefs root
    so that d_path() on pipes would work. These days it's
    completely pointless - dentries of pipes are not even connected
    to pipefs root. However, mount_pseudo() had set the root
    dentry name (passed as the second argument) and callers
    kept inventing names to pass to it. Including those that
    didn't *have* any non-root dentries to start with...

    All of that had been pointless for about 8 years now; it's
    time to get rid of that cargo-culting...

    Signed-off-by: Al Viro

    Al Viro
     

15 May, 2019

4 commits

  • Now that we are not using the page address in handles directly, we
    can make z3fold pages movable to decrease the memory fragmentation
    z3fold may create over time.

    This patch starts advertising non-headless z3fold pages as movable
    and uses the existing kernel infrastructure to implement moving of
    such pages per the memory management subsystem's request. It thus
    implements the 3 required callbacks for page migration:

    * isolation callback, z3fold_page_isolate(): try to isolate the page
    by removing it from all lists. Pages scheduled for some activity and
    mapped pages will not be isolated. Returns true if isolation was
    successful or false otherwise.

    * migration callback, z3fold_page_migrate(): re-check critical
    conditions and migrate page contents to the new page provided by the
    memory subsystem. Returns 0 on success or a negative error code
    otherwise.

    * putback callback, z3fold_page_putback(): put back the page if
    z3fold_page_migrate() for it failed permanently (i.e. not with
    -EAGAIN).
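
    These are wired up through the movable-page interface of that era,
    roughly as follows (sketch):

    /* Sketch: the three callbacks as address_space_operations, which
     * is how non-LRU page migration was hooked up at the time. */
    static const struct address_space_operations z3fold_aops = {
            .isolate_page   = z3fold_page_isolate,
            .migratepage    = z3fold_page_migrate,
            .putback_page   = z3fold_page_putback,
    };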

    [lkp@intel.com: z3fold_page_isolate() can be static]
    Link: http://lkml.kernel.org/r/20190419130924.GA161478@ivb42
    Link: http://lkml.kernel.org/r/20190417103922.31253da5c366c4ebe0419cfc@gmail.com
    Signed-off-by: Vitaly Wool
    Signed-off-by: kbuild test robot
    Cc: Bartlomiej Zolnierkiewicz
    Cc: Dan Streetman
    Cc: Krzysztof Kozlowski
    Cc: Oleksiy Avramchenko
    Cc: Uladzislau Rezki
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Vitaly Wool
     
  • For z3fold to be able to move its pages per request of the memory
    subsystem, it should not use direct object addresses in handles.
    Instead, it will create abstract handles (3 per page) which will
    contain pointers to z3fold objects. Thus, it will be possible to
    change these pointers when a z3fold page is moved.
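
    In sketch form (simplified; the real structure carries extra
    housekeeping state):

    /* Sketch: per-page object addresses live in a separate slab
     * object; a handle is a pointer to one of the slots, so migration
     * only has to rewrite the slot contents, not the handles. */
    struct z3fold_buddy_slots {
            unsigned long slot[3];  /* one per buddy: first, middle, last */
    };
    /* handle = (unsigned long)&slots->slot[buddy_index]; */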

    Link: http://lkml.kernel.org/r/20190417103826.484eaf18c1294d682769880f@gmail.com
    Signed-off-by: Vitaly Wool
    Cc: Bartlomiej Zolnierkiewicz
    Cc: Dan Streetman
    Cc: Krzysztof Kozlowski
    Cc: Oleksiy Avramchenko
    Cc: Uladzislau Rezki
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Vitaly Wool
     
  • The current z3fold implementation only searches this CPU's page lists
    for a fitting page to put a new object into. This patch adds a quick
    search for very well fitting pages (i.e. those having exactly the
    required amount of free space) on other CPUs too, before allocating a
    new page for that object.
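
    The added lookup, sketched (locking and the not-found case elided):

    /* Sketch: before allocating a fresh page, peek at other CPUs'
     * unbuddied lists, but only for an exact fit. */
    for_each_online_cpu(cpu) {
            struct list_head *l =
                    &per_cpu_ptr(pool->unbuddied, cpu)[chunks];

            zhdr = list_first_entry_or_null(l, struct z3fold_header, buddy);
            if (zhdr && z3fold_page_trylock(zhdr))
                    break;  /* well-fitting page found */
    }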

    Link: http://lkml.kernel.org/r/20190417103733.72ae81abe1552397c95a008e@gmail.com
    Signed-off-by: Vitaly Wool
    Cc: Bartlomiej Zolnierkiewicz
    Cc: Dan Streetman
    Cc: Krzysztof Kozlowski
    Cc: Oleksiy Avramchenko
    Cc: Uladzislau Rezki
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Vitaly Wool
     
  • Patch series "z3fold: support page migration", v2.

    This patchset implements page migration support and slightly better
    buddy search. To implement page migration support, z3fold has to
    move away from the current scheme of handle encoding, i.e. stop
    encoding the page address in handles. Instead, a small per-page
    structure is created which will contain actual addresses for z3fold
    objects, while pointers to fields of that structure will be used as
    handles.

    Thus, it will be possible to change the underlying addresses to reflect
    page migration.

    To support migration itself, 3 callbacks will be implemented:

    1: isolation callback, z3fold_page_isolate(): try to isolate the page
    by removing it from all lists. Pages scheduled for some activity and
    mapped pages will not be isolated. Returns true if isolation was
    successful or false otherwise.

    2: migration callback, z3fold_page_migrate(): re-check critical
    conditions and migrate page contents to the new page provided by the
    system. Returns 0 on success or a negative error code otherwise.

    3: putback callback, z3fold_page_putback(): put back the page if
    z3fold_page_migrate() for it failed permanently (i.e. not with
    -EAGAIN).

    To make sure an isolated page doesn't get freed, its kref is incremented
    in z3fold_page_isolate() and decremented during post-migration compaction,
    if migration was successful, or by z3fold_page_putback() in the other
    case.

    Since the new handle encoding scheme implies a slight increase in
    memory consumption, better buddy search (which decreases memory
    consumption) is included in this patchset.

    This patch (of 4):

    Introduce a separate helper function for object allocation, as well as 2
    smaller helpers to add a buddy to the list and to get a pointer to the
    pool from the z3fold header. No functional changes here.

    Link: http://lkml.kernel.org/r/20190417103633.a4bb770b5bf0fb7e43ce1666@gmail.com
    Signed-off-by: Vitaly Wool
    Cc: Dan Streetman
    Cc: Bartlomiej Zolnierkiewicz
    Cc: Krzysztof Kozlowski
    Cc: Oleksiy Avramchenko
    Cc: Uladzislau Rezki
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Vitaly Wool
     

19 Nov, 2018

1 commit

  • Reclaim and free can race on an object, which is basically fine, but
    in order for reclaim to be able to map a "freed" object we need to
    encode the object length in the handle. handle_to_chunks() is then
    introduced to extract the object length from a handle and use it
    during mapping.
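
    The encoding, sketched (constants and names as in mm/z3fold.c of
    that time, details simplified):

    /* Sketch: a handle is the page-aligned header address plus a buddy
     * index in the low bits; for the last buddy, the object size in
     * chunks is stashed in those low bits as well, so reclaim can map
     * an object even after it has been freed. */
    handle = (unsigned long)zhdr;
    if (bud == LAST)
            handle |= (zhdr->last_chunks << BUDDY_SHIFT);
    handle |= (bud + zhdr->first_num) & BUDDY_MASK;

    static unsigned short handle_to_chunks(unsigned long handle)
    {
            return (handle & ~PAGE_MASK) >> BUDDY_SHIFT;
    }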

    Moreover, to avoid racing on a z3fold "headless" page release, we
    should not try to free that page in z3fold_free() if the reclaim bit
    is set. Also, in the unlikely case of trying to reclaim a page that
    is being freed, we should not proceed with that page.

    While at it, fix the page accounting in reclaim function.

    This patch supersedes "[PATCH] z3fold: fix reclaim lock-ups".

    Link: http://lkml.kernel.org/r/20181105162225.74e8837d03583a9b707cf559@gmail.com
    Signed-off-by: Vitaly Wool
    Signed-off-by: Jongseok Kim
    Reported-by: Jongseok Kim
    Reviewed-by: Snild Dolkow
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Vitaly Wool
     

12 May, 2018

1 commit

  • Do not try to optimize in-page object layout while the page is under
    reclaim. This fixes lock-ups on reclaim and improves reclaim
    performance at the same time.
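
    In sketch form (bit name as in the patch era, logic simplified):

    /* Sketch: the compaction path backs off while reclaim owns the
     * page. */
    if (test_bit(UNDER_RECLAIM, &page->private))
            return;  /* page is being reclaimed, don't touch it */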

    [akpm@linux-foundation.org: coding-style fixes]
    Link: http://lkml.kernel.org/r/20180430125800.444cae9706489f412ad12621@gmail.com
    Signed-off-by: Vitaly Wool
    Reported-by: Guenter Roeck
    Tested-by: Guenter Roeck
    Cc:
    Cc: Matthew Wilcox
    Cc:
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Vitaly Wool
     

12 Apr, 2018

2 commits

  • We have a perfectly good macro to determine whether the gfp flags allow
    you to sleep or not; use it instead of trying to infer it.
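
    The macro in question, used roughly like this (sketch):

    /* Sketch: let the gfp flags say whether sleeping is allowed,
     * instead of guessing from the call site. */
    bool can_sleep = gfpflags_allow_blocking(gfp);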

    Link: http://lkml.kernel.org/r/20180408062206.GC16007@bombadil.infradead.org
    Signed-off-by: Matthew Wilcox
    Reviewed-by: Andrew Morton
    Cc: Vitaly Wool
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Matthew Wilcox
     
  • In z3fold_create_pool(), the memory allocated by __alloc_percpu() is
    not released on the error path where pool->compact_wq, which holds
    the return value of create_singlethread_workqueue(), is NULL. This
    results in a memory leak.
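
    The usual goto-unwind shape fixes this (sketch, abbreviated; label
    names are illustrative):

    /* Sketch: every allocation made after __alloc_percpu() must unwind
     * it on failure. */
    pool->unbuddied = __alloc_percpu(sizeof(struct list_head) * NCHUNKS,
                                     __alignof__(struct list_head));
    if (!pool->unbuddied)
            goto out_pool;
    pool->compact_wq = create_singlethread_workqueue("z3fold");
    if (!pool->compact_wq)
            goto out_unbuddied;  /* was leaked before the fix */
    /* ... */
    out_unbuddied:
            free_percpu(pool->unbuddied);
    out_pool:
            kfree(pool);
            return NULL;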

    [akpm@linux-foundation.org: fix oops on kzalloc() failure, check __alloc_percpu() retval]
    Link: http://lkml.kernel.org/r/1522803111-29209-1-git-send-email-wangxidong_97@163.com
    Signed-off-by: Xidong Wang
    Reviewed-by: Andrew Morton
    Cc: Vitaly Wool
    Cc: Mike Rapoport
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Xidong Wang
     

06 Apr, 2018

1 commit

  • Currently if z3fold couldn't find an unbuddied page it would first try
    to pull a page off the stale list. The problem with this approach is
    that we can't 100% guarantee that the page is not processed by the
    workqueue thread at the same time unless we run cancel_work_sync() on
    it, which we can't do if we're in an atomic context. So let's just
    limit stale list usage to non-atomic contexts only.
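
    Sketched:

    /* Sketch: only raid the stale list when sleeping is allowed,
     * because cancel_work_sync() may block. */
    if (can_sleep && !list_empty(&pool->stale)) {
            zhdr = list_first_entry(&pool->stale,
                                    struct z3fold_header, buddy);
            cancel_work_sync(&zhdr->work);  /* safe: not in atomic context */
            /* ... reuse the page ... */
    }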

    Link: http://lkml.kernel.org/r/47ab51e7-e9c1-d30e-ab17-f734dbc3abce@gmail.com
    Signed-off-by: Vitaly Vul
    Reviewed-by: Andrew Morton
    Cc:
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Vitaly Wool
     

07 Feb, 2018

1 commit

  • There are several places where parameter descriptions do not match
    the actual code. Fix it.

    Link: http://lkml.kernel.org/r/1516700871-22279-3-git-send-email-rppt@linux.vnet.ibm.com
    Signed-off-by: Mike Rapoport
    Cc: Jonathan Corbet
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Mike Rapoport
     

18 Nov, 2017

1 commit

  • There is a race in the current z3fold implementation between
    do_compact() called in a workqueue context and the page release
    procedure when the page's kref goes to 0.

    do_compact() may be waiting for the page lock, which is released by
    release_z3fold_page_locked() right before putting the page onto the
    "stale" list, and then the page may be freed as do_compact() modifies
    its contents.

    The mechanism currently implemented to handle that (checking the
    PAGE_STALE flag) is not reliable enough. Instead, we'll use the
    page's kref counter to guarantee that the page is not released if
    its compaction is scheduled. It then becomes the compaction
    function's responsibility to decrease the counter and quit
    immediately if the page was actually freed.
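
    The pattern, sketched (callback name illustrative, locking elided):

    /* Sketch: hold a reference on behalf of the queued work; the
     * worker drops it and bails out if that put releases the page. */
    kref_get(&zhdr->refcount);  /* before queueing */
    queue_work_on(zhdr->cpu, pool->compact_wq, &zhdr->work);

    /* ... in the compaction worker: */
    if (kref_put(&zhdr->refcount, release_z3fold_page))
            return;  /* page was freed, quit */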

    Link: http://lkml.kernel.org/r/20171117092032.00ea56f42affbed19f4fcc6c@gmail.com
    Signed-off-by: Vitaly Wool
    Cc:
    Cc:
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Vitaly Wool
     

04 Oct, 2017

2 commits

  • Fix the situation when clear_bit() is called for page->private before
    the page pointer is actually assigned. While at it, remove the
    work_busy() check because it is costly and does not give a 100%
    guarantee anyway.

    Signed-off-by: Vitaly Wool
    Cc: Dan Streetman
    Cc:
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Vitaly Wool
     
  • It is possible that on a (partially) unsuccessful page reclaim, the
    kref_put() called in z3fold_reclaim_page() does not yield a page
    release, but the page is released shortly afterwards by another
    thread. Then z3fold_reclaim_page() would try to list_add() that
    (released) page again, which is obviously a bug.

    To avoid that, the spin_lock() has to be taken earlier, before the
    kref_put() call mentioned above.
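
    That is, sketched (locking simplified):

    /* Sketch: hold the pool lock across the put so a concurrent
     * release cannot interleave with our list_add(). */
    spin_lock(&pool->lock);
    if (kref_put(&zhdr->refcount, release_z3fold_page)) {
            spin_unlock(&pool->lock);
            return 0;  /* page is gone; don't re-add it */
    }
    list_add(&page->lru, &pool->lru);
    spin_unlock(&pool->lock);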

    Link: http://lkml.kernel.org/r/20170913162937.bfff21c7d12b12a5f47639fd@gmail.com
    Signed-off-by: Vitaly Wool
    Cc: Dan Streetman
    Cc:
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Vitaly Wool
     

07 Sep, 2017

1 commit

  • It's been noted that z3fold doesn't scale well when it's run in a large
    number of threads on many cores, which can be easily reproduced with fio
    'randrw' test with --numjobs=32. E.g. the result for 1 cluster (4 cores)
    is:

    Run status group 0 (all jobs):
    READ: io=244785MB, aggrb=496883KB/s, minb=15527KB/s, ...
    WRITE: io=246735MB, aggrb=500841KB/s, minb=15651KB/s, ...

    While for 8 cores (2 clusters) the result is:

    Run status group 0 (all jobs):
    READ: io=244785MB, aggrb=265942KB/s, minb=8310KB/s, ...
    WRITE: io=246735MB, aggrb=268060KB/s, minb=8376KB/s, ...

    The bottleneck here is the pool lock, which many threads end up
    waiting on. To reduce that spinlock contention, z3fold can operate
    only on the lists local to the current CPU whenever possible. Due to
    the nature of z3fold unbuddied list handling (it only takes the first
    entry off the list on a hot path), if the z3fold pool is big enough
    and balanced well enough, limiting the search to only the local
    unbuddied list doesn't lead to a significant compression ratio
    degradation (2.57x vs 2.65x in our measurements).
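
    The core data-structure change, sketched:

    /* Sketch: per-CPU arrays of unbuddied lists, indexed by free-chunk
     * count, replace a single global set of lists. */
    struct z3fold_pool {
            struct list_head __percpu *unbuddied;  /* NCHUNKS lists per CPU */
            /* ... */
    };

    unbuddied = get_cpu_ptr(pool->unbuddied);      /* this CPU's lists */
    zhdr = list_first_entry_or_null(&unbuddied[chunks],
                                    struct z3fold_header, buddy);
    put_cpu_ptr(pool->unbuddied);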

    This patch also introduces two worker threads: one for async in-page
    object layout optimization and one for releasing freed pages. This is
    done to speed up z3fold_free() which is often on a hot path.

    The fio results for 8-core case are now the following:

    Run status group 0 (all jobs):
    READ: io=244785MB, aggrb=1568.3MB/s, minb=50182KB/s, ...
    WRITE: io=246735MB, aggrb=1580.8MB/s, minb=50582KB/s, ...

    So we're in for an almost 6x performance increase.

    Link: http://lkml.kernel.org/r/20170806181443.f9b65018f8bde25ef990f9e8@gmail.com
    Signed-off-by: Vitaly Wool
    Cc: Dan Streetman
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Vitaly Wool
     

14 Apr, 2017

1 commit

  • Stress testing of the current z3fold implementation on an 8-core
    system revealed it was possible for a z3fold page deleted from its
    unbuddied list in z3fold_alloc() to be put on another unbuddied list
    by z3fold_free() while z3fold_alloc() was still processing it. This
    was introduced by commit 5a27aa822 ("z3fold: add kref refcounting")
    due to the removal of the special handling of a z3fold page not on
    any list in z3fold_free().

    To fix this, the z3fold page lock should be taken in z3fold_alloc()
    before the pool lock is released. To avoid deadlocking, we just try to
    lock the page as soon as we get a hold of it, and if trylock fails, we
    drop this page and take the next one.
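
    Sketched (the no-page-found case is elided):

    /* Sketch: try the page lock while still holding the pool lock; if
     * someone else holds the page, skip it rather than risk deadlock. */
    spin_lock(&pool->lock);
    list_for_each_entry(zhdr, &unbuddied[i], buddy) {
            if (!z3fold_page_trylock(zhdr))
                    continue;  /* busy page, take the next one */
            list_del_init(&zhdr->buddy);
            break;
    }
    spin_unlock(&pool->lock);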

    Signed-off-by: Vitaly Wool
    Cc: Dan Streetman
    Cc:
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Vitaly Wool
     

17 Mar, 2017

1 commit

  • Commit 5a27aa822029 ("z3fold: add kref refcounting") introduced a bug
    in z3fold_reclaim_page(): a function exit path may leave the
    pool->lock spinlock held. Here comes the trivial fix.

    Fixes: 5a27aa822029 ("z3fold: add kref refcounting")
    Link: http://lkml.kernel.org/r/20170311222239.7b83d8e7ef1914e05497649f@gmail.com
    Reported-by: Alexey Khoroshilov
    Signed-off-by: Vitaly Wool
    Cc: Dan Streetman
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Vitaly Wool
     

25 Feb, 2017

5 commits

  • With both the coming and the already present locking optimizations,
    introducing a kref to reference-count z3fold objects is the right
    thing to do. Moreover, it makes the buddied list no longer necessary
    and allows for simpler handling of headless pages.

    [akpm@linux-foundation.org: coding-style fixes]
    Link: http://lkml.kernel.org/r/20170131214650.8ea78033d91ded233f552bc0@gmail.com
    Signed-off-by: Vitaly Wool
    Reviewed-by: Dan Streetman
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Vitaly Wool
     
  • Most z3fold operations are in-page, such as modifying the z3fold page
    header or moving z3fold objects within a page. Taking the per-pool
    spinlock to protect per-page objects is therefore suboptimal, and the
    idea of having a per-page spinlock (or rwlock) has been around for
    some time.

    This patch implements a spinlock-based per-page locking mechanism
    which is lightweight enough to normally fit into the z3fold header.
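
    Sketched:

    /* Sketch: the lock lives right in the header at the start of the
     * z3fold page. */
    struct z3fold_header {
            spinlock_t page_lock;
            /* ... chunk counts, list heads ... */
    };

    static inline void z3fold_page_lock(struct z3fold_header *zhdr)
    {
            spin_lock(&zhdr->page_lock);
    }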

    Link: http://lkml.kernel.org/r/20170131214438.433e0a5fda908337b63206d3@gmail.com
    Signed-off-by: Vitaly Wool
    Reviewed-by: Dan Streetman
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Vitaly Wool
     
  • z3fold_compact_page() currently only handles the situation when
    there's a single middle chunk within the z3fold page. However, it
    may be worth moving the middle chunk closer to either the first or
    the last chunk, whichever is there, if the gap between them is big
    enough.

    This patch adds the relevant code, using the BIG_CHUNK_GAP define as
    a threshold for the middle chunk to be worth moving.
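
    In sketch form (condition and threshold value simplified):

    /* Sketch: only move the middle chunk towards the first chunk when
     * the gap is at least BIG_CHUNK_GAP chunks wide. */
    #define BIG_CHUNK_GAP 3

    if (zhdr->first_chunks != 0 && zhdr->middle_chunks != 0 &&
        zhdr->start_middle - zhdr->first_chunks >= BIG_CHUNK_GAP)
            mchunk_memmove(zhdr, zhdr->first_chunks + 1);  /* move down */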

    Link: http://lkml.kernel.org/r/20170131214334.c4f3eac9a477af0fa9a22c46@gmail.com
    Signed-off-by: Vitaly Wool
    Reviewed-by: Dan Streetman
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Vitaly Wool
     
  • Currently the whole kernel build will be stopped if the size of struct
    z3fold_header is greater than the size of one chunk, which is 64 bytes
    by default. This patch instead defines the offset for z3fold objects as
    the size of the z3fold header in chunks.

    Also fixed are the calculation of num_free_chunks() and the address
    to move the middle chunk to in case of in-page compaction in
    z3fold_compact_page().
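
    Sketched:

    /* Sketch: express the header size in whole chunks instead of
     * insisting that it fit in a single chunk. */
    #define ZHDR_SIZE_ALIGNED round_up(sizeof(struct z3fold_header), CHUNK_SIZE)
    #define ZHDR_CHUNKS       (ZHDR_SIZE_ALIGNED >> CHUNK_SHIFT)

    /* the first object now starts ZHDR_CHUNKS chunks into the page */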

    Link: http://lkml.kernel.org/r/20170131214057.d98677032bc7b1c6c59a80c9@gmail.com
    Signed-off-by: Vitaly Wool
    Reviewed-by: Dan Streetman
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Vitaly Wool
     
  • Convert the pages_nr per-pool counter to atomic64_t.
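
    Sketched:

    /* Sketch: atomic64_t allows lock-free updates from the several
     * contexts that create and free pages. */
    atomic64_inc(&pool->pages_nr);  /* page allocated */
    atomic64_dec(&pool->pages_nr);  /* page freed */
    u64 total = atomic64_read(&pool->pages_nr);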

    Link: http://lkml.kernel.org/r/20170131213946.b828676ab17bbea42022c213@gmail.com
    Signed-off-by: Vitaly Wool
    Reviewed-by: Dan Streetman
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Vitaly Wool