18 Mar, 2016

2 commits

  • Add a new column to the pool stats, which tells how many pages can
    ideally be freed by class compaction, making it easier to analyze
    zsmalloc fragmentation.

    At the moment, we have only numbers of FULL and ALMOST_EMPTY classes,
    but they don't tell us how badly the class is fragmented internally.

    The new /sys/kernel/debug/zsmalloc/zramX/classes output looks as follows:

    class size almost_full almost_empty obj_allocated obj_used pages_used pages_per_zspage freeable
    [..]
    12 224 0 2 146 5 8 4 4
    13 240 0 0 0 0 0 1 0
    14 256 1 13 1840 1672 115 1 10
    15 272 0 0 0 0 0 1 0
    [..]
    49 816 0 3 745 735 149 1 2
    51 848 3 4 361 306 76 4 8
    52 864 12 14 378 268 81 3 21
    54 896 1 12 117 57 26 2 12
    57 944 0 0 0 0 0 3 0
    [..]
    Total 26 131 12709 10994 1071 134

    For example, from this particular output we can easily conclude that
    class-896 is heavily fragmented -- it occupies 26 pages, 12 of which can
    be freed by compaction.
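
    As a rough sketch (the helper name below is illustrative; the kernel
    derives this via zs_can_compact()), the per-class `freeable' value can
    be estimated from the columns above:

        /* estimated pages freeable by fully compacting one size class */
        unsigned long freeable_pages(unsigned long obj_allocated,
                                     unsigned long obj_used,
                                     unsigned long objs_per_zspage,
                                     unsigned long pages_per_zspage)
        {
                unsigned long obj_wasted = obj_allocated - obj_used;

                /* whole zspages that could be emptied and released */
                return (obj_wasted / objs_per_zspage) * pages_per_zspage;
        }

    For class-896 above: 117 - 57 = 60 unused objects; a 2-page zspage holds
    8192 / 896 = 9 objects, so 60 / 9 = 6 zspages, i.e. 6 * 2 = 12 freeable
    pages, matching the table.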

    Signed-off-by: Sergey Senozhatsky
    Acked-by: Minchan Kim
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Sergey Senozhatsky
     
  • When unmapping a huge class page in zs_unmap_object, the page will be
    unmapped by kunmap_atomic. The "!area->huge" branch in __zs_unmap_object
    is always true, and no code sets "area->huge" now, so we can drop it.

    Signed-off-by: YiPing Xu
    Reviewed-by: Sergey Senozhatsky
    Acked-by: Minchan Kim
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    YiPing Xu
     

21 Jan, 2016

1 commit

  • record_obj() in migrate_zspage() does not preserve the handle's
    HANDLE_PIN_BIT, set by find_alloced_obj()->trypin_tag(), and implicitly
    (accidentally) un-pins the handle, while migrate_zspage() still performs
    an explicit unpin_tag() on that handle. This additional explicit
    unpin_tag() introduces a race condition with zs_free(), which can pin
    that handle by this time, so the handle becomes un-pinned.

    Schematically, it goes like this:

    CPU0                                            CPU1
    migrate_zspage
      find_alloced_obj
        trypin_tag
          set HANDLE_PIN_BIT                        zs_free()
                                                      pin_tag()
    obj_malloc() -- new object, no tag
    record_obj() -- remove HANDLE_PIN_BIT             set HANDLE_PIN_BIT
    unpin_tag() -- remove zs_free's HANDLE_PIN_BIT

    The race condition may result in a NULL pointer dereference:

    Unable to handle kernel NULL pointer dereference at virtual address 00000000
    CPU: 0 PID: 19001 Comm: CookieMonsterCl Tainted:
    PC is at get_zspage_mapping+0x0/0x24
    LR is at obj_free.isra.22+0x64/0x128
    Call trace:
    get_zspage_mapping+0x0/0x24
    zs_free+0x88/0x114
    zram_free_page+0x64/0xcc
    zram_slot_free_notify+0x90/0x108
    swap_entry_free+0x278/0x294
    free_swap_and_cache+0x38/0x11c
    unmap_single_vma+0x480/0x5c8
    unmap_vmas+0x44/0x60
    exit_mmap+0x50/0x110
    mmput+0x58/0xe0
    do_exit+0x320/0x8dc
    do_group_exit+0x44/0xa8
    get_signal+0x538/0x580
    do_signal+0x98/0x4b8
    do_notify_resume+0x14/0x5c

    This patch keeps the lock bit in the migration path and updates the
    value atomically.
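
    A minimal sketch of the idea (not the verbatim patch): keep the handle
    pinned while publishing the new object location, so zs_free() cannot
    sneak in between record_obj() and the explicit unpin:

        /* migration path: free_obj is the object's new location */
        free_obj |= BIT(HANDLE_PIN_BIT);  /* preserve the pin across the update */
        record_obj(handle, free_obj);     /* single word-sized store */
        unpin_tag(handle);                /* the explicit unpin is now safe */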

    Signed-off-by: Junil Lee
    Signed-off-by: Minchan Kim
    Acked-by: Vlastimil Babka
    Cc: Sergey Senozhatsky
    Cc: [4.1+]
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Junil Lee
     

07 Nov, 2015

8 commits

  • We are going to rework how compound_head() works. It will not use
    page->first_page as we have it now.

    The only other user of page->first_page beyond compound pages is
    zsmalloc.

    Let's use page->private instead of page->first_page here. It occupies
    the same storage space.
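
    A minimal sketch of the substitution, assuming the helpers available at
    the time (page_private()/set_page_private() operate on the same word):

        /* before: tail pages pointed back to the head page */
        first_page = page->first_page;

        /* after: reuse page->private for the very same pointer */
        set_page_private(page, (unsigned long)first_page);
        first_page = (struct page *)page_private(page);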

    Signed-off-by: Kirill A. Shutemov
    Acked-by: Vlastimil Babka
    Reviewed-by: Sergey Senozhatsky
    Reviewed-by: Andrea Arcangeli
    Cc: "Paul E. McKenney"
    Cc: Andi Kleen
    Cc: Aneesh Kumar K.V
    Cc: Christoph Lameter
    Cc: David Rientjes
    Cc: Hugh Dickins
    Cc: Joonsoo Kim
    Cc: Michal Hocko
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Kirill A. Shutemov
     
  • Each `struct size_class' contains `struct zs_size_stat': an array of
    NR_ZS_STAT_TYPE `unsigned long'. For zsmalloc built with no
    CONFIG_ZSMALLOC_STAT this results in a waste of `2 * sizeof(unsigned
    long)' per-class.

    The patch removes unneeded `struct zs_size_stat' members by redefining
    NR_ZS_STAT_TYPE (max stat idx in array).

    Since both NR_ZS_STAT_TYPE and zs_stat_type are compile time constants,
    GCC can eliminate zs_stat_inc()/zs_stat_dec() calls that use zs_stat_type
    larger than NR_ZS_STAT_TYPE: CLASS_ALMOST_EMPTY and CLASS_ALMOST_FULL at
    the moment.
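
    A sketch of the approach (ordering and names follow the description
    above; the exact enum layout in mm/zsmalloc.c may differ slightly):

        enum zs_stat_type {
                OBJ_ALLOCATED,
                OBJ_USED,
                CLASS_ALMOST_FULL,
                CLASS_ALMOST_EMPTY,
        };

        #ifdef CONFIG_ZSMALLOC_STAT
        #define NR_ZS_STAT_TYPE (CLASS_ALMOST_EMPTY + 1)
        #else
        #define NR_ZS_STAT_TYPE (OBJ_USED + 1)
        #endif

        struct zs_size_stat {
                unsigned long objs[NR_ZS_STAT_TYPE];
        };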

    ./scripts/bloat-o-meter mm/zsmalloc.o.old mm/zsmalloc.o.new
    add/remove: 0/0 grow/shrink: 0/3 up/down: 0/-39 (-39)
    function old new delta
    fix_fullness_group 97 94 -3
    insert_zspage 100 86 -14
    remove_zspage 141 119 -22

    To summarize:
    a) each class now uses less memory
    b) we avoid a number of dec/inc stats (a minor optimization,
    but still).

    The gain will increase once we introduce additional stats.

    A simple IO test.

    iozone -t 4 -R -r 32K -s 60M -I +Z
    patched base
    " Initial write " 4145599.06 4127509.75
    " Rewrite " 4146225.94 4223618.50
    " Read " 17157606.00 17211329.50
    " Re-read " 17380428.00 17267650.50
    " Reverse Read " 16742768.00 16162732.75
    " Stride read " 16586245.75 16073934.25
    " Random read " 16349587.50 15799401.75
    " Mixed workload " 10344230.62 9775551.50
    " Random write " 4277700.62 4260019.69
    " Pwrite " 4302049.12 4313703.88
    " Pread " 6164463.16 6126536.72
    " Fwrite " 7131195.00 6952586.00
    " Fread " 12682602.25 12619207.50

    Signed-off-by: Sergey Senozhatsky
    Cc: Minchan Kim
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Sergey Senozhatsky
     
  • Signed-off-by: Hui Zhu
    Reviewed-by: Sergey Senozhatsky
    Cc: Minchan Kim
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Hui Zhu
     
  • We don't let the user disable the shrinker in zsmalloc (once it has
    been enabled), so there is no need to check ->shrinker_enabled in
    zs_shrinker_count(), at the moment at least.

    Signed-off-by: Sergey Senozhatsky
    Acked-by: Minchan Kim
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Sergey Senozhatsky
     
  • A cosmetic change.

    Commit c60369f01125 ("staging: zsmalloc: prevent mappping in interrupt
    context") added an in_interrupt() check to zs_map_object() and an
    include of 'hardirq.h'; but the in_interrupt() macro is defined in
    'preempt.h', not in 'hardirq.h', so include that instead.

    Signed-off-by: Sergey Senozhatsky
    Acked-by: Minchan Kim
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Sergey Senozhatsky
     
  • In obj_malloc():

        if (!class->huge)
                /* record handle in the header of allocated chunk */
                link->handle = handle;
        else
                /* record handle in first_page->private */
                set_page_private(first_page, handle);

    For a huge page we save the handle to page->private directly, i.e. as a
    value.

    But in obj_to_head():

        if (class->huge) {
                VM_BUG_ON(!is_first_page(page));
                return *(unsigned long *)page_private(page);
        } else
                return *(unsigned long *)obj;

    Here page_private() is dereferenced as a pointer.

    The reason why there has been no problem until now is that a huge-class
    page is born with ZS_FULL, so it can't be migrated. However, we need
    this patch for future work: "VM-aware zsmalloced page migration" to
    reduce external fragmentation.
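
    A sketch of the corrected obj_to_head(), mirroring the snippet quoted
    above: for huge classes page_private() already holds the handle value,
    so it should be returned directly instead of being dereferenced:

        static unsigned long obj_to_head(struct size_class *class,
                                         struct page *page, void *obj)
        {
                if (class->huge) {
                        VM_BUG_ON(!is_first_page(page));
                        return page_private(page);  /* a value, not a pointer */
                } else
                        return *(unsigned long *)obj;
        }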

    Signed-off-by: Hui Zhu
    Acked-by: Minchan Kim
    Cc: Sergey Senozhatsky
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Hui Zhu
     
  • [akpm@linux-foundation.org: fix grammar]
    Signed-off-by: Hui Zhu
    Reviewed-by: Sergey Senozhatsky
    Cc: Dan Streetman
    Cc: Minchan Kim
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Hui Zhu
     
  • Constify `struct zs_pool' ->name.

    [akpm@linux-foundation.org: constify zpool_create_pool()'s `type' arg also]
    Signed-off-by: Sergey Senozhatsky
    Acked-by: Dan Streetman
    Cc: Minchan Kim
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Sergey SENOZHATSKY
     

09 Sep, 2015

13 commits

  • The structure zpool_ops is not modified so make the pointer to it a
    pointer to const.

    Signed-off-by: Krzysztof Kozlowski
    Acked-by: Dan Streetman
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Krzysztof Kozlowski
     
  • We can pass a NULL cache pointer to kmem_cache_destroy(), because it
    NULL-checks its argument now. Remove redundant test from
    destroy_handle_cache().

    Signed-off-by: Sergey Senozhatsky
    Acked-by: Minchan Kim
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Sergey Senozhatsky
     
  • We can avoid taking the class ->lock around zs_can_compact() in
    zs_shrinker_count(), because the number we return is outdated in the
    general case, by design. We have different sources that can change a
    class's state right after we return from zs_can_compact() -- ongoing
    I/O operations, manually triggered compaction, or both happening
    simultaneously.

    We redo these calculations during compaction on a per-class basis
    anyway.

    zs_unregister_shrinker() will not return while the shrinker is active,
    so classes won't unexpectedly disappear while zs_shrinker_count()
    iterates over them.

    Signed-off-by: Sergey Senozhatsky
    Acked-by: Minchan Kim
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Sergey Senozhatsky
     
  • There is no need to recalculate pages_per_zspage at runtime. Just use
    class->pages_per_zspage to avoid unnecessary runtime overhead.

    Signed-off-by: Minchan Kim
    Acked-by: Sergey Senozhatsky
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Minchan Kim
     
  • There is no reason to prevent selecting ZS_ALMOST_FULL as the migration
    source if we cannot find a source from ZS_ALMOST_EMPTY.

    With this patch, zs_can_compact() will return a more exact result.

    Signed-off-by: Minchan Kim
    Acked-by: Sergey Senozhatsky
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Minchan Kim
     
  • We want to see more ZS_FULL pages and fewer ZS_ALMOST_{FULL, EMPTY}
    pages. Put a page with a higher ->inuse count first within its
    ->fullness_list, which will give us better chances to fill up this page
    with new objects (find_get_zspage() returns the ->fullness_list head for
    new object allocation), so some zspages will become ZS_ALMOST_FULL/ZS_FULL
    quicker.

    It performs a trivial and cheap ->inuse comparison which does not slow
    down zsmalloc and in the worst case keeps the list of pages in no
    particular order.

    A more expensive solution could sort fullness_list by ->inuse count.
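
    A sketch of the cheap variant, assuming the fullness_list layout of that
    time (the head page of each list doubles as the preferred allocation
    target):

        head = &class->fullness_list[fullness];
        if (!*head) {
                *head = page;
                return;
        }

        /* put pages with a higher ->inuse count closer to the head */
        list_add_tail(&page->lru, &(*head)->lru);
        if (page->inuse >= (*head)->inuse)
                *head = page;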

    [minchan@kernel.org: code adjustments]
    Signed-off-by: Sergey Senozhatsky
    Cc: Minchan Kim
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Sergey Senozhatsky
     
  • Perform automatic pool compaction by a shrinker when the system is
    getting tight on memory.

    User space has very little knowledge regarding zsmalloc fragmentation
    and basically has no mechanism to tell whether compaction will result in
    any memory gain. Another issue is that user space is not always aware
    of the fact that the system is getting tight on memory, which leads to
    very uncomfortable scenarios where user space may start issuing
    compaction 'randomly' or from crontab (for example). Fragmentation is
    not necessarily bad: allocated but unused objects may, after all, be
    filled with data later, without the need to allocate a new zspage. On
    the other hand, we obviously don't want to waste memory when the system
    needs it.

    Compaction now has a relatively quick pool scan, so we are able to
    easily estimate the number of pages that will be freed, which makes it
    possible to call this function from a shrinker->count_objects()
    callback. We also abort compaction as soon as we detect that we can't
    free any pages any more, preventing wasteful object migrations.
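
    A minimal sketch of how the callbacks plug into the generic shrinker API
    (the estimate helper named below is hypothetical; zs_pool is assumed to
    embed a struct shrinker):

        static unsigned long zs_shrinker_count(struct shrinker *shrinker,
                                               struct shrink_control *sc)
        {
                struct zs_pool *pool = container_of(shrinker,
                                                    struct zs_pool, shrinker);

                /* cheap scan: how many pages could compaction free now? */
                return zs_estimate_compactable_pages(pool);  /* hypothetical */
        }

        static unsigned long zs_shrinker_scan(struct shrinker *shrinker,
                                              struct shrink_control *sc)
        {
                struct zs_pool *pool = container_of(shrinker,
                                                    struct zs_pool, shrinker);

                /* do the work and report the number of freed pages */
                return zs_compact(pool);
        }

        /* at pool creation time: */
        pool->shrinker.count_objects = zs_shrinker_count;
        pool->shrinker.scan_objects = zs_shrinker_scan;
        pool->shrinker.seeks = DEFAULT_SEEKS;
        register_shrinker(&pool->shrinker);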

    Signed-off-by: Sergey Senozhatsky
    Suggested-by: Minchan Kim
    Acked-by: Minchan Kim
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Sergey Senozhatsky
     
  • Compaction returns to zram the number of migrated objects, which is
    quite uninformative -- we have objects of different sizes, so user space
    cannot obtain any valuable data from that number. Change compaction to
    operate in terms of pages and return to the compaction issuer the
    number of pages that were freed during compaction. So from now on we
    will export a more meaningful value in zram/mm_stat -- the number of
    freed (compacted) pages.

    This requires:
    (a) a rename of `num_migrated' to `pages_compacted'
    (b) an internal API change -- return first_page's fullness_group from
    putback_zspage(), so we know when putback_zspage() did
    free_zspage(). It helps us to account compaction stats correctly.

    Signed-off-by: Sergey Senozhatsky
    Acked-by: Minchan Kim
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Sergey Senozhatsky
     
  • `zs_compact_control' accounts the number of migrated objects but it has
    a limited lifespan -- we lose it as soon as zs_compact() returns to
    zram. It worked fine, because (a) zram had its own counter of
    migrated objects and (b) only zram could trigger compaction. However,
    this does not work for automatic pool compaction (not issued by zram).
    To account for objects migrated during auto-compaction (issued by the
    shrinker) we need to store this number in zs_pool.

    Define a new `struct zs_pool_stats' structure to keep zs_pool's stats
    there. It provides only `num_migrated', as of this writing, but it
    surely can be extended.

    A new zsmalloc zs_pool_stats() symbol exports zs_pool's stats back to
    the caller.

    Use zs_pool_stats() in zram and remove `num_migrated' from zram_stats.
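
    A sketch of the new structure and accessor, limited to what the text
    above mentions (the internal `stats' member name is an assumption):

        struct zs_pool_stats {
                /* number of objects migrated by compaction so far */
                unsigned long num_migrated;
        };

        void zs_pool_stats(struct zs_pool *pool, struct zs_pool_stats *stats)
        {
                /* hand the caller a snapshot of the pool's counters */
                memcpy(stats, &pool->stats, sizeof(struct zs_pool_stats));
        }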

    Signed-off-by: Sergey Senozhatsky
    Suggested-by: Minchan Kim
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Sergey Senozhatsky
     
  • Change zs_object_copy() argument order to be (DST, SRC) rather than
    (SRC, DST). copy/move functions usually have a (to, from) argument
    order.

    Rename alloc_target_page() to isolate_target_page(). This function
    doesn't allocate anything, it isolates target page, pretty much like
    isolate_source_page().

    Tweak __zs_compact() comment.

    Signed-off-by: Sergey Senozhatsky
    Acked-by: Minchan Kim
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Sergey Senozhatsky
     
  • This function checks if class compaction will free any pages.
    Rephrasing -- do we have enough unused objects to form at least one
    ZS_EMPTY page and free it. It aborts compaction if class compaction
    will not result in any (further) savings.

    EXAMPLE (this debug output is not part of this patch set):

    - class size
    - number of allocated objects
    - number of used objects
    - max objects per zspage
    - pages per zspage
    - estimated number of pages that will be freed

    [..]
    class-512 objs:544 inuse:540 maxobj-per-zspage:8 pages-per-zspage:1 zspages-to-free:0
    ... class-512 compaction is useless. break
    class-496 objs:660 inuse:570 maxobj-per-zspage:33 pages-per-zspage:4 zspages-to-free:2
    class-496 objs:627 inuse:570 maxobj-per-zspage:33 pages-per-zspage:4 zspages-to-free:1
    class-496 objs:594 inuse:570 maxobj-per-zspage:33 pages-per-zspage:4 zspages-to-free:0
    ... class-496 compaction is useless. break
    class-448 objs:657 inuse:617 maxobj-per-zspage:9 pages-per-zspage:1 zspages-to-free:4
    class-448 objs:648 inuse:617 maxobj-per-zspage:9 pages-per-zspage:1 zspages-to-free:3
    class-448 objs:639 inuse:617 maxobj-per-zspage:9 pages-per-zspage:1 zspages-to-free:2
    class-448 objs:630 inuse:617 maxobj-per-zspage:9 pages-per-zspage:1 zspages-to-free:1
    class-448 objs:621 inuse:617 maxobj-per-zspage:9 pages-per-zspage:1 zspages-to-free:0
    ... class-448 compaction is useless. break
    class-432 objs:728 inuse:685 maxobj-per-zspage:28 pages-per-zspage:3 zspages-to-free:1
    class-432 objs:700 inuse:685 maxobj-per-zspage:28 pages-per-zspage:3 zspages-to-free:0
    ... class-432 compaction is useless. break
    class-416 objs:819 inuse:705 maxobj-per-zspage:39 pages-per-zspage:4 zspages-to-free:2
    class-416 objs:780 inuse:705 maxobj-per-zspage:39 pages-per-zspage:4 zspages-to-free:1
    class-416 objs:741 inuse:705 maxobj-per-zspage:39 pages-per-zspage:4 zspages-to-free:0
    ... class-416 compaction is useless. break
    class-400 objs:690 inuse:674 maxobj-per-zspage:10 pages-per-zspage:1 zspages-to-free:1
    class-400 objs:680 inuse:674 maxobj-per-zspage:10 pages-per-zspage:1 zspages-to-free:0
    ... class-400 compaction is useless. break
    class-384 objs:736 inuse:709 maxobj-per-zspage:32 pages-per-zspage:3 zspages-to-free:0
    ... class-384 compaction is useless. break
    [..]

    Every "compaction is useless" indicates that we saved CPU cycles.

    class-512 has
    544 objects allocated
    540 objects used
    8 objects per page

    Even if we have an ALMOST_EMPTY zspage, we still don't have enough room
    to migrate all of its objects and free this zspage; so compaction will
    not make a lot of sense, and it's better to just leave it as is.
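
    A simplified sketch of how the check is used inside the per-class
    compaction loop:

        while ((src_page = isolate_source_page(class))) {
                /* stop as soon as no further savings are possible */
                if (!zs_can_compact(class))
                        break;

                /* ... migrate objects out of src_page ... */
        }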

    Signed-off-by: Sergey Senozhatsky
    Acked-by: Minchan Kim
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Sergey Senozhatsky
     
  • Always account per-class `zs_size_stat' stats. This data will help us
    make better decisions during compaction. We are especially interested
    in OBJ_ALLOCATED and OBJ_USED, which can tell us if class compaction
    will result in any memory gain.

    For instance, we know the number of allocated objects in the class, the
    number of objects being used (so we also know how many objects are not
    used) and the number of objects per page. So we can tell whether we
    have enough unused objects to form at least one ZS_EMPTY zspage during
    compaction.

    We calculate this value on a per-class basis, so we can calculate the
    total number of zspages that can be released -- which is exactly what a
    shrinker wants to know.

    Signed-off-by: Sergey Senozhatsky
    Acked-by: Minchan Kim
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Sergey Senozhatsky
     
  • This patchset tweaks compaction and makes it possible to trigger pool
    compaction automatically when the system is getting low on memory.

    zsmalloc in some cases can suffer from notable fragmentation, and
    compaction can release a considerable amount of memory. The problem
    here is that currently we fully rely on user space to perform compaction
    when needed. However, performing zsmalloc compaction is not always an
    obvious thing to do. For example, suppose we have an `idle' fragmented
    (compaction was never performed) zram device and the system is getting
    low on memory due to some 3rd party user processes (gcc LTO, or firefox,
    etc.). It's quite unlikely that user space will issue zpool compaction
    in this case. Besides, user space cannot tell for sure how badly the
    pool is fragmented; however, this info is known to zsmalloc and, hence,
    to a shrinker.

    This patch (of 7):

    __zs_compact() does not use `nr_to_migrate', drop it.

    Signed-off-by: Sergey Senozhatsky
    Acked-by: Minchan Kim
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Sergey Senozhatsky
     

26 Jun, 2015

2 commits

  • Remove the zpool_evict() helper function. As zbud is currently the only
    zpool implementation that supports eviction, add zpool and zpool_ops
    references to struct zbud_pool and directly call zpool_ops->evict(zpool,
    handle) on eviction.

    Currently zpool provides the zpool_evict() helper, which takes the zpool
    list lock and searches through all pools to find the specific one
    matching the caller, and then calls the corresponding zpool_ops->evict
    function. However, this is unnecessary, as the zbud pool can simply
    keep a reference to the zpool that created it, as well as the zpool_ops,
    and directly call the zpool_ops->evict function when it needs to evict
    a page. This avoids a spinlock and list search in zpool for each
    eviction.
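
    A sketch of the direction described above (member placement is
    illustrative, based on the description rather than the exact diff):

        struct zbud_pool {
                /* ... existing members ... */

                /* set when the pool is created through the zpool layer */
                struct zpool *zpool;
                const struct zpool_ops *zpool_ops;
        };

        /* eviction path: call back into zpool directly, no list search */
        static int zbud_zpool_evict(struct zbud_pool *pool, unsigned long handle)
        {
                if (pool->zpool && pool->zpool_ops && pool->zpool_ops->evict)
                        return pool->zpool_ops->evict(pool->zpool, handle);
                return -ENOENT;
        }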

    Signed-off-by: Dan Streetman
    Cc: Seth Jennings
    Cc: Minchan Kim
    Cc: Nitin Gupta
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Dan Streetman
     
  • The DEBUG define in zsmalloc is useless, there is no usage of it at all.

    Signed-off-by: Marcin Jabrzyk
    Acked-by: Sergey Senozhatsky
    Cc: Minchan Kim
    Cc: Nitin Gupta
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Marcin Jabrzyk
     

16 Apr, 2015

12 commits

  • Do not perform cond_resched() before the busy compaction loop in
    __zs_compact(), because this loop does it when needed.

    Signed-off-by: Sergey Senozhatsky
    Acked-by: Minchan Kim
    Cc: Nitin Gupta
    Cc: Stephen Rothwell
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Sergey Senozhatsky
     
  • There is no point in overriding the size class below. It causes fatal
    corruption of the next chunk in the 3264-byte size class, which is the
    last size class that is not huge.

    For example, if the requested size was exactly 3264 bytes, current
    zsmalloc allocates and returns a chunk from the size class of 3264
    bytes, not 4096. User access to this chunk may overwrite the head of
    the next adjacent chunk.

    Here is the panic log captured when the freelist was corrupted due to
    this:

    Kernel BUG at ffffffc00030659c [verbose debug info unavailable]
    Internal error: Oops - BUG: 96000006 [#1] PREEMPT SMP
    Modules linked in:
    exynos-snapshot: core register saved(CPU:5)
    CPUMERRSR: 0000000000000000, L2MERRSR: 0000000000000000
    exynos-snapshot: context saved(CPU:5)
    exynos-snapshot: item - log_kevents is disabled
    CPU: 5 PID: 898 Comm: kswapd0 Not tainted 3.10.61-4497415-eng #1
    task: ffffffc0b8783d80 ti: ffffffc0b71e8000 task.ti: ffffffc0b71e8000
    PC is at obj_idx_to_offset+0x0/0x1c
    LR is at obj_malloc+0x44/0xe8
    pc : [] lr : [] pstate: a0000045
    sp : ffffffc0b71eb790
    x29: ffffffc0b71eb790 x28: ffffffc00204c000
    x27: 000000000001d96f x26: 0000000000000000
    x25: ffffffc098cc3500 x24: ffffffc0a13f2810
    x23: ffffffc098cc3501 x22: ffffffc0a13f2800
    x21: 000011e1a02006e3 x20: ffffffc0a13f2800
    x19: ffffffbc02a7e000 x18: 0000000000000000
    x17: 0000000000000000 x16: 0000000000000feb
    x15: 0000000000000000 x14: 00000000a01003e3
    x13: 0000000000000020 x12: fffffffffffffff0
    x11: ffffffc08b264000 x10: 00000000e3a01004
    x9 : ffffffc08b263fea x8 : ffffffc0b1e611c0
    x7 : ffffffc000307d24 x6 : 0000000000000000
    x5 : 0000000000000038 x4 : 000000000000011e
    x3 : ffffffbc00003e90 x2 : 0000000000000cc0
    x1 : 00000000d0100371 x0 : ffffffbc00003e90

    Reported-by: Sooyong Suk
    Signed-off-by: Heesub Shin
    Tested-by: Sooyong Suk
    Acked-by: Minchan Kim
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Heesub Shin
     
  • In putback_zspage(), we don't need to insert a zspage into the zspage
    list of its size_class again just to fix the fullness group. We can do
    it directly without reinsertion, so we save some instructions.

    Reported-by: Heesub Shin
    Signed-off-by: Minchan Kim
    Cc: Nitin Gupta
    Cc: Sergey Senozhatsky
    Cc: Dan Streetman
    Cc: Seth Jennings
    Cc: Ganesh Mahendran
    Cc: Luigi Semenzato
    Cc: Gunho Lee
    Cc: Juneho Choi
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Minchan Kim
     
  • A micro-optimization. Avoid additional branching and reduce (a bit)
    register pressure (e.g. s_off += size; d_off += size; may be calculated
    twice: first for the >= PAGE_SIZE check and later for the offset update
    in the "else" clause).

    scripts/bloat-o-meter shows some improvement

    add/remove: 0/0 grow/shrink: 0/1 up/down: 0/-10 (-10)
    function old new delta
    zs_object_copy 550 540 -10

    Signed-off-by: Sergey Senozhatsky
    Acked-by: Minchan Kim
    Cc: Nitin Gupta
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Sergey Senozhatsky
     
  • Do not synchronize RCU in zs_compact(). Neither zsmalloc nor zram uses
    RCU.

    Signed-off-by: Sergey Senozhatsky
    Acked-by: Minchan Kim
    Cc: Nitin Gupta
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Sergey Senozhatsky
     
  • Signed-off-by: Yinghao Xie
    Suggested-by: Minchan Kim
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Yinghao Xie
     
  • Create zsmalloc doc which explains design concept and stat information.

    Signed-off-by: Minchan Kim
    Cc: Juneho Choi
    Cc: Gunho Lee
    Cc: Luigi Semenzato
    Cc: Dan Streetman
    Cc: Seth Jennings
    Cc: Nitin Gupta
    Cc: Jerome Marchand
    Cc: Sergey Senozhatsky
    Cc: Joonsoo Kim
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Minchan Kim
     
  • While investigating compaction, per-class fullness information is
    helpful for understanding how well compaction works. With it, we can
    see more clearly how compaction behaves on each size class.

    Signed-off-by: Minchan Kim
    Cc: Juneho Choi
    Cc: Gunho Lee
    Cc: Luigi Semenzato
    Cc: Dan Streetman
    Cc: Seth Jennings
    Cc: Nitin Gupta
    Cc: Jerome Marchand
    Cc: Sergey Senozhatsky
    Cc: Joonsoo Kim
    Cc: Mel Gorman
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Minchan Kim
     
  • We store the handle in the header of each allocated object, so it
    increases the size of each object by sizeof(unsigned long).

    If zram stores 4096 bytes to zsmalloc (ie, bad compression), zsmalloc
    needs a 4104B class to fit the handle.

    However, the 4104B class has 1-pages_per_zspage, so the size wasted by
    internal fragmentation is 8192 - 4104, which is terrible.

    So this patch records the handle in page->private for such huge objects
    (ie, pages_per_zspage == 1 && maxobj_per_zspage == 1) instead of in the
    header of each object, so we can use the 4096B class, not the 4104B
    class.

    Signed-off-by: Minchan Kim
    Cc: Juneho Choi
    Cc: Gunho Lee
    Cc: Luigi Semenzato
    Cc: Dan Streetman
    Cc: Seth Jennings
    Cc: Nitin Gupta
    Cc: Jerome Marchand
    Cc: Sergey Senozhatsky
    Cc: Joonsoo Kim
    Cc: Mel Gorman
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Minchan Kim
     
  • Currently, zsmalloc regards a zspage as ZS_ALMOST_EMPTY if the zspage
    has under 1/4 used objects (ie, fullness_threshold_frac). This can
    result in loose packing, since zsmalloc migrates only ZS_ALMOST_EMPTY
    zspages out.

    This patch changes the rule so that zsmalloc regards a zspage with above
    3/4 used objects as ZS_ALMOST_FULL, which allows tighter packing.
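
    A sketch of the resulting classification (fullness_threshold_frac == 4,
    so the cut-off is expressed in quarters of max_objects; variable names
    are illustrative):

        if (inuse == 0)
                fg = ZS_EMPTY;
        else if (inuse == max_objects)
                fg = ZS_FULL;
        else if (inuse <= 3 * max_objects / fullness_threshold_frac)
                fg = ZS_ALMOST_EMPTY;
        else
                fg = ZS_ALMOST_FULL;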

    Signed-off-by: Minchan Kim
    Cc: Juneho Choi
    Cc: Gunho Lee
    Cc: Luigi Semenzato
    Cc: Dan Streetman
    Cc: Seth Jennings
    Cc: Nitin Gupta
    Cc: Jerome Marchand
    Cc: Sergey Senozhatsky
    Cc: Joonsoo Kim
    Cc: Mel Gorman
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Minchan Kim
     
  • This patch provides core functions for zsmalloc migration. The
    migration policy is simple, as follows:

    for each size class {
        while {
            src_page = get zs_page from ZS_ALMOST_EMPTY
            if (!src_page)
                break;
            dst_page = get zs_page from ZS_ALMOST_FULL
            if (!dst_page)
                dst_page = get zs_page from ZS_ALMOST_EMPTY
            if (!dst_page)
                break;
            migrate(from src_page, to dst_page);
        }
    }

    For migration, we need to identify which objects in a zspage are
    allocated, so that we can migrate them out. We could find them by
    iterating over the free objects in a zspage, because the first_page of a
    zspage keeps a singly-linked list of free objects, but that is not
    efficient. Instead, this patch adds a tag (ie, OBJ_ALLOCATED_TAG) to the
    header of each object (ie, the handle) so we can easily check whether an
    object is allocated.

    This patch adds another status bit in the handle to synchronize between
    user access through zs_map_object() and migration. During migration, we
    cannot move objects the user is using, due to data coherency between the
    old object and the new object.
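
    A sketch of the resulting tag layout (bit names follow the description
    above; surrounding details simplified):

        #define OBJ_ALLOCATED_TAG  1UL  /* low bit of the value stored in the
                                         * object header (or page->private
                                         * for huge classes) */
        #define HANDLE_PIN_BIT     0    /* lock bit inside the handle itself */

        /* allocation: remember the handle and mark the slot as live */
        *(unsigned long *)obj = handle | OBJ_ALLOCATED_TAG;

        /* migration: is this slot a live object? */
        if (*(unsigned long *)obj & OBJ_ALLOCATED_TAG)
                /* try to pin it, excluding concurrent zs_map_object() users */
                pinned = bit_spin_trylock(HANDLE_PIN_BIT,
                                          (unsigned long *)handle);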

    [akpm@linux-foundation.org: zsmalloc.c needs sched.h for cond_resched()]
    Signed-off-by: Minchan Kim
    Cc: Juneho Choi
    Cc: Gunho Lee
    Cc: Luigi Semenzato
    Cc: Dan Streetman
    Cc: Seth Jennings
    Cc: Nitin Gupta
    Cc: Jerome Marchand
    Cc: Sergey Senozhatsky
    Cc: Joonsoo Kim
    Cc: Mel Gorman
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Minchan Kim
     
  • In a later patch, migration needs parts of the functions in zs_malloc
    and zs_free, so this patch factors them out.

    Signed-off-by: Minchan Kim
    Cc: Juneho Choi
    Cc: Gunho Lee
    Cc: Luigi Semenzato
    Cc: Dan Streetman
    Cc: Seth Jennings
    Cc: Nitin Gupta
    Cc: Jerome Marchand
    Cc: Sergey Senozhatsky
    Cc: Joonsoo Kim
    Cc: Mel Gorman
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Minchan Kim