29 Jul, 2016

8 commits

  • iput() itself checks whether its argument is NULL and returns immediately
    if so, so the NULL test around the call site is redundant.

    This issue was detected by using the Coccinelle software.
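
    A minimal sketch of the pattern this cleanup removes (hypothetical caller,
    not the exact zram hunk):

        /* before: redundant guard around the call */
        if (inode)
                iput(inode);

        /* after: iput() already returns early for a NULL inode */
        iput(inode);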

    Link: http://lkml.kernel.org/r/559cf499-4a01-25f9-c87f-24d906626a57@users.sourceforge.net
    Signed-off-by: Markus Elfring
    Reviewed-by: Sergey Senozhatsky
    Acked-by: Minchan Kim
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Markus Elfring
     
  • Use ClearPagePrivate/ClearPagePrivate2 helpers to clear
    PG_private/PG_private_2 in page->flags
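
    Roughly the substitution, sketched on a generic page (not the exact
    zsmalloc hunk):

        /* before: open-coded bit clearing on page->flags */
        clear_bit(PG_private, &page->flags);
        clear_bit(PG_private_2, &page->flags);

        /* after: use the page-flags helpers */
        ClearPagePrivate(page);
        ClearPagePrivate2(page);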

    Link: http://lkml.kernel.org/r/1467882338-4300-7-git-send-email-opensource.ganesh@gmail.com
    Signed-off-by: Ganesh Mahendran
    Acked-by: Minchan Kim
    Reviewed-by: Sergey Senozhatsky
    Cc: Vlastimil Babka
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Ganesh Mahendran
     
  • Add the __init/__exit attributes to functions that are only called during
    module init/exit, to save memory.
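
    A generic sketch of the annotation (hypothetical module, not the zsmalloc
    patch itself): functions used only during init/exit get the section
    attribute so their text can be discarded.

        #include <linux/module.h>

        static int __init example_init(void)
        {
                /* runs once at load time; this code is freed after init */
                return 0;
        }

        static void __exit example_exit(void)
        {
                /* runs only at unload; dropped entirely for built-in code */
        }

        module_init(example_init);
        module_exit(example_exit);
        MODULE_LICENSE("GPL");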

    Link: http://lkml.kernel.org/r/1467882338-4300-6-git-send-email-opensource.ganesh@gmail.com
    Signed-off-by: Ganesh Mahendran
    Cc: Sergey Senozhatsky
    Cc: Minchan Kim
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Ganesh Mahendran
     
  • Some minor comment changes:

    1). update the zs_malloc()/zs_create_pool() function headers
    2). update "Usage of struct page fields"

    Link: http://lkml.kernel.org/r/1467882338-4300-5-git-send-email-opensource.ganesh@gmail.com
    Signed-off-by: Ganesh Mahendran
    Reviewed-by: Sergey Senozhatsky
    Acked-by: Minchan Kim
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Ganesh Mahendran
     
  • Currently, if a class cannot be merged, the max objects of a zspage in
    that class may be calculated twice.

    This patch calculates the max objects of a zspage at the beginning and
    passes the value to can_merge() to decide whether the class can be merged.

    It also removes get_maxobj_per_zspage(), as there is no other caller of
    that function.
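
    In spirit, the reshaped call site looks roughly like this (a sketch; names
    are approximate, the real code is in mm/zsmalloc.c):

        /* compute the value once per class ... */
        objs_per_zspage = (pages_per_zspage * PAGE_SIZE) / size;

        /* ... and hand it to can_merge() instead of recomputing it there */
        if (prev_class && can_merge(prev_class, pages_per_zspage, objs_per_zspage)) {
                pool->size_class[i] = prev_class;
                continue;
        }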

    Link: http://lkml.kernel.org/r/1467882338-4300-4-git-send-email-opensource.ganesh@gmail.com
    Signed-off-by: Ganesh Mahendran
    Reviewed-by: Sergey Senozhatsky
    Acked-by: Minchan Kim
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Ganesh Mahendran
     
  • The number of max objects per zspage is now stored in each size_class, so
    there is no need to recalculate it.

    Link: http://lkml.kernel.org/r/1467882338-4300-3-git-send-email-opensource.ganesh@gmail.com
    Signed-off-by: Ganesh Mahendran
    Acked-by: Minchan Kim
    Reviewed-by: Sergey Senozhatsky
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Ganesh Mahendran
     
  • The object index should be updated after returning from
    find_alloced_obj() to avoid burning CPU time on unnecessary object
    scanning.
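
    The shape of the fix, sketched (parameter names approximate):
    find_alloced_obj() hands back the index where scanning stopped, so the
    caller resumes from there instead of rescanning the page from object 0.

        handle = find_alloced_obj(class, s_page, &obj_idx);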

    Link: http://lkml.kernel.org/r/1467882338-4300-2-git-send-email-opensource.ganesh@gmail.com
    Signed-off-by: Ganesh Mahendran
    Reviewed-by: Sergey Senozhatsky
    Acked-by: Minchan Kim
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Ganesh Mahendran
     
  • This is a cleanup patch. Change "index" to "obj_index" to keep it
    consistent with other uses in zsmalloc.

    Link: http://lkml.kernel.org/r/1467882338-4300-1-git-send-email-opensource.ganesh@gmail.com
    Signed-off-by: Ganesh Mahendran
    Reviewed-by: Sergey Senozhatsky
    Acked-by: Minchan Kim
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Ganesh Mahendran
     

27 Jul, 2016

11 commits

  • Randy reported the build error below.

    > In file included from ../include/linux/balloon_compaction.h:48:0,
    > from ../mm/balloon_compaction.c:11:
    > ../include/linux/compaction.h:237:51: warning: 'struct node' declared inside parameter list [enabled by default]
    > static inline int compaction_register_node(struct node *node)
    > ../include/linux/compaction.h:237:51: warning: its scope is only this definition or declaration, which is probably not what you want [enabled by default]
    > ../include/linux/compaction.h:242:54: warning: 'struct node' declared inside parameter list [enabled by default]
    > static inline void compaction_unregister_node(struct node *node)
    >

    It was caused by non-LRU page migration, which needs compaction.h, but
    compaction.h doesn't include the headers it needs to be standalone.

    I think the proper header for non-LRU page migration is migrate.h rather
    than compaction.h, because migrate.h already pulls in the pieces non-LRU
    page migration needs indirectly, such as isolate_mode_t, migrate_mode and
    MIGRATEPAGE_SUCCESS.

    [akpm@linux-foundation.org: revert mm-balloon-use-general-non-lru-movable-page-feature-fix.patch temp fix]
    Link: http://lkml.kernel.org/r/20160610003304.GE29779@bbox
    Signed-off-by: Minchan Kim
    Reported-by: Randy Dunlap
    Cc: Konstantin Khlebnikov
    Cc: Vlastimil Babka
    Cc: Gioh Kim
    Cc: Rafael Aquini
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Minchan Kim
     
  • zram is very popular in parts of the embedded world (e.g., TVs and mobile
    phones). On those systems, the memory zsmalloc consumes is never trivial
    (one example from a real product: total memory 800M, zsmalloc consumed
    150M), so we have carried this out-of-tree patch to monitor system memory
    behavior via /proc/vmstat.

    With zsmalloc counters in vmstat, it is easier to track down system
    behavior caused by memory usage.

    [minchan@kernel.org: zsmalloc: follow up zsmalloc vmstat]
    Link: http://lkml.kernel.org/r/20160607091737.GC23435@bbox
    [akpm@linux-foundation.org: fix build with CONFIG_ZSMALLOC=m]
    Link: http://lkml.kernel.org/r/1464919731-13255-1-git-send-email-minchan@kernel.org
    Signed-off-by: Minchan Kim
    Cc: Sangseok Lee
    Cc: Chanho Min
    Cc: Chan Gyun Jeong
    Cc: Sergey Senozhatsky
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Minchan Kim
     
  • A static checker warns about using the tag as a bit shifter. It doesn't
    break the current behavior but hurts readability. Let's use OBJ_TAG_BIT as
    the bit shifter instead of OBJ_ALLOCATED_TAG.

    Link: http://lkml.kernel.org/r/20160607045146.GF26230@bbox
    Signed-off-by: Minchan Kim
    Reported-by: Dan Carpenter
    Cc: Sergey Senozhatsky
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Minchan Kim
     
  • This patch introduces a run-time migration feature for zspages.

    For migration, the VM uses the page.lru field, so it is better not to use
    the page.next field, which shares storage with page.lru, for our own
    purposes. To that end, we first compute the first object offset of a page
    at runtime instead of keeping it in page.index, which frees page.index to
    serve as the page-chaining link in place of page.next.

    In the case of a huge object, the handle is stored in page.index instead
    of a next link, because a huge object doesn't need a next link for page
    chaining. So get_next_page() needs to identify huge objects in order to
    return NULL; for that, this patch uses the PG_owner_priv_1 page flag.

    For migration, it supports three functions:

    * zs_page_isolate

    It isolates from its class a zspage containing a subpage the VM wants to
    migrate, so that nobody can allocate new objects from that zspage.

    A zspage may be isolated once per subpage, so a subsequent isolation
    attempt on another subpage of the same zspage shouldn't fail. For that,
    we introduce a zspage.isolated count. With it, zs_page_isolate can tell
    whether the zspage is already isolated for migration, and if so, the
    subsequent isolation attempt can succeed without doing further isolation
    work.

    * zs_page_migrate

    First of all, it takes the write side of zspage->lock to prevent other
    subpages in the zspage from being migrated. Then it locks all objects in
    the page the VM wants to migrate. The reason we must lock all objects in
    the page is the race between zs_map_object and zs_page_migrate:

    zs_map_object                               zs_page_migrate

    pin_tag(handle)
    obj = handle_to_obj(handle)
    obj_to_location(obj, &page, &obj_idx);

                                                write_lock(&zspage->lock)
                                                if (!trypin_tag(handle))
                                                        goto unpin_object

    zspage = get_zspage(page);
    read_lock(&zspage->lock);

    If zs_page_migrate didn't do the trypin_tag, the page zs_map_object holds
    could become stale through migration and it would crash.

    If it locks all of the objects successfully, it copies the contents from
    the old page to the new one and finally creates a new zspage chain with
    the new page. And if it was the last isolated subpage in the zspage, it
    puts the zspage back into its class.

    * zs_page_putback

    It returns an isolated zspage to the right fullness_group list if it
    fails to migrate a page. If it finds the zspage is ZS_EMPTY, it queues
    the zspage for freeing on a workqueue. See below about async zspage
    freeing.

    This patch introduces asynchronous zspage freeing. We need it because we
    need the page lock to clear PG_movable but, unfortunately, the zs_free
    path must be atomic, so the approach is to try to grab the page lock. If
    it gets the page lock of all of the pages successfully, it can free the
    zspage immediately. Otherwise, it queues a free request and frees the
    zspage via a workqueue in process context.

    If zs_free finds the zspage isolated when it tries to free it, it delays
    the freeing until zs_page_putback notices, which will finally free the
    zspage.

    In this patch, we expand the fullness lists from ZS_EMPTY to ZS_FULL.
    First of all, the ZS_EMPTY list is used for delayed freeing. And with the
    added ZS_FULL list, whether a zspage is isolated can be identified via the
    list_empty(&zspage->list) test.
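
    Roughly how the three callbacks are wired into this era's non-LRU
    page-migration API (a sketch; zsmalloc registers them through
    address_space_operations):

        static const struct address_space_operations zsmalloc_aops = {
                .isolate_page = zs_page_isolate,
                .migratepage  = zs_page_migrate,
                .putback_page = zs_page_putback,
        };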

    [minchan@kernel.org: zsmalloc: keep first object offset in struct page]
    Link: http://lkml.kernel.org/r/1465788015-23195-1-git-send-email-minchan@kernel.org
    [minchan@kernel.org: zsmalloc: zspage sanity check]
    Link: http://lkml.kernel.org/r/20160603010129.GC3304@bbox
    Link: http://lkml.kernel.org/r/1464736881-24886-12-git-send-email-minchan@kernel.org
    Signed-off-by: Minchan Kim
    Cc: Sergey Senozhatsky
    Cc: Vlastimil Babka
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Minchan Kim
     
  • Zsmalloc stores the position of the first free object in each zspage's
    freeobj field. If we store an index relative to first_page instead of a
    position, page migration becomes simpler, because we don't have to fix up
    the other linked-list entries when a page is migrated out.

    Link: http://lkml.kernel.org/r/1464736881-24886-11-git-send-email-minchan@kernel.org
    Signed-off-by: Minchan Kim
    Cc: Sergey Senozhatsky
    Cc: Vlastimil Babka
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Minchan Kim
     
  • Currently, putback_zspage frees the zspage under class->lock when its
    fullness becomes ZS_EMPTY, but that gets in the way of the locking scheme
    for the new zspage migration. So this patch separates free_zspage from
    putback_zspage and frees the zspage outside class->lock, in preparation
    for zspage migration.

    Link: http://lkml.kernel.org/r/1464736881-24886-10-git-send-email-minchan@kernel.org
    Signed-off-by: Minchan Kim
    Reviewed-by: Sergey Senozhatsky
    Cc: Vlastimil Babka
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Minchan Kim
     
  • We have squeezed the metadata of a zspage into the first page's
    descriptor. So, to get metadata from a subpage, we must first find the
    first page. But that makes it hard to implement the page migration
    feature of zsmalloc, because any place that gets the first page from a
    subpage can race with migration of that first page. IOW, the first page
    it got could be stale. To prevent that, I tried several approaches, but
    they made the code complicated, so I finally concluded to separate the
    metadata from the first page. Of course, it consumes more memory: 16
    bytes per zspage on 32-bit at the moment. It means we lose 1% in the
    *worst case* (40B/4096B), which I think is not bad for the gain in
    maintainability.

    Link: http://lkml.kernel.org/r/1464736881-24886-9-git-send-email-minchan@kernel.org
    Signed-off-by: Minchan Kim
    Cc: Sergey Senozhatsky
    Cc: Vlastimil Babka
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Minchan Kim
     
  • For page migration, we need to create the page chain of a zspage
    dynamically, so this patch factors that out of alloc_zspage.

    Link: http://lkml.kernel.org/r/1464736881-24886-8-git-send-email-minchan@kernel.org
    Signed-off-by: Minchan Kim
    Reviewed-by: Sergey Senozhatsky
    Cc: Vlastimil Babka
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Minchan Kim
     
  • An upcoming patch will change how zspage metadata is encoded, so for
    easier review this patch wraps the metadata-accessing code in accessor
    functions.

    Link: http://lkml.kernel.org/r/1464736881-24886-7-git-send-email-minchan@kernel.org
    Signed-off-by: Minchan Kim
    Reviewed-by: Sergey Senozhatsky
    Cc: Vlastimil Babka
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Minchan Kim
     
  • Use the kernel's standard bit spin-lock instead of a custom mess. The
    custom one even has a bug: it doesn't disable preemption. The reason we
    haven't hit any problem is that it has only been used inside sections
    where preemption is disabled by the class->lock spinlock, so there is no
    need to go to stable.
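
    The replacement in spirit (a sketch; HANDLE_PIN_BIT is zsmalloc's existing
    tag bit):

        #include <linux/bit_spinlock.h>

        static void pin_tag(unsigned long handle)
        {
                bit_spin_lock(HANDLE_PIN_BIT, (unsigned long *)handle);
        }

        static void unpin_tag(unsigned long handle)
        {
                bit_spin_unlock(HANDLE_PIN_BIT, (unsigned long *)handle);
        }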

    Link: http://lkml.kernel.org/r/1464736881-24886-6-git-send-email-minchan@kernel.org
    Signed-off-by: Minchan Kim
    Reviewed-by: Sergey Senozhatsky
    Cc: Vlastimil Babka
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Minchan Kim
     
  • Every zspage in a size_class has the same max number of objects, so we
    can move that value into the size_class.

    Link: http://lkml.kernel.org/r/1464736881-24886-5-git-send-email-minchan@kernel.org
    Signed-off-by: Minchan Kim
    Reviewed-by: Sergey Senozhatsky
    Cc: Vlastimil Babka
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Minchan Kim
     

27 May, 2016

1 commit

  • Some updates to commit d34f615720d1 ("mm/zsmalloc: don't fail if can't
    create debugfs info"):

    - add pr_warn to all stat failure cases
    - do not prevent module loading on stat failure

    Link: http://lkml.kernel.org/r/1463671123-5479-1-git-send-email-ddstreet@ieee.org
    Signed-off-by: Dan Streetman
    Reviewed-by: Ganesh Mahendran
    Acked-by: Minchan Kim
    Cc: Sergey Senozhatsky
    Cc: Dan Streetman
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Dan Streetman
     

21 May, 2016

6 commits

  • Change the return type of zs_pool_stat_create() to void, and remove the
    logic to abort pool creation if the stat debugfs dir/file could not be
    created.

    The debugfs stat file is for debugging/information only and doesn't
    affect the operation of zsmalloc; there is no reason to abort pool
    creation if the stat file can't be created. This was seen with zswap,
    which used the same name for all pool creations, causing zsmalloc to fail
    to create a second pool for zswap when CONFIG_ZSMALLOC_STAT was enabled.

    Signed-off-by: Dan Streetman
    Reviewed-by: Sergey Senozhatsky
    Cc: Dan Streetman
    Cc: Minchan Kim
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Dan Streetman
     
  • Pass GFP flags to zs_malloc() instead of using a fixed mask supplied to
    zs_create_pool(), so we can be more flexible; but, more importantly, we
    need this to switch zram to per-cpu compression streams -- zram will try
    to allocate a handle with preemption disabled in a fast path and switch
    to a slow path (using a different gfp mask) if the fast one fails.

    Apart from that, this also aligns the zs_malloc() interface with
    zspool/zbud.
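
    The resulting call shape, sketched (the masks here are illustrative, not
    the ones zram actually ends up using; pool and size as in the caller):

        unsigned long handle;

        handle = zs_malloc(pool, size, GFP_NOIO | __GFP_NOWARN);
        if (!handle)
                handle = zs_malloc(pool, size, GFP_NOIO); /* slow path, different mask */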

    [sergey.senozhatsky@gmail.com: pass GFP flags to zs_malloc() instead of using a fixed mask]
    Link: http://lkml.kernel.org/r/20160429150942.GA637@swordfish
    Signed-off-by: Sergey Senozhatsky
    Acked-by: Minchan Kim
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Sergey Senozhatsky
     
  • Let's remove the unused pool parameter in obj_free().

    Signed-off-by: Minchan Kim
    Reviewed-by: Sergey Senozhatsky
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Minchan Kim
     
  • Clean up function parameter ordering so that the higher-level data
    structure comes first.

    Signed-off-by: Minchan Kim
    Reviewed-by: Sergey Senozhatsky
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Minchan Kim
     
  • There are many BUG_ONs in zsmalloc.c, which is not recommended, so
    replace them with alternatives (see the sketch after this list).

    The normal rules are as follows:

    1. Avoid BUG_ON if possible. Instead, use VM_BUG_ON or VM_BUG_ON_PAGE.

    2. Use VM_BUG_ON_PAGE if we need to see a struct page's fields.

    3. Put those assertions in primitive functions so higher-level functions
    can rely on the assertion in the primitive function.

    4. Don't use an assertion if the following instruction can trigger an
    Oops anyway.
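
    A sketch of rules 2 and 3 (hypothetical wrapper name): the assertion lives
    in a primitive helper, dumps the struct page on failure, and compiles away
    unless CONFIG_DEBUG_VM is set.

        static inline void assert_first_page(struct page *page)
        {
                /* dump the offending struct page before BUG-ing */
                VM_BUG_ON_PAGE(!is_first_page(page), page);
        }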

    Signed-off-by: Minchan Kim
    Reviewed-by: Sergey Senozhatsky
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Minchan Kim
     
  • Clean up function parameter "struct page". Many functions of zsmalloc
    expect that page paramter is "first_page" so use "first_page" rather
    than "page" for code readability.

    Signed-off-by: Minchan Kim
    Reviewed-by: Sergey Senozhatsky
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Minchan Kim
     

10 May, 2016

1 commit

  • zs_can_compact() has two race conditions in its core calculation:

    unsigned long obj_wasted = zs_stat_get(class, OBJ_ALLOCATED) -
                               zs_stat_get(class, OBJ_USED);

    1) classes are not locked, so the numbers of allocated and used
    objects can be changed by concurrent ops happening on other CPUs
    2) the shrinker invokes it from preemptible context

    Thus, depending on the circumstances, OBJ_ALLOCATED can become less than
    OBJ_USED, which can result in either a very high or a negative
    `total_scan' value calculated later in do_shrink_slab().

    do_shrink_slab() has some logic to prevent those cases:

    vmscan: shrink_slab: zs_shrinker_scan+0x0/0x28 [zsmalloc] negative objects to delete nr=-62
    vmscan: shrink_slab: zs_shrinker_scan+0x0/0x28 [zsmalloc] negative objects to delete nr=-62
    vmscan: shrink_slab: zs_shrinker_scan+0x0/0x28 [zsmalloc] negative objects to delete nr=-64
    vmscan: shrink_slab: zs_shrinker_scan+0x0/0x28 [zsmalloc] negative objects to delete nr=-62
    vmscan: shrink_slab: zs_shrinker_scan+0x0/0x28 [zsmalloc] negative objects to delete nr=-62
    vmscan: shrink_slab: zs_shrinker_scan+0x0/0x28 [zsmalloc] negative objects to delete nr=-62

    However, due to the way `total_scan' is calculated, not every
    shrinker->count_objects() overflow can be spotted and handled.
    To demonstrate the latter, I added some debugging code to do_shrink_slab()
    (x86_64) and the results were:

    vmscan: OVERFLOW: shrinker->count_objects() == -1 [18446744073709551615]
    vmscan: but total_scan > 0: 92679974445502
    vmscan: resulting total_scan: 92679974445502
    [..]
    vmscan: OVERFLOW: shrinker->count_objects() == -1 [18446744073709551615]
    vmscan: but total_scan > 0: 22634041808232578
    vmscan: resulting total_scan: 22634041808232578

    Even though shrinker->count_objects() has returned an overflowed value,
    the resulting `total_scan' is positive, and, what is more worrisome, it
    is insanely huge. This value is getting used later on in
    shrinker->scan_objects() loop:

    while (total_scan >= batch_size ||
           total_scan >= freeable) {
            unsigned long ret;
            unsigned long nr_to_scan = min(batch_size, total_scan);

            shrinkctl->nr_to_scan = nr_to_scan;
            ret = shrinker->scan_objects(shrinker, shrinkctl);
            if (ret == SHRINK_STOP)
                    break;
            freed += ret;

            count_vm_events(SLABS_SCANNED, nr_to_scan);
            total_scan -= nr_to_scan;

            cond_resched();
    }

    `total_scan >= batch_size' is true for a very, very long time, and
    `total_scan >= freeable' is also true for quite some time, because
    `freeable < 0' and `total_scan' is large enough (for example,
    22634041808232578). The only break condition, in the given scheme of
    things, is the shrinker->scan_objects() == SHRINK_STOP test, which is a
    bit too weak to rely on, especially in heavy zsmalloc-usage scenarios.

    To fix the issue, take a pool stat snapshot and use it instead of
    racy zs_stat_get() calls.
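
    Roughly the shape of the snapshot approach described above (a sketch; the
    details may differ from the actual commit):

        static unsigned long zs_can_compact(struct size_class *class)
        {
                unsigned long obj_wasted;
                unsigned long obj_allocated = zs_stat_get(class, OBJ_ALLOCATED);
                unsigned long obj_used = zs_stat_get(class, OBJ_USED);

                /* a racy snapshot may still look inconsistent; treat that as
                 * "nothing to compact" instead of underflowing */
                if (obj_allocated <= obj_used)
                        return 0;

                obj_wasted = obj_allocated - obj_used;
                obj_wasted /= get_maxobj_per_zspage(class->size,
                                class->pages_per_zspage);

                return obj_wasted * class->pages_per_zspage;
        }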

    Link: http://lkml.kernel.org/r/20160509140052.3389-1-sergey.senozhatsky@gmail.com
    Signed-off-by: Sergey Senozhatsky
    Cc: Minchan Kim
    Cc: [4.3+]
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Sergey Senozhatsky
     

18 Mar, 2016

2 commits

  • Add a new column to pool stats, which will tell how many pages ideally
    can be freed by class compaction, so it will be easier to analyze
    zsmalloc fragmentation.

    At the moment, we have only numbers of FULL and ALMOST_EMPTY classes,
    but they don't tell us how badly the class is fragmented internally.

    The new /sys/kernel/debug/zsmalloc/zramX/classes output looks as follows:

    class  size almost_full almost_empty obj_allocated   obj_used pages_used pages_per_zspage freeable
    [..]
       12   224           0            2           146          5          8                4        4
       13   240           0            0             0          0          0                1        0
       14   256           1           13          1840       1672        115                1       10
       15   272           0            0             0          0          0                1        0
    [..]
       49   816           0            3           745        735        149                1        2
       51   848           3            4           361        306         76                4        8
       52   864          12           14           378        268         81                3       21
       54   896           1           12           117         57         26                2       12
       57   944           0            0             0          0          0                3        0
    [..]
    Total                 26          131         12709      10994       1071                     134

    For example, from this particular output we can easily conclude that
    class-896 is heavily fragmented -- it occupies 26 pages, 12 of which can
    be freed by compaction.

    Signed-off-by: Sergey Senozhatsky
    Acked-by: Minchan Kim
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Sergey Senozhatsky
     
  • When unmapping a huge class page in zs_unmap_object, the page is
    unmapped by kunmap_atomic. The "!area->huge" branch in __zs_unmap_object
    is always true, and no code sets "area->huge" now, so we can drop it.

    Signed-off-by: YiPing Xu
    Reviewed-by: Sergey Senozhatsky
    Acked-by: Minchan Kim
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    YiPing Xu
     

21 Jan, 2016

1 commit

  • record_obj() in migrate_zspage() does not preserve the handle's
    HANDLE_PIN_BIT, set by find_alloced_obj()->trypin_tag(), and implicitly
    (accidentally) unpins the handle, while migrate_zspage() still performs
    an explicit unpin_tag() on that handle. This additional explicit
    unpin_tag() introduces a race with zs_free(), which can pin the handle
    by this time, so the handle ends up unpinned.

    Schematically, it goes like this:

    CPU0                                        CPU1

    migrate_zspage
      find_alloced_obj
        trypin_tag
          set HANDLE_PIN_BIT                    zs_free()
                                                  pin_tag()
      obj_malloc      -- new object, no tag
      record_obj      -- remove HANDLE_PIN_BIT    set HANDLE_PIN_BIT
      unpin_tag       -- remove zs_free's HANDLE_PIN_BIT

    The race condition may result in a NULL pointer dereference:

    Unable to handle kernel NULL pointer dereference at virtual address 00000000
    CPU: 0 PID: 19001 Comm: CookieMonsterCl Tainted:
    PC is at get_zspage_mapping+0x0/0x24
    LR is at obj_free.isra.22+0x64/0x128
    Call trace:
    get_zspage_mapping+0x0/0x24
    zs_free+0x88/0x114
    zram_free_page+0x64/0xcc
    zram_slot_free_notify+0x90/0x108
    swap_entry_free+0x278/0x294
    free_swap_and_cache+0x38/0x11c
    unmap_single_vma+0x480/0x5c8
    unmap_vmas+0x44/0x60
    exit_mmap+0x50/0x110
    mmput+0x58/0xe0
    do_exit+0x320/0x8dc
    do_group_exit+0x44/0xa8
    get_signal+0x538/0x580
    do_signal+0x98/0x4b8
    do_notify_resume+0x14/0x5c

    This patch keeps the lock bit set in the migration path and updates the
    handle's value atomically.
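
    A sketch of the idea (not necessarily the literal diff): the migration
    path keeps HANDLE_PIN_BIT set in the value it writes, and record_obj()
    stores the handle word in one go to avoid tearing.

        /* migrate_zspage(): preserve the pin taken via find_alloced_obj() */
        free_obj |= BIT(HANDLE_PIN_BIT);
        record_obj(handle, free_obj);
        unpin_tag(handle);

        /* record_obj(): a single, non-torn store of the handle word */
        static void record_obj(unsigned long handle, unsigned long obj)
        {
                WRITE_ONCE(*(unsigned long *)handle, obj);
        }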

    Signed-off-by: Junil Lee
    Signed-off-by: Minchan Kim
    Acked-by: Vlastimil Babka
    Cc: Sergey Senozhatsky
    Cc: [4.1+]
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Junil Lee
     

16 Jan, 2016

1 commit


07 Nov, 2015

8 commits

  • We are going to rework how compound_head() works. It will not use
    page->first_page as it does now.

    The only other user of page->first_page beyond compound pages is
    zsmalloc.

    Let's use page->private instead of page->first_page here. It occupies
    the same storage space.
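
    The flavor of the change, sketched (not the exact hunks): the back-pointer
    to the first page now travels through page->private.

        /* store the back-pointer on a tail page */
        set_page_private(page, (unsigned long)first_page);

        /* ... and read it back later */
        first_page = (struct page *)page_private(page);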

    Signed-off-by: Kirill A. Shutemov
    Acked-by: Vlastimil Babka
    Reviewed-by: Sergey Senozhatsky
    Reviewed-by: Andrea Arcangeli
    Cc: "Paul E. McKenney"
    Cc: Andi Kleen
    Cc: Aneesh Kumar K.V
    Cc: Christoph Lameter
    Cc: David Rientjes
    Cc: Hugh Dickins
    Cc: Joonsoo Kim
    Cc: Michal Hocko
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Kirill A. Shutemov
     
  • Each `struct size_class' contains `struct zs_size_stat': an array of
    NR_ZS_STAT_TYPE `unsigned long'. For zsmalloc built without
    CONFIG_ZSMALLOC_STAT this results in a waste of `2 * sizeof(unsigned
    long)' per class.

    The patch removes unneeded `struct zs_size_stat' members by redefining
    NR_ZS_STAT_TYPE (max stat idx in array).

    Since both NR_ZS_STAT_TYPE and zs_stat_type are compile time constants,
    GCC can eliminate zs_stat_inc()/zs_stat_dec() calls that use zs_stat_type
    larger than NR_ZS_STAT_TYPE: CLASS_ALMOST_EMPTY and CLASS_ALMOST_FULL at
    the moment.
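
    One way to express this, sketched (the exact enum ordering is in
    mm/zsmalloc.c): keep the always-needed counters first and let
    NR_ZS_STAT_TYPE shrink when the class stats are compiled out.

        enum zs_stat_type {
                OBJ_ALLOCATED,
                OBJ_USED,
                CLASS_ALMOST_FULL,
                CLASS_ALMOST_EMPTY,
        };

        #ifdef CONFIG_ZSMALLOC_STAT
        #define NR_ZS_STAT_TYPE (CLASS_ALMOST_EMPTY + 1)
        #else
        #define NR_ZS_STAT_TYPE (OBJ_USED + 1)
        #endif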

    ./scripts/bloat-o-meter mm/zsmalloc.o.old mm/zsmalloc.o.new
    add/remove: 0/0 grow/shrink: 0/3 up/down: 0/-39 (-39)
    function                 old     new   delta
    fix_fullness_group        97      94      -3
    insert_zspage            100      86     -14
    remove_zspage            141     119     -22

    To summarize:
    a) each class now uses less memory
    b) we avoid a number of dec/inc stats (a minor optimization,
    but still).

    The gain will increase once we introduce additional stats.

    A simple IO test.

    iozone -t 4 -R -r 32K -s 60M -I +Z
                                patched           base
    " Initial write "        4145599.06     4127509.75
    " Rewrite "              4146225.94     4223618.50
    " Read "                17157606.00    17211329.50
    " Re-read "             17380428.00    17267650.50
    " Reverse Read "        16742768.00    16162732.75
    " Stride read "         16586245.75    16073934.25
    " Random read "         16349587.50    15799401.75
    " Mixed workload "      10344230.62     9775551.50
    " Random write "         4277700.62     4260019.69
    " Pwrite "               4302049.12     4313703.88
    " Pread "                6164463.16     6126536.72
    " Fwrite "               7131195.00     6952586.00
    " Fread "               12682602.25    12619207.50

    Signed-off-by: Sergey Senozhatsky
    Cc: Minchan Kim
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Sergey Senozhatsky
     
  • Signed-off-by: Hui Zhu
    Reviewed-by: Sergey Senozhatsky
    Cc: Minchan Kim
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Hui Zhu
     
  • We don't let the user disable the shrinker in zsmalloc (once it has been
    enabled), so there is no need to check ->shrinker_enabled in
    zs_shrinker_count(), at the moment at least.

    Signed-off-by: Sergey Senozhatsky
    Acked-by: Minchan Kim
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Sergey Senozhatsky
     
  • A cosmetic change.

    Commit c60369f01125 ("staging: zsmalloc: prevent mappping in interrupt
    context") added an in_interrupt() check to zs_map_object() and the
    'hardirq.h' include; but the in_interrupt() macro is defined in
    'preempt.h', not in 'hardirq.h', so include that instead.

    Signed-off-by: Sergey Senozhatsky
    Acked-by: Minchan Kim
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Sergey Senozhatsky
     
  • In obj_malloc():

    if (!class->huge)
            /* record handle in the header of allocated chunk */
            link->handle = handle;
    else
            /* record handle in first_page->private */
            set_page_private(first_page, handle);

    In the huge-page case we save the handle into private directly, as a
    value.

    But in obj_to_head():

    if (class->huge) {
            VM_BUG_ON(!is_first_page(page));
            return *(unsigned long *)page_private(page);
    } else
            return *(unsigned long *)obj;

    Here the stored value is dereferenced as a pointer.

    The reason there has been no problem until now is that a huge-class page
    is born ZS_FULL, so it can't be migrated. However, we need this patch for
    future work: "VM-aware zsmalloced page migration", to reduce external
    fragmentation.
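
    In other words, the huge-class branch of obj_to_head() should return the
    stored value rather than dereference it; roughly (a sketch):

        if (class->huge) {
                VM_BUG_ON(!is_first_page(page));
                return page_private(page);      /* the handle itself, not a pointer to it */
        } else
                return *(unsigned long *)obj;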

    Signed-off-by: Hui Zhu
    Acked-by: Minchan Kim
    Cc: Sergey Senozhatsky
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Hui Zhu
     
  • [akpm@linux-foundation.org: fix grammar]
    Signed-off-by: Hui Zhu
    Reviewed-by: Sergey Senozhatsky
    Cc: Dan Streetman
    Cc: Minchan Kim
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Hui Zhu
     
  • Constify `struct zs_pool' ->name.

    [akpm@linux-foundation.org: constify zpool_create_pool()'s `type' arg also]
    Signed-off-by: Sergey Senozhatsky
    Acked-by: Dan Streetman
    Cc: Minchan Kim
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Sergey SENOZHATSKY
     

09 Sep, 2015

1 commit