18 Aug, 2018

1 commit

  • The functions zs_page_isolate, zs_page_migrate, zs_page_putback,
    lock_zspage, trylock_zspage and the structure zsmalloc_aops are local
    to the source file and do not need to be in global scope, so make
    them static.

    Cleans up sparse warnings:
    symbol 'zs_page_isolate' was not declared. Should it be static?
    symbol 'zs_page_migrate' was not declared. Should it be static?
    symbol 'zs_page_putback' was not declared. Should it be static?
    symbol 'zsmalloc_aops' was not declared. Should it be static?
    symbol 'lock_zspage' was not declared. Should it be static?
    symbol 'trylock_zspage' was not declared. Should it be static?
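
    A minimal sketch of the change, assuming the 4.18-era signatures of
    these callbacks (bodies and initializers elided; only the added
    storage class matters):

    static bool zs_page_isolate(struct page *page, isolate_mode_t mode);
    static int zs_page_migrate(struct address_space *mapping,
                               struct page *newpage, struct page *page,
                               enum migrate_mode mode);
    static void zs_page_putback(struct page *page);
    static const struct address_space_operations zsmalloc_aops;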

    [arnd@arndb.de: hide unused lock_zspage]
    Link: http://lkml.kernel.org/r/20180706130924.3891230-1-arnd@arndb.de
    Link: http://lkml.kernel.org/r/20180624213322.13776-1-colin.king@canonical.com
    Signed-off-by: Colin Ian King
    Reviewed-by: Sergey Senozhatsky
    Reviewed-by: Andrew Morton
    Cc: Minchan Kim
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Colin Ian King
     

15 Jun, 2018

1 commit

  • mm/*.c files use symbolic and octal styles for permissions.

    Using octal and not symbolic permissions is preferred by many as more
    readable.

    https://lkml.org/lkml/2016/8/2/1945

    Prefer the direct use of octal for permissions.

    Done using
    $ scripts/checkpatch.pl -f --types=SYMBOLIC_PERMS --fix-inplace mm/*.c
    and some typing.

    Before: $ git grep -P -w "0[0-7]{3,3}" mm | wc -l
    44
    After: $ git grep -P -w "0[0-7]{3,3}" mm | wc -l
    86

    Miscellanea:

    o Whitespace neatening around these conversions.
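
    As an illustration of the conversion shape (foo is a placeholder,
    not a hunk from this patch): S_IRUGO is 0444 and S_IWUSR is 0200,
    so their union becomes 0644.

    /* before */
    module_param(foo, int, S_IRUGO | S_IWUSR);

    /* after */
    module_param(foo, int, 0644);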

    Link: http://lkml.kernel.org/r/2e032ef111eebcd4c5952bae86763b541d373469.1522102887.git.joe@perches.com
    Signed-off-by: Joe Perches
    Acked-by: David Rientjes
    Acked-by: Michal Hocko
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Joe Perches
     

06 Apr, 2018

3 commits

  • Link: http://lkml.kernel.org/r/1519585191-10180-4-git-send-email-rppt@linux.vnet.ibm.com
    Signed-off-by: Mike Rapoport
    Reviewed-by: Andrew Morton
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Mike Rapoport
     
  • Patch series "zsmalloc/zram: drop zram's max_zpage_size", v3.

    ZRAM's max_zpage_size is a bad thing. It forces zsmalloc to store
    normal objects as huge ones, which results in bigger zsmalloc memory
    usage. Drop it and use the actual zsmalloc huge-class value when
    deciding whether an object is huge.

    This patch (of 2):

    Not every object can share its zspage with other objects, e.g. when
    the object is as big as the zspage or nearly as big as the zspage.
    For such objects zsmalloc has a so-called huge class: every object
    which belongs to the huge class consumes the entire zspage (which
    consists of a single physical page). On an x86_64 box with PAGE_SHIFT
    12, the first non-huge class size is 3264, so starting down from size
    3264, objects can share page(s) and thus minimize memory wastage.

    ZRAM, however, has its own statically defined watermark for huge
    objects, namely "3 * PAGE_SIZE / 4 = 3072", and forcibly stores every
    object larger than this watermark (3072) as a PAGE_SIZE object, in
    other words, in the huge class, while zsmalloc can keep some of those
    objects in non-huge classes. This results in increased memory
    consumption.

    zsmalloc knows better whether an object is huge or not. Introduce the
    zs_huge_class_size() function, which tells whether the given object
    can be stored in one of the non-huge classes. This will let us drop
    ZRAM's huge-object watermark and fully rely on zsmalloc when deciding
    whether an object is huge.
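
    A hedged sketch of the intended zram-side use, based on this
    description (caching the result in a huge_class_size variable is an
    assumption; only zs_huge_class_size() itself comes from this series):

    /* at pool creation: ask zsmalloc where "huge" really begins */
    huge_class_size = zs_huge_class_size(pool);

    /* on store, instead of the static 3 * PAGE_SIZE / 4 watermark: */
    if (comp_len >= huge_class_size)
            comp_len = PAGE_SIZE;  /* store as a full-page (huge) object */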

    [sergey.senozhatsky.work@gmail.com: add pool param to zs_huge_class_size()]
    Link: http://lkml.kernel.org/r/20180314081833.1096-2-sergey.senozhatsky@gmail.com
    Link: http://lkml.kernel.org/r/20180306070639.7389-2-sergey.senozhatsky@gmail.com
    Signed-off-by: Sergey Senozhatsky
    Acked-by: Minchan Kim
    Cc: Mike Rapoport
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Sergey Senozhatsky
     
  • ...instead of open coding file operations followed by a custom
    ->open() callback for each attribute.
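
    The helper in question is DEFINE_SHOW_ATTRIBUTE() from
    <linux/seq_file.h>; a minimal sketch, using zsmalloc's debugfs
    attribute as the example name:

    static int zs_stats_size_show(struct seq_file *s, void *v)
    {
            /* ... emit per-class statistics ... */
            return 0;
    }
    DEFINE_SHOW_ATTRIBUTE(zs_stats_size);
    /* expands to zs_stats_size_open() plus a zs_stats_size_fops wired
     * to single_open()/single_release(), replacing the open-coded
     * versions of both */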

    [andriy.shevchenko@linux.intel.com: add tags, fix compilation issue]
    Link: http://lkml.kernel.org/r/20180217144253.58604-1-andriy.shevchenko@linux.intel.com
    Link: http://lkml.kernel.org/r/20180214154644.54505-1-andriy.shevchenko@linux.intel.com
    Signed-off-by: Andy Shevchenko
    Reviewed-by: Matthew Wilcox
    Reviewed-by: Andrew Morton
    Reviewed-by: Sergey Senozhatsky
    Acked-by: Christoph Lameter
    Cc: Tejun Heo
    Cc: Dennis Zhou
    Cc: Minchan Kim
    Cc: Nitin Gupta
    Cc: Sergey Senozhatsky
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Andy Shevchenko
     

14 Feb, 2018

1 commit

  • With boot-time switching between paging modes we will have a variable
    MAX_PHYSMEM_BITS.

    Let's use the maximum value possible for the CONFIG_X86_5LEVEL=y
    configuration to define the zsmalloc data structures.

    The patch introduces MAX_POSSIBLE_PHYSMEM_BITS to cover such a case.
    It is also well suited to handle the PAE special case.
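
    A sketch of the consumer side in zsmalloc, following the fallback
    scheme this description implies (zsmalloc derives the PFN bits of
    its object encoding from the physical address space):

    #ifndef MAX_POSSIBLE_PHYSMEM_BITS
    #ifdef MAX_PHYSMEM_BITS
    #define MAX_POSSIBLE_PHYSMEM_BITS MAX_PHYSMEM_BITS
    #else
    #define MAX_POSSIBLE_PHYSMEM_BITS BITS_PER_LONG
    #endif
    #endif

    #define _PFN_BITS (MAX_POSSIBLE_PHYSMEM_BITS - PAGE_SHIFT)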

    Signed-off-by: Kirill A. Shutemov
    Reviewed-by: Nitin Gupta
    Acked-by: Minchan Kim
    Cc: Andy Lutomirski
    Cc: Borislav Petkov
    Cc: Linus Torvalds
    Cc: Peter Zijlstra
    Cc: Sergey Senozhatsky
    Cc: Thomas Gleixner
    Cc: linux-mm@kvack.org
    Link: http://lkml.kernel.org/r/20180214111656.88514-3-kirill.shutemov@linux.intel.com
    Signed-off-by: Ingo Molnar

    Kirill A. Shutemov
     

01 Feb, 2018

3 commits

  • Fix warning about shifting unsigned literals being undefined behavior.

    Link: http://lkml.kernel.org/r/1515642078-4259-1-git-send-email-nick.desaulniers@gmail.com
    Signed-off-by: Nick Desaulniers
    Suggested-by: Minchan Kim
    Reviewed-by: Sergey Senozhatsky
    Cc: Andy Shevchenko
    Cc: Matthew Wilcox
    Cc: Nick Desaulniers
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Nick Desaulniers
     
  • We waste sizeof(swp_entry_t) on the zswap header when using zsmalloc
    as the zpool driver because zsmalloc doesn't support eviction.

    Add zpool_evictable() to detect whether a zpool is potentially
    evictable, and use it in zswap to avoid wasting memory on the zswap
    header.
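
    A hedged sketch of the zswap store path after the change (variable
    names are illustrative):

    /* reserve header space only when the backing pool can evict;
     * zsmalloc cannot, so its entries skip the swp_entry_t header */
    hlen = zpool_evictable(pool) ? sizeof(struct zswap_header) : 0;
    ret = zpool_malloc(pool, hlen + dlen, gfp, &handle);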

    [yuzhao@google.com: the "zpool->" prefix is a result of copy & paste]
    Link: http://lkml.kernel.org/r/20180110225626.110330-1-yuzhao@google.com
    Link: http://lkml.kernel.org/r/20180110224741.83751-1-yuzhao@google.com
    Signed-off-by: Yu Zhao
    Acked-by: Dan Streetman
    Reviewed-by: Sergey Senozhatsky
    Cc: Seth Jennings
    Cc: Minchan Kim
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Yu Zhao
     
  • The structure zs_pool has a special flag to indicate the success of
    shrinker initialization. unregister_shrinker() has been improved and
    can detect by itself whether actual deinitialization should be
    performed or not, so the extra flag becomes redundant.
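
    A sketch of the simplification, assuming the flag was only consulted
    on the teardown path:

    -	if (pool->shrinker_enabled) {
    -		unregister_shrinker(&pool->shrinker);
    -		pool->shrinker_enabled = false;
    -	}
    +	/* unregister_shrinker() now copes with a never-registered
    +	 * shrinker by itself */
    +	unregister_shrinker(&pool->shrinker);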

    [akpm@linux-foundation.org: update comment (Aliaksei), remove unneeded cast]
    Link: http://lkml.kernel.org/r/1513680552-9798-1-git-send-email-akaraliou.dev@gmail.com
    Signed-off-by: Aliaksei Karaliou
    Reviewed-by: Sergey Senozhatsky
    Acked-by: Minchan Kim
    Cc: Michal Hocko
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Aliaksei Karaliou
     

05 Jan, 2018

1 commit

  • `struct file_system_type' and the alloc_anon_inode() function are
    defined in fs.h; include it directly.

    Link: http://lkml.kernel.org/r/20171219104219.3017-1-sergey.senozhatsky@gmail.com
    Signed-off-by: Sergey Senozhatsky
    Cc: Minchan Kim
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Sergey Senozhatsky
     

16 Nov, 2017

1 commit

  • Use BUG_ON(in_interrupt()) in zs_map_object(). This is not a new
    BUG_ON(), it has always been there, but was recently changed to
    VM_BUG_ON(). There are several problems with that. First, we use
    per-CPU mappings both in zsmalloc and in zram, and an interrupt may
    easily corrupt those buffers. Second, and more importantly, we
    believe it is possible to start leaking sensitive information.
    Consider the following case:

    -> process P

         swap out
          zram
           per-cpu mapping CPU1
            compress page A

            -> IRQ

               swap out
                zram
                 per-cpu mapping CPU1
                  compress page B
                  write page from per-cpu mapping CPU1 to zsmalloc pool
               iret

    -> process P

            write page from per-cpu mapping CPU1 to zsmalloc pool  [*]
         return

    [*] so we store overwritten data that actually belongs to another
    page (task) and potentially contains sensitive data. And when
    process P page faults, it is going to read (swap in) that other
    task's data.
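
    A sketch of the restored check, assuming zs_map_object()'s upstream
    signature (body elided):

    void *zs_map_object(struct zs_pool *pool, unsigned long handle,
                        enum zs_mapmode mm)
    {
            /*
             * The per-cpu mapping buffers are shared with zram; running
             * in interrupt context could overwrite a half-written
             * buffer, as in the trace above.
             */
            BUG_ON(in_interrupt());
            /* ... */
    }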

    Link: http://lkml.kernel.org/r/20170929045140.4055-1-sergey.senozhatsky@gmail.com
    Signed-off-by: Sergey Senozhatsky
    Acked-by: Minchan Kim
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Sergey Senozhatsky
     

09 Sep, 2017

2 commits

  • zs_stat_inc/dec/get() use enum zs_stat_type for the stat type, however
    some callers pass an enum fullness_group value. Change the type to int
    to reflect the actual use of the functions and get rid of
    'enum-conversion' warnings.
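
    A sketch of the resulting signature (the stat helpers are internal
    to mm/zsmalloc.c; shown here as described):

    /* type used to be enum zs_stat_type, but callers also pass
     * enum fullness_group values */
    static inline void zs_stat_inc(struct size_class *class,
                                   int type, unsigned long cnt)
    {
            class->stats.objs[type] += cnt;
    }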

    Link: http://lkml.kernel.org/r/20170731175000.56538-1-mka@chromium.org
    Signed-off-by: Matthias Kaehlcke
    Reviewed-by: Sergey Senozhatsky
    Acked-by: Minchan Kim
    Cc: Doug Anderson
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Matthias Kaehlcke
     
  • Introduce a new migration mode that allows offloading the copy to a
    device DMA engine. This changes the workflow of migration, and not
    every address_space migratepage callback can support it.

    This is intended to be used by migrate_vma(), which itself is used
    for things like HMM (see include/linux/hmm.h).

    No additional per-filesystem migratepage testing is needed: this
    patch disables MIGRATE_SYNC_NO_COPY in all problematic migratepage()
    callbacks and adds a comment in each explaining why. Any callback
    that wishes to support this new mode needs to be aware of the
    difference in the migration flow from the other modes.

    Some of these callbacks do extra locking while copying (aio, zsmalloc,
    balloon, ...), and for DMA to be effective you want to copy multiple
    pages in one DMA operation. But in the problematic cases you cannot
    easily hold the extra lock across multiple calls to this callback.

    The usual flow is:

    For each page {
        1 - lock page
        2 - call migratepage() callback
        3 - (extra locking in some migratepage() callbacks)
        4 - migrate page state (freeze refcount, update page cache,
            buffer head, ...)
        5 - copy page
        6 - (unlock any extra lock of the migratepage() callback)
        7 - return from migratepage() callback
        8 - unlock page
    }

    The new mode MIGRATE_SYNC_NO_COPY:

    1 - lock multiple pages
    For each page {
        2 - call migratepage() callback
        3 - abort in all problematic migratepage() callbacks
        4 - migrate page state (freeze refcount, update page cache,
            buffer head, ...)
    } // finished all calls to migratepage() callback
    5 - DMA copy multiple pages
    6 - unlock all the pages

    To support MIGRATE_SYNC_NO_COPY in the problematic cases we would
    need a new callback, migratepages() (for instance), that deals with
    multiple pages in one transaction.

    Because the problematic cases are not important for current usage, I
    did not want to complexify this patchset even more for no good
    reason.
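
    A sketch of where the new mode sits, following
    include/linux/migrate_mode.h (comments paraphrased):

    enum migrate_mode {
            MIGRATE_ASYNC,          /* never blocks */
            MIGRATE_SYNC_LIGHT,     /* allows some blocking */
            MIGRATE_SYNC,           /* may block */
            MIGRATE_SYNC_NO_COPY,   /* may block; the caller performs
                                       the copy, e.g. via a DMA engine */
    };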

    Link: http://lkml.kernel.org/r/20170817000548.32038-14-jglisse@redhat.com
    Signed-off-by: Jérôme Glisse
    Cc: Aneesh Kumar
    Cc: Balbir Singh
    Cc: Benjamin Herrenschmidt
    Cc: Dan Williams
    Cc: David Nellans
    Cc: Evgeny Baskakov
    Cc: Johannes Weiner
    Cc: John Hubbard
    Cc: Kirill A. Shutemov
    Cc: Mark Hairgrove
    Cc: Michal Hocko
    Cc: Paul E. McKenney
    Cc: Ross Zwisler
    Cc: Sherry Cheung
    Cc: Subhash Gutti
    Cc: Vladimir Davydov
    Cc: Bob Liu
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Jérôme Glisse
     

07 Sep, 2017

1 commit

  • Getting -EBUSY from zs_page_migrate will make migration slow (retry)
    or fail (zs_page_putback will schedule free_work via schedule_work,
    but it cannot guarantee success).

    I noticed this issue because my kernel is patched with
    https://lkml.org/lkml/2014/5/28/113, which removes the retry in
    __alloc_contig_migrate_range.

    This retry handles the -EBUSY because it re-isolates the page and
    re-calls migrate_pages. Without it, cma_alloc fails at once with
    -EBUSY.

    Following the review from Minchan Kim in
    https://lkml.org/lkml/2014/5/28/113, I updated the patch to skip
    unnecessary loops but not return -EBUSY if the zspage is not inuse.

    Below is what I got with highalloc-performance in a vbox with 2 CPUs,
    1G memory, and 512 zram as swap. The swappiness is set to 100.

    orig new
    Minor Faults 50805113 50830235
    Major Faults 43918 56530
    Swap Ins 42087 55680
    Swap Outs 89718 104700
    Allocation stalls 0 0
    DMA allocs 57787 52364
    DMA32 allocs 47964599 48043563
    Normal allocs 0 0
    Movable allocs 0 0
    Direct pages scanned 45493 23167
    Kswapd pages scanned 1565222 1725078
    Kswapd pages reclaimed 1342222 1503037
    Direct pages reclaimed 45615 25186
    Kswapd efficiency 85% 87%
    Kswapd velocity 1897.101 1949.042
    Direct efficiency 100% 108%
    Direct velocity 55.139 26.175
    Percentage direct scans 2% 1%
    Zone normal velocity 1952.240 1975.217
    Zone dma32 velocity 0.000 0.000
    Zone dma velocity 0.000 0.000
    Page writes by reclaim 89764.000 105233.000
    Page writes file 46 533
    Page writes anon 89718 104700
    Page reclaim immediate 21457 3699
    Sector Reads 3259688 3441368
    Sector Writes 3667252 3754836
    Page rescued immediate 0 0
    Slabs scanned 1042872 1160855
    Direct inode steals 8042 10089
    Kswapd inode steals 54295 29170
    Kswapd skipped wait 0 0
    THP fault alloc 175 154
    THP collapse alloc 226 289
    THP splits 0 0
    THP fault fallback 11 14
    THP collapse fail 3 2
    Compaction stalls 536 646
    Compaction success 322 358
    Compaction failures 214 288
    Page migrate success 119608 111063
    Page migrate failure 2723 2593
    Compaction pages isolated 250179 232652
    Compaction migrate scanned 9131832 9942306
    Compaction free scanned 2093272 2613998
    Compaction cost 192 189
    NUMA alloc hit 47124555 47193990
    NUMA alloc miss 0 0
    NUMA interleave hit 0 0
    NUMA alloc local 47124555 47193990
    NUMA base PTE updates 0 0
    NUMA huge PMD updates 0 0
    NUMA page range updates 0 0
    NUMA hint faults 0 0
    NUMA hint local faults 0 0
    NUMA hint local percent 100 100
    NUMA pages migrated 0 0
    AutoNUMA cost 0% 0%

    [akpm@linux-foundation.org: remove newline, per Minchan]
    Link: http://lkml.kernel.org/r/1500889535-19648-1-git-send-email-zhuhui@xiaomi.com
    Signed-off-by: Hui Zhu
    Acked-by: Minchan Kim
    Reviewed-by: Sergey Senozhatsky
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Hui Zhu
     

03 Aug, 2017

1 commit

  • Mike reported a kernel oops with the ltp:zram03 testcase.

    zram: Added device: zram0
    zram0: detected capacity change from 0 to 107374182400
    BUG: unable to handle kernel paging request at 0000306d61727a77
    IP: zs_map_object+0xb9/0x260
    PGD 0
    P4D 0
    Oops: 0000 [#1] SMP
    Dumping ftrace buffer:
    (ftrace buffer empty)
    Modules linked in: zram(E) xfs(E) libcrc32c(E) btrfs(E) xor(E) raid6_pq(E) loop(E) ebtable_filter(E) ebtables(E) ip6table_filter(E) ip6_tables(E) iptable_filter(E) ip_tables(E) x_tables(E) af_packet(E) br_netfilter(E) bridge(E) stp(E) llc(E) iscsi_ibft(E) iscsi_boot_sysfs(E) nls_iso8859_1(E) nls_cp437(E) vfat(E) fat(E) intel_powerclamp(E) coretemp(E) cdc_ether(E) kvm_intel(E) usbnet(E) mii(E) kvm(E) irqbypass(E) crct10dif_pclmul(E) crc32_pclmul(E) crc32c_intel(E) iTCO_wdt(E) ghash_clmulni_intel(E) bnx2(E) iTCO_vendor_support(E) pcbc(E) ioatdma(E) ipmi_ssif(E) aesni_intel(E) i5500_temp(E) i2c_i801(E) aes_x86_64(E) lpc_ich(E) shpchp(E) mfd_core(E) crypto_simd(E) i7core_edac(E) dca(E) glue_helper(E) cryptd(E) ipmi_si(E) button(E) acpi_cpufreq(E) ipmi_devintf(E) pcspkr(E) ipmi_msghandler(E)
    nfsd(E) auth_rpcgss(E) nfs_acl(E) lockd(E) grace(E) sunrpc(E) ext4(E) crc16(E) mbcache(E) jbd2(E) sd_mod(E) ata_generic(E) i2c_algo_bit(E) ata_piix(E) drm_kms_helper(E) ahci(E) syscopyarea(E) sysfillrect(E) libahci(E) sysimgblt(E) fb_sys_fops(E) uhci_hcd(E) ehci_pci(E) ttm(E) ehci_hcd(E) libata(E) drm(E) megaraid_sas(E) usbcore(E) sg(E) dm_multipath(E) dm_mod(E) scsi_dh_rdac(E) scsi_dh_emc(E) scsi_dh_alua(E) scsi_mod(E) efivarfs(E) autofs4(E) [last unloaded: zram]
    CPU: 6 PID: 12356 Comm: swapon Tainted: G E 4.13.0.g87b2c3f-default #194
    Hardware name: IBM System x3550 M3 -[7944K3G]-/69Y5698 , BIOS -[D6E150AUS-1.10]- 12/15/2010
    task: ffff880158d2c4c0 task.stack: ffffc90001680000
    RIP: 0010:zs_map_object+0xb9/0x260
    Call Trace:
    zram_bvec_rw.isra.26+0xe8/0x780 [zram]
    zram_rw_page+0x6e/0xa0 [zram]
    bdev_read_page+0x81/0xb0
    do_mpage_readpage+0x51a/0x710
    mpage_readpages+0x122/0x1a0
    blkdev_readpages+0x1d/0x20
    __do_page_cache_readahead+0x1b2/0x270
    ondemand_readahead+0x180/0x2c0
    page_cache_sync_readahead+0x31/0x50
    generic_file_read_iter+0x7e7/0xaf0
    blkdev_read_iter+0x37/0x40
    __vfs_read+0xce/0x140
    vfs_read+0x9e/0x150
    SyS_read+0x46/0xa0
    entry_SYSCALL_64_fastpath+0x1a/0xa5
    Code: 81 e6 00 c0 3f 00 81 fe 00 00 16 00 0f 85 9f 01 00 00 0f b7 13 65 ff 05 5e 07 dc 7e 66 c1 ea 02 81 e2 ff 01 00 00 49 8b 54 d4 08 4a 48 41 0f af ce 81 e1 ff 0f 00 00 41 89 c9 48 c7 c3 a0 70
    RIP: zs_map_object+0xb9/0x260 RSP: ffffc90001683988
    CR2: 0000306d61727a77

    He bisected the problem to [1].

    After commit cf8e0fedf078 ("mm/zsmalloc: simplify zs_max_alloc_size
    handling"), zsmalloc no longer uses a double pointer for
    pool->size_class in zs_create_pool, so the counterpart function
    zs_destroy_pool doesn't need to free it, either.

    Otherwise, it kfrees a wrong address and the kernel oopses.

    Link: http://lkml.kernel.org/r/20170725062650.GA12134@bbox
    Fixes: cf8e0fedf078 ("mm/zsmalloc: simplify zs_max_alloc_size handling")
    Signed-off-by: Minchan Kim
    Reported-by: Mike Galbraith
    Tested-by: Mike Galbraith
    Reviewed-by: Sergey Senozhatsky
    Cc: Jerome Marchand
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Minchan Kim
     

11 Jul, 2017

2 commits

  • Commit 40f9fb8cffc6 ("mm/zsmalloc: support allocating obj with size of
    ZS_MAX_ALLOC_SIZE") fixes a size calculation error that prevented
    zsmalloc from allocating an object of the maximal size
    (ZS_MAX_ALLOC_SIZE). I think, however, that the fix is needlessly
    complicated.

    This patch replaces the dynamic calculation of zs_size_classes at init
    time with a compile-time calculation that uses the DIV_ROUND_UP()
    macro already used in get_size_class_index().
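
    A sketch of the compile-time form, using the constants zsmalloc
    already defines:

    #define ZS_SIZE_CLASSES (DIV_ROUND_UP(ZS_MAX_ALLOC_SIZE - \
                                          ZS_MIN_ALLOC_SIZE, \
                                          ZS_SIZE_CLASS_DELTA) + 1)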

    [akpm@linux-foundation.org: use min_t]
    Link: http://lkml.kernel.org/r/20170630114859.1979-1-jmarchan@redhat.com
    Signed-off-by: Jerome Marchand
    Acked-by: Minchan Kim
    Cc: Sergey Senozhatsky
    Cc: Mahendran Ganesh
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Jerome Marchand
     
  • is_first_page() is only called from the macro VM_BUG_ON_PAGE(), which
    is only compiled in as a runtime check when CONFIG_DEBUG_VM is set;
    otherwise the call is only checked at compile time and not actually
    compiled in.

    Fixes the following warning, found with Clang:

    mm/zsmalloc.c:472:12: warning: function 'is_first_page' is not needed and will not be emitted [-Wunneeded-internal-declaration]
    static int is_first_page(struct page *page)
    ^

    Link: http://lkml.kernel.org/r/20170524053859.29059-1-nick.desaulniers@gmail.com
    Signed-off-by: Nick Desaulniers
    Reviewed-by: Sergey Senozhatsky
    Acked-by: Minchan Kim
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Nick Desaulniers
     

14 Apr, 2017

1 commit

  • On a 64K page system, zsmalloc has 257 size classes, so an 8-bit
    class field is not enough (8 bits can only encode indexes 0..255).
    With that, zsmalloc corrupts the system when it stores 65536-byte
    data (i.e., class index 256), so this patch increases the class bits
    as a simple fix suitable for stable backport. We should clean up this
    mess soon.

    index  size
    0      32
    1      288
    ..
    ..
    204    52256
    256    65536
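
    A sketch of where the overflow lives, per the zspage layout
    introduced by the Fixes commit (the widened width chosen by this
    patch is not reproduced here):

    struct zspage {
            struct {
                    unsigned int fullness:FULLNESS_BITS;
                    unsigned int class:CLASS_BITS;  /* 8 bits held only
                                                       indexes 0..255 */
            };
            /* ... */
    };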

    Fixes: 3783689a1 ("zsmalloc: introduce zspage structure")
    Link: http://lkml.kernel.org/r/1492042622-12074-3-git-send-email-minchan@kernel.org
    Signed-off-by: Minchan Kim
    Cc: Sergey Senozhatsky
    Cc:
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Minchan Kim
     

02 Mar, 2017

1 commit


25 Feb, 2017

2 commits

  • The class index and fullness group are no longer encoded in
    (first)page->mapping, since commit 3783689a1aa8 ("zsmalloc:
    introduce zspage structure"). Instead, they are stored in struct
    zspage.

    Just delete this unneeded comment.

    Link: http://lkml.kernel.org/r/1486620822-36826-1-git-send-email-xieyisheng1@huawei.com
    Signed-off-by: Yisheng Xie
    Suggested-by: Sergey Senozhatsky
    Reviewed-by: Sergey Senozhatsky
    Acked-by: Minchan Kim
    Cc: Nitin Gupta
    Cc: Hanjun Guo
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Yisheng Xie
     
  • We used page->lru to link the component pages (except the first
    page) of a zspage, and used INIT_LIST_HEAD(&page->lru) to initialize
    it. Therefore, to get the last page's next page, which is NULL, we
    had to use the page flag PG_Private_2 to identify it.

    But now we use page->freelist to link all of the pages in a zspage
    and initialize page->freelist as NULL for the last page, so there is
    no need to use PG_Private_2 anymore.

    This removes the redundant SetPagePrivate2 in create_page_chain and
    ClearPagePrivate2 in reset_page(). Saves a few cycles for migration
    of a zsmalloc page :)
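
    With the NULL-terminated freelist chain, the last-page test reduces
    to a pointer check; a sketch of get_next_page() in this era of
    mm/zsmalloc.c:

    static struct page *get_next_page(struct page *page)
    {
            if (unlikely(PageHugeObject(page)))
                    return NULL;    /* huge objects have no page chain */

            return page->freelist;  /* NULL on the last page */
    }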

    Link: http://lkml.kernel.org/r/1487076509-49270-1-git-send-email-xieyisheng1@huawei.com
    Signed-off-by: Yisheng Xie
    Reviewed-by: Sergey Senozhatsky
    Acked-by: Minchan Kim
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Yisheng Xie
     

23 Feb, 2017

1 commit

  • Delete extra semicolon, and fix some typos.

    Link: http://lkml.kernel.org/r/586F1823.4050107@huawei.com
    Signed-off-by: Xishi Qiu
    Reviewed-by: Sergey Senozhatsky
    Acked-by: Minchan Kim
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Xishi Qiu
     

02 Dec, 2016

1 commit

  • Install the callbacks via the state machine and let the core invoke
    the callbacks on the already online CPUs.

    Signed-off-by: Sebastian Andrzej Siewior
    Cc: Sergey Senozhatsky
    Cc: linux-mm@kvack.org
    Cc: Minchan Kim
    Cc: rt@linutronix.de
    Cc: Nitin Gupta
    Link: http://lkml.kernel.org/r/20161126231350.10321-11-bigeasy@linutronix.de
    Signed-off-by: Thomas Gleixner

    Sebastian Andrzej Siewior
     

29 Jul, 2016

8 commits

  • iput() tests whether its argument is NULL and then returns immediately.
    Thus the test around the call is not needed.

    This issue was detected by using the Coccinelle software.
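
    The conversion is mechanical:

    -	if (inode)
    -		iput(inode);
    +	iput(inode);	/* iput() already tolerates NULL */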

    Link: http://lkml.kernel.org/r/559cf499-4a01-25f9-c87f-24d906626a57@users.sourceforge.net
    Signed-off-by: Markus Elfring
    Reviewed-by: Sergey Senozhatsky
    Acked-by: Minchan Kim
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Markus Elfring
     
  • Use the ClearPagePrivate/ClearPagePrivate2 helpers to clear
    PG_private/PG_private_2 in page->flags.
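
    A sketch of the conversion in reset_page() (diff shape illustrative):

    -	clear_bit(PG_private, &page->flags);
    -	clear_bit(PG_private_2, &page->flags);
    +	ClearPagePrivate(page);
    +	ClearPagePrivate2(page);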

    Link: http://lkml.kernel.org/r/1467882338-4300-7-git-send-email-opensource.ganesh@gmail.com
    Signed-off-by: Ganesh Mahendran
    Acked-by: Minchan Kim
    Reviewed-by: Sergey Senozhatsky
    Cc: Vlastimil Babka
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Ganesh Mahendran
     
  • Add the __init/__exit attributes to functions that are only called at
    module init/exit, to save memory.
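
    The pattern, using zsmalloc's own module hooks as the example:

    static int __init zs_init(void)
    {
            /* ... */
            return 0;
    }

    static void __exit zs_exit(void)
    {
            /* ... */
    }

    module_init(zs_init);
    module_exit(zs_exit);
    /* __init text is freed after initialization; __exit text is
     * discarded entirely for built-in code */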

    Link: http://lkml.kernel.org/r/1467882338-4300-6-git-send-email-opensource.ganesh@gmail.com
    Signed-off-by: Ganesh Mahendran
    Cc: Sergey Senozhatsky
    Cc: Minchan Kim
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Ganesh Mahendran
     
  • Some minor comment changes:

    1) update the zs_malloc(), zs_create_pool() function headers
    2) update "Usage of struct page fields"

    Link: http://lkml.kernel.org/r/1467882338-4300-5-git-send-email-opensource.ganesh@gmail.com
    Signed-off-by: Ganesh Mahendran
    Reviewed-by: Sergey Senozhatsky
    Acked-by: Minchan Kim
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Ganesh Mahendran
     
  • Currently, if a class cannot be merged, the max objects of a zspage
    in that class may be calculated twice.

    This patch calculates the max objects of a zspage at the beginning,
    and passes the value to can_merge() to decide whether the class can
    be merged.

    This patch also removes the function get_maxobj_per_zspage(), as
    there is no other place that calls it.
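
    A sketch of the reshuffle, with the parameter list following this
    description:

    /* computed once per class in zs_create_pool() */
    objs_per_zspage = pages_per_zspage * PAGE_SIZE / size;

    if (can_merge(prev_class, pages_per_zspage, objs_per_zspage)) {
            /* reuse prev_class instead of creating a new one */
    }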

    Link: http://lkml.kernel.org/r/1467882338-4300-4-git-send-email-opensource.ganesh@gmail.com
    Signed-off-by: Ganesh Mahendran
    Reviewed-by: Sergey Senozhatsky
    Acked-by: Minchan Kim
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Ganesh Mahendran
     
  • The number of max objects in a zspage is stored in each size_class
    now, so there is no need to re-calculate it.

    Link: http://lkml.kernel.org/r/1467882338-4300-3-git-send-email-opensource.ganesh@gmail.com
    Signed-off-by: Ganesh Mahendran
    Acked-by: Minchan Kim
    Reviewed-by: Sergey Senozhatsky
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Ganesh Mahendran
     
  • The obj index value should be updated after returning from
    find_alloced_obj() to avoid CPU burning caused by unnecessary object
    scanning.
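
    A sketch of the fix: the scan position comes back out of the helper,
    so the caller resumes from where the last object was found instead
    of rescanning from index 0:

    /* find_alloced_obj() now takes &obj_idx and stores the index of
     * the object it found there */
    handle = find_alloced_obj(class, s_page, &obj_idx);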

    Link: http://lkml.kernel.org/r/1467882338-4300-2-git-send-email-opensource.ganesh@gmail.com
    Signed-off-by: Ganesh Mahendran
    Reviewed-by: Sergey Senozhatsky
    Acked-by: Minchan Kim
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Ganesh Mahendran
     
  • This is a cleanup patch. Change "index" to "obj_index" to keep it
    consistent with other names in zsmalloc.

    Link: http://lkml.kernel.org/r/1467882338-4300-1-git-send-email-opensource.ganesh@gmail.com
    Signed-off-by: Ganesh Mahendran
    Reviewed-by: Sergey Senozhatsky
    Acked-by: Minchan Kim
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Ganesh Mahendran
     

27 Jul, 2016

9 commits

  • Randy reported the build error below.

    > In file included from ../include/linux/balloon_compaction.h:48:0,
    > from ../mm/balloon_compaction.c:11:
    > ../include/linux/compaction.h:237:51: warning: 'struct node' declared inside parameter list [enabled by default]
    > static inline int compaction_register_node(struct node *node)
    > ../include/linux/compaction.h:237:51: warning: its scope is only this definition or declaration, which is probably not what you want [enabled by default]
    > ../include/linux/compaction.h:242:54: warning: 'struct node' declared inside parameter list [enabled by default]
    > static inline void compaction_unregister_node(struct node *node)
    >

    It was caused by non-lru page migration, which needs compaction.h,
    but compaction.h doesn't include any of the headers it needs to be
    standalone.

    I think the proper header for non-lru page migration is migrate.h
    rather than compaction.h, because migrate.h already includes the
    headers needed for non-lru page migration to work, indirectly
    providing things like isolate_mode_t, migrate_mode and
    MIGRATEPAGE_SUCCESS.

    [akpm@linux-foundation.org: revert mm-balloon-use-general-non-lru-movable-page-feature-fix.patch temp fix]
    Link: http://lkml.kernel.org/r/20160610003304.GE29779@bbox
    Signed-off-by: Minchan Kim
    Reported-by: Randy Dunlap
    Cc: Konstantin Khlebnikov
    Cc: Vlastimil Babka
    Cc: Gioh Kim
    Cc: Rafael Aquini
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Minchan Kim
     
  • zram is very popular in parts of the embedded world (e.g., TVs and
    mobile phones). On those systems, zsmalloc's consumed memory size is
    never trivial (one example from a real product system: total memory
    800M, zsmalloc consumed 150M), so we have used this out-of-tree patch
    to monitor system memory behavior via /proc/vmstat.

    With zsmalloc in vmstat, it helps in tracking down system behavior
    due to memory usage.

    [minchan@kernel.org: zsmalloc: follow up zsmalloc vmstat]
    Link: http://lkml.kernel.org/r/20160607091737.GC23435@bbox
    [akpm@linux-foundation.org: fix build with CONFIG_ZSMALLOC=m]
    Link: http://lkml.kernel.org/r/1464919731-13255-1-git-send-email-minchan@kernel.org
    Signed-off-by: Minchan Kim
    Cc: Sangseok Lee
    Cc: Chanho Min
    Cc: Chan Gyun Jeong
    Cc: Sergey Senozhatsky
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Minchan Kim
     
  • A static checker warns about using a tag as a bit shifter. It doesn't
    break the current behavior, but it is not good for readability. Let's
    use OBJ_TAG_BIT as the bit shifter instead of OBJ_ALLOCATED_TAG.
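
    A sketch of the readability fix (OBJ_ALLOCATED_TAG happens to equal
    1, so the old shift worked by coincidence; exact hunk illustrative):

    -	link->next = freeobj++ << OBJ_ALLOCATED_TAG;
    +	link->next = freeobj++ << OBJ_TAG_BIT;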

    Link: http://lkml.kernel.org/r/20160607045146.GF26230@bbox
    Signed-off-by: Minchan Kim
    Reported-by: Dan Carpenter
    Cc: Sergey Senozhatsky
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Minchan Kim
     
  • This patch introduces a run-time migration feature for zspage.

    For migration, the VM uses the page.lru field, so it would be better
    not to use the page.next field (which is unified with page.lru) for
    our own purpose. For that, firstly, we can get the first object
    offset of a page via runtime calculation instead of using page.index,
    which lets us use page.index as the link for page chaining instead of
    page.next.

    In the case of a huge object, the handle is stored in page.index
    instead of a next link, because a huge object doesn't need a next
    link for page chaining. So get_next_page needs to identify huge
    objects in order to return NULL. For that, this patch uses the
    PG_owner_priv_1 flag of the page.

    For migration, it supports three functions:

    * zs_page_isolate

    It isolates a zspage that includes the subpage the VM wants to
    migrate from its class, so that no one can allocate new objects from
    the zspage.

    A zspage may be isolated through any of its subpages, and a
    subsequent isolation attempt on another subpage of the same zspage
    shouldn't fail. For that, we introduce a zspage.isolated count. With
    that, zs_page_isolate can know whether a zspage is already isolated
    for migration; if it is, the subsequent isolation attempt can succeed
    without trying further isolation.

    * zs_page_migrate

    First of all, it holds the write-side zspage->lock to prevent
    migration of other subpages in the zspage. Then, it locks all objects
    in the page the VM wants to migrate. The reason we should lock all
    objects in the page is a race between zs_map_object and
    zs_page_migrate:

    zs_map_object                             zs_page_migrate

    pin_tag(handle)
    obj = handle_to_obj(handle)
    obj_to_location(obj, &page, &obj_idx);

                                              write_lock(&zspage->lock)
                                              if (!trypin_tag(handle))
                                                      goto unpin_object

    zspage = get_zspage(page);
    read_lock(&zspage->lock);

    If zs_page_migrate didn't do the trypin_tag, zs_map_object's page
    could become stale due to migration, and it would crash.

    If it locks all of the objects successfully, it copies the content
    from the old page to the new one and, finally, creates a new zspage
    chain with the new page. And if it was the last isolated subpage in
    the zspage, it puts the zspage back to its class.

    * zs_page_putback

    It returns an isolated zspage to the right fullness_group list if it
    fails to migrate a page. If it finds the zspage is ZS_EMPTY, it
    queues the zspage for freeing on a workqueue. See below about async
    zspage freeing.

    This patch introduces asynchronous zspage freeing. The reason we
    need it is that we need the page lock to clear PG_movable, but
    unfortunately the zs_free path should be atomic, so the approach is
    to try to grab the page lock. If it gets the page lock of all of the
    pages successfully, it can free the zspage immediately. Otherwise, it
    queues a free request and frees the zspage via a workqueue in process
    context.

    If zs_free finds the zspage is isolated when it tries to free it, it
    delays the freeing until zs_page_putback finds it, which will finally
    free the zspage.

    In this patch, we expand the fullness_list from ZS_EMPTY to ZS_FULL.
    First of all, the ZS_EMPTY list is used for delayed freeing. And with
    the added ZS_FULL list, we can identify whether a zspage is isolated
    or not via a list_empty(&zspage->list) test.
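
    The three callbacks plug into zsmalloc's address_space operations; a
    sketch of the wiring in this era of mm/zsmalloc.c:

    static const struct address_space_operations zsmalloc_aops = {
            .isolate_page = zs_page_isolate,
            .migratepage = zs_page_migrate,
            .putback_page = zs_page_putback,
    };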

    [minchan@kernel.org: zsmalloc: keep first object offset in struct page]
    Link: http://lkml.kernel.org/r/1465788015-23195-1-git-send-email-minchan@kernel.org
    [minchan@kernel.org: zsmalloc: zspage sanity check]
    Link: http://lkml.kernel.org/r/20160603010129.GC3304@bbox
    Link: http://lkml.kernel.org/r/1464736881-24886-12-git-send-email-minchan@kernel.org
    Signed-off-by: Minchan Kim
    Cc: Sergey Senozhatsky
    Cc: Vlastimil Babka
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Minchan Kim
     
  • zsmalloc stores the first free object's position in freeobj in each
    zspage. If we change it to an index from first_page instead of a
    position, it makes page migration simple because we don't need to
    correct the other entries of the linked list when a page is migrated
    out.

    Link: http://lkml.kernel.org/r/1464736881-24886-11-git-send-email-minchan@kernel.org
    Signed-off-by: Minchan Kim
    Cc: Sergey Senozhatsky
    Cc: Vlastimil Babka
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Minchan Kim
     
  • Currently, putback_zspage frees the zspage under class->lock if the
    fullness becomes ZS_EMPTY, but that makes trouble for implementing
    the locking scheme for new zspage migration. So this patch separates
    free_zspage from putback_zspage and frees the zspage outside of
    class->lock, as preparation for zspage migration.

    Link: http://lkml.kernel.org/r/1464736881-24886-10-git-send-email-minchan@kernel.org
    Signed-off-by: Minchan Kim
    Reviewed-by: Sergey Senozhatsky
    Cc: Vlastimil Babka
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Minchan Kim
     
  • We have squeezed the metadata of a zspage into the first page's
    descriptor. So, to get the metadata from a subpage, we should get the
    first page first of all. But that makes trouble for implementing the
    page migration feature of zsmalloc, because any place that gets the
    first page from a subpage can race with first page migration. IOW,
    the first page it got could be stale. To prevent that, I tried
    several approaches, but they made the code complicated, so finally I
    concluded to separate the metadata from the first page. Of course, it
    consumes more memory, i.e., 16 bytes per zspage on 32-bit at the
    moment. It means we lose 1% in the *worst case* (40B/4096B), which is
    not bad, I think, at the cost of maintainability.
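
    A sketch of the separated metadata, following the description above
    (field widths elided):

    struct zspage {
            struct {
                    unsigned int fullness:FULLNESS_BITS;
                    unsigned int class:CLASS_BITS;
            };
            unsigned int inuse;         /* allocated objects */
            unsigned int freeobj;       /* head of the free list */
            struct page *first_page;
            struct list_head list;      /* fullness list */
    };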

    Link: http://lkml.kernel.org/r/1464736881-24886-9-git-send-email-minchan@kernel.org
    Signed-off-by: Minchan Kim
    Cc: Sergey Senozhatsky
    Cc: Vlastimil Babka
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Minchan Kim
     
  • For page migration, we need to create the page chain of a zspage
    dynamically, so this patch factors it out from alloc_zspage.

    Link: http://lkml.kernel.org/r/1464736881-24886-8-git-send-email-minchan@kernel.org
    Signed-off-by: Minchan Kim
    Reviewed-by: Sergey Senozhatsky
    Cc: Vlastimil Babka
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Minchan Kim
     
  • An upcoming patch will change how the zspage metadata is encoded, so
    for ease of review, this patch wraps the code that accesses metadata
    in accessor functions.
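
    A sketch of the accessor shape (one representative; at this point the
    metadata still lives in the first page's struct page fields):

    static inline int get_zspage_inuse(struct page *first_page)
    {
            return first_page->inuse;
    }

    static inline void set_zspage_inuse(struct page *first_page, int val)
    {
            first_page->inuse = val;
    }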

    Link: http://lkml.kernel.org/r/1464736881-24886-7-git-send-email-minchan@kernel.org
    Signed-off-by: Minchan Kim
    Reviewed-by: Sergey Senozhatsky
    Cc: Vlastimil Babka
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Minchan Kim