07 Dec, 2020

1 commit

  • While I was doing zram testing, I found that decompression sometimes
    failed because the compression buffer was corrupted. On investigation,
    I found that the commit below calls cond_resched() unconditionally, which
    can cause a problem in atomic context if the task is rescheduled.

    BUG: sleeping function called from invalid context at mm/vmalloc.c:108
    in_atomic(): 1, irqs_disabled(): 0, non_block: 0, pid: 946, name: memhog
    3 locks held by memhog/946:
    #0: ffff9d01d4b193e8 (&mm->mmap_lock#2){++++}-{4:4}, at: __mm_populate+0x103/0x160
    #1: ffffffffa3d53de0 (fs_reclaim){+.+.}-{0:0}, at: __alloc_pages_slowpath.constprop.0+0xa98/0x1160
    #2: ffff9d01d56b8110 (&zspage->lock){.+.+}-{3:3}, at: zs_map_object+0x8e/0x1f0
    CPU: 0 PID: 946 Comm: memhog Not tainted 5.9.3-00011-gc5bfc0287345-dirty #316
    Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.13.0-1 04/01/2014
    Call Trace:
    unmap_kernel_range_noflush+0x2eb/0x350
    unmap_kernel_range+0x14/0x30
    zs_unmap_object+0xd5/0xe0
    zram_bvec_rw.isra.0+0x38c/0x8e0
    zram_rw_page+0x90/0x101
    bdev_write_page+0x92/0xe0
    __swap_writepage+0x94/0x4a0
    pageout+0xe3/0x3a0
    shrink_page_list+0xb94/0xd60
    shrink_inactive_list+0x158/0x460

    We can fix this by removing the ZSMALLOC_PGTABLE_MAPPING feature (which
    contains the offending calling code) from zsmalloc.

    Even though this option showed some improvement (e.g., 30%) on some
    arm32 platforms, it has been a headache to maintain since it abuses
    APIs[1] (e.g., unmap_kernel_range in atomic context).

    Since we are moving toward deprecating 32-bit machines, the config
    option has been available only for builtin builds since v5.8, and it
    has never been the default in zsmalloc, it's time to drop the option
    for better maintainability.

    [1] http://lore.kernel.org/linux-mm/20201105170249.387069-1-minchan@kernel.org

    Fixes: e47110e90584 ("mm/vunmap: add cond_resched() in vunmap_pmd_range")
    Signed-off-by: Minchan Kim
    Signed-off-by: Andrew Morton
    Reviewed-by: Sergey Senozhatsky
    Cc: Tony Lindgren
    Cc: Christoph Hellwig
    Cc: Harish Sriram
    Cc: Uladzislau Rezki
    Cc:
    Link: https://lkml.kernel.org/r/20201117202916.GA3856507@google.com
    Signed-off-by: Linus Torvalds

    Minchan Kim
     

19 Oct, 2020

1 commit

  • Just manually pre-fault the PTEs using apply_to_page_range.

    Co-developed-by: Minchan Kim
    Signed-off-by: Christoph Hellwig
    Signed-off-by: Andrew Morton
    Cc: Boris Ostrovsky
    Cc: Chris Wilson
    Cc: Jani Nikula
    Cc: Joonas Lahtinen
    Cc: Juergen Gross
    Cc: Matthew Auld
    Cc: "Matthew Wilcox (Oracle)"
    Cc: Nitin Gupta
    Cc: Peter Zijlstra
    Cc: Rodrigo Vivi
    Cc: Stefano Stabellini
    Cc: Tvrtko Ursulin
    Cc: Uladzislau Rezki (Sony)
    Link: https://lkml.kernel.org/r/20201002122204.1534411-6-hch@lst.de
    Signed-off-by: Linus Torvalds

    Christoph Hellwig
     

13 Aug, 2020

1 commit

  • Change "as as" to "as a".

    Signed-off-by: Randy Dunlap
    Signed-off-by: Andrew Morton
    Reviewed-by: Andrew Morton
    Reviewed-by: Zi Yan
    Link: http://lkml.kernel.org/r/20200801173822.14973-16-rdunlap@infradead.org
    Signed-off-by: Linus Torvalds

    Randy Dunlap
     

10 Jun, 2020

2 commits

  • The replacement of <asm/pgtable.h> with <linux/pgtable.h> made the
    include of the latter land in the middle of the asm includes. Fix this
    up with the aid of the below script and manual adjustments here and
    there.

    import sys
    import re

    if len(sys.argv) != 3:
        print("USAGE: %s <file> <header>" % (sys.argv[0]))
        sys.exit(1)

    hdr_to_move = "#include <linux/%s>" % sys.argv[2]
    moved = False
    in_hdrs = False

    with open(sys.argv[1], "r") as f:
        lines = f.readlines()
        for _line in lines:
            line = _line.rstrip("\n")
            if line == hdr_to_move:
                continue
            if line.startswith("#include <linux/"):
                in_hdrs = True
            elif not moved and in_hdrs:
                moved = True
                print(hdr_to_move)
            print(line)

    Cc: Geert Uytterhoeven
    Cc: Greentime Hu
    Cc: Greg Ungerer
    Cc: Guan Xuetao
    Cc: Guo Ren
    Cc: Heiko Carstens
    Cc: Helge Deller
    Cc: Ingo Molnar
    Cc: Ley Foon Tan
    Cc: Mark Salter
    Cc: Matthew Wilcox
    Cc: Matt Turner
    Cc: Max Filippov
    Cc: Michael Ellerman
    Cc: Michal Simek
    Cc: Nick Hu
    Cc: Paul Walmsley
    Cc: Richard Weinberger
    Cc: Rich Felker
    Cc: Russell King
    Cc: Stafford Horne
    Cc: Thomas Bogendoerfer
    Cc: Thomas Gleixner
    Cc: Tony Luck
    Cc: Vincent Chen
    Cc: Vineet Gupta
    Cc: Will Deacon
    Cc: Yoshinori Sato
    Link: http://lkml.kernel.org/r/20200514170327.31389-4-rppt@kernel.org
    Signed-off-by: Linus Torvalds

    Mike Rapoport
     
  • The include/linux/pgtable.h is going to be the home of generic page table
    manipulation functions.

    Start with moving asm-generic/pgtable.h to include/linux/pgtable.h and
    make the latter include asm/pgtable.h.

    Signed-off-by: Mike Rapoport
    Signed-off-by: Andrew Morton
    Cc: Arnd Bergmann
    Cc: Borislav Petkov
    Cc: Brian Cain
    Cc: Catalin Marinas
    Cc: Chris Zankel
    Cc: "David S. Miller"
    Cc: Geert Uytterhoeven
    Cc: Greentime Hu
    Cc: Greg Ungerer
    Cc: Guan Xuetao
    Cc: Guo Ren
    Cc: Heiko Carstens
    Cc: Helge Deller
    Cc: Ingo Molnar
    Cc: Ley Foon Tan
    Cc: Mark Salter
    Cc: Matthew Wilcox
    Cc: Matt Turner
    Cc: Max Filippov
    Cc: Michael Ellerman
    Cc: Michal Simek
    Cc: Nick Hu
    Cc: Paul Walmsley
    Cc: Richard Weinberger
    Cc: Rich Felker
    Cc: Russell King
    Cc: Stafford Horne
    Cc: Thomas Bogendoerfer
    Cc: Thomas Gleixner
    Cc: Tony Luck
    Cc: Vincent Chen
    Cc: Vineet Gupta
    Cc: Will Deacon
    Cc: Yoshinori Sato
    Link: http://lkml.kernel.org/r/20200514170327.31389-3-rppt@kernel.org
    Signed-off-by: Linus Torvalds

    Mike Rapoport
     

03 Jun, 2020

2 commits

  • Switch all callers to map_kernel_range, which is symmetric to the unmap
    side (as well as to the _noflush versions).

    Signed-off-by: Christoph Hellwig
    Signed-off-by: Andrew Morton
    Acked-by: Peter Zijlstra (Intel)
    Cc: Christian Borntraeger
    Cc: Christophe Leroy
    Cc: Daniel Vetter
    Cc: David Airlie
    Cc: Gao Xiang
    Cc: Greg Kroah-Hartman
    Cc: Haiyang Zhang
    Cc: Johannes Weiner
    Cc: "K. Y. Srinivasan"
    Cc: Laura Abbott
    Cc: Mark Rutland
    Cc: Michael Kelley
    Cc: Minchan Kim
    Cc: Nitin Gupta
    Cc: Robin Murphy
    Cc: Sakari Ailus
    Cc: Stephen Hemminger
    Cc: Sumit Semwal
    Cc: Wei Liu
    Cc: Benjamin Herrenschmidt
    Cc: Catalin Marinas
    Cc: Heiko Carstens
    Cc: Paul Mackerras
    Cc: Vasily Gorbik
    Cc: Will Deacon
    Link: http://lkml.kernel.org/r/20200414131348.444715-17-hch@lst.de
    Signed-off-by: Linus Torvalds

    Christoph Hellwig
     
  • Rename the Kconfig variable to clarify the scope.

    Signed-off-by: Christoph Hellwig
    Signed-off-by: Andrew Morton
    Acked-by: Minchan Kim
    Acked-by: Peter Zijlstra (Intel)
    Cc: Christian Borntraeger
    Cc: Christophe Leroy
    Cc: Daniel Vetter
    Cc: David Airlie
    Cc: Gao Xiang
    Cc: Greg Kroah-Hartman
    Cc: Haiyang Zhang
    Cc: Johannes Weiner
    Cc: "K. Y. Srinivasan"
    Cc: Laura Abbott
    Cc: Mark Rutland
    Cc: Michael Kelley
    Cc: Nitin Gupta
    Cc: Robin Murphy
    Cc: Sakari Ailus
    Cc: Stephen Hemminger
    Cc: Sumit Semwal
    Cc: Wei Liu
    Cc: Benjamin Herrenschmidt
    Cc: Catalin Marinas
    Cc: Heiko Carstens
    Cc: Paul Mackerras
    Cc: Vasily Gorbik
    Cc: Will Deacon
    Link: http://lkml.kernel.org/r/20200414131348.444715-11-hch@lst.de
    Signed-off-by: Linus Torvalds

    Christoph Hellwig
     

08 Apr, 2020

5 commits

  • Convert the various /* fallthrough */ comments to the pseudo-keyword
    fallthrough;

    Done via script:
    https://lore.kernel.org/lkml/b56602fcf79f849e733e7b521bb0e17895d390fa.1582230379.git.joe@perches.com/

    Signed-off-by: Joe Perches
    Signed-off-by: Andrew Morton
    Reviewed-by: Gustavo A. R. Silva
    Link: http://lkml.kernel.org/r/f62fea5d10eb0ccfc05d87c242a620c261219b66.camel@perches.com
    Signed-off-by: Linus Torvalds

    Joe Perches
     
  • Sparse reports a warning at unpin_tag():

    warning: context imbalance in unpin_tag() - unexpected unlock

    The root cause is a missing annotation at unpin_tag().
    Add the missing __releases(bitlock) annotation.

    Signed-off-by: Jules Irenge
    Signed-off-by: Andrew Morton
    Acked-by: Minchan Kim
    Link: http://lkml.kernel.org/r/20200214204741.94112-14-jbi.octave@gmail.com
    Signed-off-by: Linus Torvalds

    Jules Irenge
     
  • Sparse reports a warning at pin_tag():

    warning: context imbalance in pin_tag() - wrong count at exit

    The root cause is a missing annotation at pin_tag().
    Add the missing __acquires(bitlock) annotation.

    Signed-off-by: Jules Irenge
    Signed-off-by: Andrew Morton
    Acked-by: Minchan Kim
    Link: http://lkml.kernel.org/r/20200214204741.94112-13-jbi.octave@gmail.com
    Signed-off-by: Linus Torvalds

    Jules Irenge
     
  • Sparse reports a warning at migrate_read_unlock():

    warning: context imbalance in migrate_read_unlock() - unexpected unlock

    The root cause is a missing annotation at migrate_read_unlock().
    Add the missing __releases(&zspage->lock) annotation.

    Signed-off-by: Jules Irenge
    Signed-off-by: Andrew Morton
    Acked-by: Minchan Kim
    Link: http://lkml.kernel.org/r/20200214204741.94112-12-jbi.octave@gmail.com
    Signed-off-by: Linus Torvalds

    Jules Irenge
     
  • Sparse reports a warning at migrate_read_lock():

    warning: context imbalance in migrate_read_lock() - wrong count at exit

    The root cause is a missing annotation at migrate_read_lock().
    Add the missing __acquires(&zspage->lock) annotation.

    Signed-off-by: Jules Irenge
    Signed-off-by: Andrew Morton
    Acked-by: Minchan Kim
    Link: http://lkml.kernel.org/r/20200214204741.94112-11-jbi.octave@gmail.com
    Signed-off-by: Linus Torvalds

    Jules Irenge
     

05 Jan, 2020

1 commit

  • When a zspage is migrated to another zone, the zone page state should be
    updated as well; otherwise the NR_ZSPAGES counter shows wrong counts for
    each zone, e.g. in /proc/zoneinfo in practice.

    Link: http://lkml.kernel.org/r/1575434841-48009-1-git-send-email-chanho.min@lge.com
    Fixes: 91537fee0013 ("mm: add NR_ZSMALLOC to vmstat")
    Signed-off-by: Chanho Min
    Signed-off-by: Jinsuk Choi
    Reviewed-by: Sergey Senozhatsky
    Acked-by: Minchan Kim
    Cc: [4.9+]
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Chanho Min
     

25 Sep, 2019

2 commits

  • set_zspage_inuse() was introduced in commit 4f42047bbde0 ("zsmalloc:
    use accessor"), but all of its users were later removed by the commits

    bdb0af7ca8f0 ("zsmalloc: factor page chain functionality out")
    3783689a1aa8 ("zsmalloc: introduce zspage structure")

    so the function can be safely removed now.

    Link: http://lkml.kernel.org/r/1568658408-19374-1-git-send-email-cai@lca.pw
    Signed-off-by: Qian Cai
    Reviewed-by: Andrew Morton
    Cc: Minchan Kim
    Cc: Sergey Senozhatsky
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Qian Cai
     
  • As a zpool_driver, zsmalloc can allocate movable memory because it
    supports page migration; zbud and z3fold cannot allocate movable memory.

    Add a malloc_support_movable flag to zpool_driver, set to true by any
    zpool_driver that can allocate movable memory. Also add
    zpool_malloc_support_movable(), which checks malloc_support_movable to
    determine whether a zpool can allocate movable memory.

    Link: http://lkml.kernel.org/r/20190605100630.13293-1-teawaterz@linux.alibaba.com
    Signed-off-by: Hui Zhu
    Reviewed-by: Shakeel Butt
    Cc: Dan Streetman
    Cc: Minchan Kim
    Cc: Nitin Gupta
    Cc: Sergey Senozhatsky
    Cc: Seth Jennings
    Cc: Vitaly Wool
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Hui Zhu
     

31 Aug, 2019

1 commit

  • Fixes: 701d678599d0c1 ("mm/zsmalloc.c: fix race condition in zs_destroy_pool")
    Link: http://lkml.kernel.org/r/201908251039.5oSbEEUT%25lkp@intel.com
    Reported-by: kbuild test robot
    Cc: Sergey Senozhatsky
    Cc: Henry Burns
    Cc: Minchan Kim
    Cc: Shakeel Butt
    Cc: Jonathan Adams
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Andrew Morton
     

25 Aug, 2019

2 commits

  • In zs_destroy_pool() we call flush_work(&pool->free_work). However, we
    have no guarantee that migration isn't happening in the background at
    that time.

    Since migration can't directly free pages, it relies on free_work being
    scheduled to free the pages. But there's nothing preventing an
    in-progress migration from queuing the work *after*
    zs_unregister_migration() has called flush_work(), which would leave
    pages still pointing at the inode when we free it.

    Since we know at destroy time all objects should be free, no new
    migrations can come in (since zs_page_isolate() fails for fully-free
    zspages). This means it is sufficient to track a "# isolated zspages"
    count by class, and have the destroy logic ensure all such pages have
    drained before proceeding. Keeping that state under the class spinlock
    keeps the logic straightforward.

    In this case a memory leak could lead to an eventual crash if compaction
    hits the leaked page. This crash would only occur if people are
    changing their zswap backend at runtime (which eventually starts
    destruction).

    Link: http://lkml.kernel.org/r/20190809181751.219326-2-henryburns@google.com
    Fixes: 48b4800a1c6a ("zsmalloc: page migration support")
    Signed-off-by: Henry Burns
    Reviewed-by: Sergey Senozhatsky
    Cc: Henry Burns
    Cc: Minchan Kim
    Cc: Shakeel Butt
    Cc: Jonathan Adams
    Cc:
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Henry Burns
     
  • In zs_page_migrate() we call putback_zspage() after we have finished
    migrating all pages in this zspage. However, the return value is
    ignored. If a zs_free() races in between zs_page_isolate() and
    zs_page_migrate(), freeing the last object in the zspage,
    putback_zspage() will leave the page in ZS_EMPTY for potentially an
    unbounded amount of time.

    To fix this, we need to do the same thing as zs_page_putback() does:
    schedule free_work to occur.

    To avoid duplicated code, move the sequence to a new
    putback_zspage_deferred() function which both zs_page_migrate() and
    zs_page_putback() call.

    Link: http://lkml.kernel.org/r/20190809181751.219326-1-henryburns@google.com
    Fixes: 48b4800a1c6a ("zsmalloc: page migration support")
    Signed-off-by: Henry Burns
    Reviewed-by: Sergey Senozhatsky
    Cc: Henry Burns
    Cc: Minchan Kim
    Cc: Shakeel Butt
    Cc: Jonathan Adams
    Cc:
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Henry Burns
     

20 Jul, 2019

1 commit

  • Pull vfs mount updates from Al Viro:
    "The first part of mount updates.

    Convert filesystems to use the new mount API"

    * 'work.mount0' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (63 commits)
    mnt_init(): call shmem_init() unconditionally
    constify ksys_mount() string arguments
    don't bother with registering rootfs
    init_rootfs(): don't bother with init_ramfs_fs()
    vfs: Convert smackfs to use the new mount API
    vfs: Convert selinuxfs to use the new mount API
    vfs: Convert securityfs to use the new mount API
    vfs: Convert apparmorfs to use the new mount API
    vfs: Convert openpromfs to use the new mount API
    vfs: Convert xenfs to use the new mount API
    vfs: Convert gadgetfs to use the new mount API
    vfs: Convert oprofilefs to use the new mount API
    vfs: Convert ibmasmfs to use the new mount API
    vfs: Convert qib_fs/ipathfs to use the new mount API
    vfs: Convert efivarfs to use the new mount API
    vfs: Convert configfs to use the new mount API
    vfs: Convert binfmt_misc to use the new mount API
    convenience helper: get_tree_single()
    convenience helper get_tree_nodev()
    vfs: Kill sget_userns()
    ...

    Linus Torvalds
     

05 Jun, 2019

1 commit

  • The variable 'entry' is no longer used and the compiler rightly complains
    that it should be removed.

    ../mm/zsmalloc.c: In function `zs_pool_stat_create':
    ../mm/zsmalloc.c:648:17: warning: unused variable `entry' [-Wunused-variable]
    struct dentry *entry;
    ^~~~~

    Rework to remove the unused variable.

    Link: http://lkml.kernel.org/r/20190604065826.26064-1-anders.roxell@linaro.org
    Fixes: 4268509a36a7 ("zsmalloc: no need to check return value of debugfs_create functions")
    Signed-off-by: Anders Roxell
    Cc: Minchan Kim
    Cc: Sergey Senozhatsky
    Signed-off-by: Andrew Morton
    Signed-off-by: Greg Kroah-Hartman

    Anders Roxell
     

03 Jun, 2019

1 commit


26 May, 2019

2 commits

  • Convert the zsmalloc filesystem to the new internal mount API as the old
    one will be obsoleted and removed. This allows greater flexibility in
    communication of mount parameters between userspace, the VFS and the
    filesystem.

    See Documentation/filesystems/mount_api.txt for more information.

    Signed-off-by: David Howells
    cc: Minchan Kim
    cc: Nitin Gupta
    cc: Sergey Senozhatsky
    cc: linux-mm@kvack.org
    Signed-off-by: Al Viro

    David Howells
     
  • Once upon a time we used to set ->d_name of e.g. pipefs root
    so that d_path() on pipes would work. These days it's
    completely pointless - dentries of pipes are not even connected
    to pipefs root. However, mount_pseudo() had set the root
    dentry name (passed as the second argument) and callers
    kept inventing names to pass to it. Including those that
    didn't *have* any non-root dentries to start with...

    All of that had been pointless for about 8 years now; it's
    time to get rid of that cargo-culting...

    Signed-off-by: Al Viro

    Al Viro
     

20 May, 2019

1 commit


27 Oct, 2018

1 commit

  • Replace "fallthru" with a proper "fall through" annotation.

    This fix is part of the ongoing effort to enable
    -Wimplicit-fallthrough.

    Link: http://lkml.kernel.org/r/20181003105114.GA24423@embeddedor.com
    Signed-off-by: Gustavo A. R. Silva
    Reviewed-by: Sergey Senozhatsky
    Cc: Minchan Kim
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Gustavo A. R. Silva
     

18 Aug, 2018

1 commit

  • The functions zs_page_isolate, zs_page_migrate, zs_page_putback,
    lock_zspage, trylock_zspage and the structure zsmalloc_aops are local
    to the source file and do not need to be in global scope, so make them
    static.

    Cleans up sparse warnings:
    symbol 'zs_page_isolate' was not declared. Should it be static?
    symbol 'zs_page_migrate' was not declared. Should it be static?
    symbol 'zs_page_putback' was not declared. Should it be static?
    symbol 'zsmalloc_aops' was not declared. Should it be static?
    symbol 'lock_zspage' was not declared. Should it be static?
    symbol 'trylock_zspage' was not declared. Should it be static?

    [arnd@arndb.de: hide unused lock_zspage]
    Link: http://lkml.kernel.org/r/20180706130924.3891230-1-arnd@arndb.de
    Link: http://lkml.kernel.org/r/20180624213322.13776-1-colin.king@canonical.com
    Signed-off-by: Colin Ian King
    Reviewed-by: Sergey Senozhatsky
    Reviewed-by: Andrew Morton
    Cc: Minchan Kim
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Colin Ian King
     

15 Jun, 2018

1 commit

  • mm/*.c files use symbolic and octal styles for permissions.

    Using octal and not symbolic permissions is preferred by many as more
    readable.

    https://lkml.org/lkml/2016/8/2/1945

    Prefer the direct use of octal for permissions.

    Done using
    $ scripts/checkpatch.pl -f --types=SYMBOLIC_PERMS --fix-inplace mm/*.c
    and some typing.

    Before: $ git grep -P -w "0[0-7]{3,3}" mm | wc -l
    44
    After: $ git grep -P -w "0[0-7]{3,3}" mm | wc -l
    86

    Miscellanea:

    o Whitespace neatening around these conversions.

    Link: http://lkml.kernel.org/r/2e032ef111eebcd4c5952bae86763b541d373469.1522102887.git.joe@perches.com
    Signed-off-by: Joe Perches
    Acked-by: David Rientjes
    Acked-by: Michal Hocko
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Joe Perches
     

06 Apr, 2018

3 commits

  • Link: http://lkml.kernel.org/r/1519585191-10180-4-git-send-email-rppt@linux.vnet.ibm.com
    Signed-off-by: Mike Rapoport
    Reviewed-by: Andrew Morton
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Mike Rapoport
     
  • Patch series "zsmalloc/zram: drop zram's max_zpage_size", v3.

    ZRAM's max_zpage_size is a bad thing. It forces zsmalloc to store
    normal objects as huge ones, which results in bigger zsmalloc memory
    usage. Drop it and use the actual zsmalloc huge-class value when
    deciding whether an object is huge.

    This patch (of 2):

    Not every object can share its zspage with other objects, e.g. when the
    object is as big as a zspage or nearly as big as a zspage. For such
    objects zsmalloc has a so-called huge class: every object which belongs
    to a huge class consumes the entire zspage (which consists of a physical
    page). On an x86_64, PAGE_SHIFT 12 box, the first non-huge class size is
    3264, so starting down from size 3264, objects can share page(-s) and
    thus minimize memory wastage.

    ZRAM, however, has its own statically defined watermark for huge
    objects, namely "3 * PAGE_SIZE / 4 = 3072", and forcibly stores every
    object larger than this watermark (3072) as a PAGE_SIZE object, in other
    words, to a huge class, while zsmalloc can keep some of those objects in
    non-huge classes. This results in increased memory consumption.

    zsmalloc knows better whether an object is huge or not. Introduce a
    zs_huge_class_size() function which tells whether the given object can
    be stored in one of the non-huge classes. This will let us drop ZRAM's
    huge-object watermark and fully rely on zsmalloc when we decide whether
    an object is huge.

    [sergey.senozhatsky.work@gmail.com: add pool param to zs_huge_class_size()]
    Link: http://lkml.kernel.org/r/20180314081833.1096-2-sergey.senozhatsky@gmail.com
    Link: http://lkml.kernel.org/r/20180306070639.7389-2-sergey.senozhatsky@gmail.com
    Signed-off-by: Sergey Senozhatsky
    Acked-by: Minchan Kim
    Cc: Mike Rapoport
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Sergey Senozhatsky
     
  • ...instead of open coding file operations followed by custom ->open()
    callbacks per each attribute.

    [andriy.shevchenko@linux.intel.com: add tags, fix compilation issue]
    Link: http://lkml.kernel.org/r/20180217144253.58604-1-andriy.shevchenko@linux.intel.com
    Link: http://lkml.kernel.org/r/20180214154644.54505-1-andriy.shevchenko@linux.intel.com
    Signed-off-by: Andy Shevchenko
    Reviewed-by: Matthew Wilcox
    Reviewed-by: Andrew Morton
    Reviewed-by: Sergey Senozhatsky
    Acked-by: Christoph Lameter
    Cc: Tejun Heo
    Cc: Dennis Zhou
    Cc: Minchan Kim
    Cc: Nitin Gupta
    Cc: Sergey Senozhatsky
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Andy Shevchenko
     

14 Feb, 2018

1 commit

  • With boot-time switching between paging modes, we will have a variable
    MAX_PHYSMEM_BITS.

    Let's use the maximum value possible for the CONFIG_X86_5LEVEL=y
    configuration to define the zsmalloc data structures.

    The patch introduces MAX_POSSIBLE_PHYSMEM_BITS to cover this case.
    It is also well suited to handle the PAE special case.

    Signed-off-by: Kirill A. Shutemov
    Reviewed-by: Nitin Gupta
    Acked-by: Minchan Kim
    Cc: Andy Lutomirski
    Cc: Borislav Petkov
    Cc: Linus Torvalds
    Cc: Peter Zijlstra
    Cc: Sergey Senozhatsky
    Cc: Thomas Gleixner
    Cc: linux-mm@kvack.org
    Link: http://lkml.kernel.org/r/20180214111656.88514-3-kirill.shutemov@linux.intel.com
    Signed-off-by: Ingo Molnar

    Kirill A. Shutemov
     

01 Feb, 2018

3 commits

  • Fix warning about shifting unsigned literals being undefined behavior.

    Link: http://lkml.kernel.org/r/1515642078-4259-1-git-send-email-nick.desaulniers@gmail.com
    Signed-off-by: Nick Desaulniers
    Suggested-by: Minchan Kim
    Reviewed-by: Sergey Senozhatsky
    Cc: Andy Shevchenko
    Cc: Matthew Wilcox
    Cc: Nick Desaulniers
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Nick Desaulniers
     
  • We waste sizeof(swp_entry_t) on the zswap header when using zsmalloc as
    the zpool driver because zsmalloc doesn't support eviction.

    Add zpool_evictable() to detect whether a zpool is potentially
    evictable, and use it in zswap to avoid wasting memory on the zswap
    header.

    [yuzhao@google.com: The zpool->" prefix is a result of copy & paste]
    Link: http://lkml.kernel.org/r/20180110225626.110330-1-yuzhao@google.com
    Link: http://lkml.kernel.org/r/20180110224741.83751-1-yuzhao@google.com
    Signed-off-by: Yu Zhao
    Acked-by: Dan Streetman
    Reviewed-by: Sergey Senozhatsky
    Cc: Seth Jennings
    Cc: Minchan Kim
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Yu Zhao
     
  • struct zs_pool has a special flag to indicate success of shrinker
    initialization. unregister_shrinker() has been improved and can now
    detect by itself whether actual deinitialization should be performed,
    so the extra flag becomes redundant.

    [akpm@linux-foundation.org: update comment (Aliaksei), remove unneeded cast]
    Link: http://lkml.kernel.org/r/1513680552-9798-1-git-send-email-akaraliou.dev@gmail.com
    Signed-off-by: Aliaksei Karaliou
    Reviewed-by: Sergey Senozhatsky
    Acked-by: Minchan Kim
    Cc: Michal Hocko
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Aliaksei Karaliou
     

05 Jan, 2018

1 commit

  • `struct file_system_type' and the alloc_anon_inode() function are
    defined in fs.h; include it directly.

    Link: http://lkml.kernel.org/r/20171219104219.3017-1-sergey.senozhatsky@gmail.com
    Signed-off-by: Sergey Senozhatsky
    Cc: Minchan Kim
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Sergey Senozhatsky
     

16 Nov, 2017

1 commit

  • Use BUG_ON(in_interrupt()) in zs_map_object(). This is not a new
    BUG_ON(); it has always been there, but was recently changed to
    VM_BUG_ON(). There are several problems with that. First, we use
    per-CPU mappings both in zsmalloc and in zram, and an interrupt may
    easily corrupt those buffers. Second, and more importantly, we believe
    it's possible to start leaking sensitive information. Consider the
    following case:

    -> process P
    swap out
    zram
    per-cpu mapping CPU1
    compress page A
    -> IRQ

    swap out
    zram
    per-cpu mapping CPU1
    compress page B
    write page from per-cpu mapping CPU1 to zsmalloc pool
    iret

    -> process P
    write page from per-cpu mapping CPU1 to zsmalloc pool [*]
    return

    * so we store overwritten data that actually belongs to another
    page (task) and potentially contains sensitive data. And when
    process P will page fault it's going to read (swap in) that
    other task's data.

    Link: http://lkml.kernel.org/r/20170929045140.4055-1-sergey.senozhatsky@gmail.com
    Signed-off-by: Sergey Senozhatsky
    Acked-by: Minchan Kim
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Sergey Senozhatsky
     

09 Sep, 2017

2 commits

  • zs_stat_inc/dec/get() use enum zs_stat_type for the stat type, but some
    callers pass an enum fullness_group value. Change the type to int to
    reflect the actual use of the functions and get rid of the
    'enum-conversion' warnings.

    Link: http://lkml.kernel.org/r/20170731175000.56538-1-mka@chromium.org
    Signed-off-by: Matthias Kaehlcke
    Reviewed-by: Sergey Senozhatsky
    Acked-by: Minchan Kim
    Cc: Doug Anderson
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Matthias Kaehlcke
     
  • Introduce a new migration mode that allows offloading the copy to a
    device DMA engine. This changes the workflow of migration, and not all
    address_space migratepage callbacks can support it.

    This is intended to be used by migrate_vma(), which itself is used for
    things like HMM (see include/linux/hmm.h).

    No additional per-filesystem migratepage testing is needed. I disabled
    MIGRATE_SYNC_NO_COPY in all problematic migratepage() callbacks and
    added comments to those to explain why (as part of this patch). Any
    callback that wishes to support this new mode needs to be aware of how
    the migration flow differs from the other modes.

    Some of these callbacks do extra locking while copying (aio, zsmalloc,
    balloon, ...), and for DMA to be effective you want to copy multiple
    pages in one DMA operation. But in the problematic cases you cannot
    easily hold the extra lock across multiple calls to this callback.

    Usual flow is:

    For each page {
    1 - lock page
    2 - call migratepage() callback
    3 - (extra locking in some migratepage() callback)
    4 - migrate page state (freeze refcount, update page cache, buffer
    head, ...)
    5 - copy page
    6 - (unlock any extra lock of migratepage() callback)
    7 - return from migratepage() callback
    8 - unlock page
    }

    The new mode MIGRATE_SYNC_NO_COPY:
    1 - lock multiple pages
    For each page {
    2 - call migratepage() callback
    3 - abort in all problematic migratepage() callback
    4 - migrate page state (freeze refcount, update page cache, buffer
    head, ...)
    } // finished all calls to migratepage() callback
    5 - DMA copy multiple pages
    6 - unlock all the pages

    To support MIGRATE_SYNC_NO_COPY in the problematic cases we would need
    a new callback, migratepages() (for instance), that deals with multiple
    pages in one transaction.

    Because the problematic cases are not important for current usage, I
    did not want to complicate this patchset further for no good reason.

    Link: http://lkml.kernel.org/r/20170817000548.32038-14-jglisse@redhat.com
    Signed-off-by: Jérôme Glisse
    Cc: Aneesh Kumar
    Cc: Balbir Singh
    Cc: Benjamin Herrenschmidt
    Cc: Dan Williams
    Cc: David Nellans
    Cc: Evgeny Baskakov
    Cc: Johannes Weiner
    Cc: John Hubbard
    Cc: Kirill A. Shutemov
    Cc: Mark Hairgrove
    Cc: Michal Hocko
    Cc: Paul E. McKenney
    Cc: Ross Zwisler
    Cc: Sherry Cheung
    Cc: Subhash Gutti
    Cc: Vladimir Davydov
    Cc: Bob Liu
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Jérôme Glisse
     

07 Sep, 2017

1 commit

  • Getting -EBUSY from zs_page_migrate will make migration slow (retry) or
    fail (zs_page_putback will schedule free_work, but it cannot guarantee
    success).

    I noticed this issue because my kernel is patched
    (https://lkml.org/lkml/2014/5/28/113) to remove the retry in
    __alloc_contig_migrate_range.

    That retry handles the -EBUSY because it re-isolates the page and
    re-calls migrate_pages. Without it, cma_alloc fails at once with
    -EBUSY.

    Following the review from Minchan Kim in
    https://lkml.org/lkml/2014/5/28/113, I updated the patch to skip
    unnecessary loops but not return -EBUSY if the zspage is not in use.

    Following is what I got with highalloc-performance in a vbox with 2
    CPUs, 1G memory and 512 zram as swap, with swappiness set to 100.

    orig new
    Minor Faults 50805113 50830235
    Major Faults 43918 56530
    Swap Ins 42087 55680
    Swap Outs 89718 104700
    Allocation stalls 0 0
    DMA allocs 57787 52364
    DMA32 allocs 47964599 48043563
    Normal allocs 0 0
    Movable allocs 0 0
    Direct pages scanned 45493 23167
    Kswapd pages scanned 1565222 1725078
    Kswapd pages reclaimed 1342222 1503037
    Direct pages reclaimed 45615 25186
    Kswapd efficiency 85% 87%
    Kswapd velocity 1897.101 1949.042
    Direct efficiency 100% 108%
    Direct velocity 55.139 26.175
    Percentage direct scans 2% 1%
    Zone normal velocity 1952.240 1975.217
    Zone dma32 velocity 0.000 0.000
    Zone dma velocity 0.000 0.000
    Page writes by reclaim 89764.000 105233.000
    Page writes file 46 533
    Page writes anon 89718 104700
    Page reclaim immediate 21457 3699
    Sector Reads 3259688 3441368
    Sector Writes 3667252 3754836
    Page rescued immediate 0 0
    Slabs scanned 1042872 1160855
    Direct inode steals 8042 10089
    Kswapd inode steals 54295 29170
    Kswapd skipped wait 0 0
    THP fault alloc 175 154
    THP collapse alloc 226 289
    THP splits 0 0
    THP fault fallback 11 14
    THP collapse fail 3 2
    Compaction stalls 536 646
    Compaction success 322 358
    Compaction failures 214 288
    Page migrate success 119608 111063
    Page migrate failure 2723 2593
    Compaction pages isolated 250179 232652
    Compaction migrate scanned 9131832 9942306
    Compaction free scanned 2093272 2613998
    Compaction cost 192 189
    NUMA alloc hit 47124555 47193990
    NUMA alloc miss 0 0
    NUMA interleave hit 0 0
    NUMA alloc local 47124555 47193990
    NUMA base PTE updates 0 0
    NUMA huge PMD updates 0 0
    NUMA page range updates 0 0
    NUMA hint faults 0 0
    NUMA hint local faults 0 0
    NUMA hint local percent 100 100
    NUMA pages migrated 0 0
    AutoNUMA cost 0% 0%

    [akpm@linux-foundation.org: remove newline, per Minchan]
    Link: http://lkml.kernel.org/r/1500889535-19648-1-git-send-email-zhuhui@xiaomi.com
    Signed-off-by: Hui Zhu
    Acked-by: Minchan Kim
    Reviewed-by: Sergey Senozhatsky
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Hui Zhu
     

03 Aug, 2017

1 commit

  • Mike reported a kernel oops with the ltp zram03 testcase.

    zram: Added device: zram0
    zram0: detected capacity change from 0 to 107374182400
    BUG: unable to handle kernel paging request at 0000306d61727a77
    IP: zs_map_object+0xb9/0x260
    PGD 0
    P4D 0
    Oops: 0000 [#1] SMP
    Dumping ftrace buffer:
    (ftrace buffer empty)
    Modules linked in: zram(E) xfs(E) libcrc32c(E) btrfs(E) xor(E) raid6_pq(E) loop(E) ebtable_filter(E) ebtables(E) ip6table_filter(E) ip6_tables(E) iptable_filter(E) ip_tables(E) x_tables(E) af_packet(E) br_netfilter(E) bridge(E) stp(E) llc(E) iscsi_ibft(E) iscsi_boot_sysfs(E) nls_iso8859_1(E) nls_cp437(E) vfat(E) fat(E) intel_powerclamp(E) coretemp(E) cdc_ether(E) kvm_intel(E) usbnet(E) mii(E) kvm(E) irqbypass(E) crct10dif_pclmul(E) crc32_pclmul(E) crc32c_intel(E) iTCO_wdt(E) ghash_clmulni_intel(E) bnx2(E) iTCO_vendor_support(E) pcbc(E) ioatdma(E) ipmi_ssif(E) aesni_intel(E) i5500_temp(E) i2c_i801(E) aes_x86_64(E) lpc_ich(E) shpchp(E) mfd_core(E) crypto_simd(E) i7core_edac(E) dca(E) glue_helper(E) cryptd(E) ipmi_si(E) button(E) acpi_cpufreq(E) ipmi_devintf(E) pcspkr(E) ipmi_msghandler(E)
    nfsd(E) auth_rpcgss(E) nfs_acl(E) lockd(E) grace(E) sunrpc(E) ext4(E) crc16(E) mbcache(E) jbd2(E) sd_mod(E) ata_generic(E) i2c_algo_bit(E) ata_piix(E) drm_kms_helper(E) ahci(E) syscopyarea(E) sysfillrect(E) libahci(E) sysimgblt(E) fb_sys_fops(E) uhci_hcd(E) ehci_pci(E) ttm(E) ehci_hcd(E) libata(E) drm(E) megaraid_sas(E) usbcore(E) sg(E) dm_multipath(E) dm_mod(E) scsi_dh_rdac(E) scsi_dh_emc(E) scsi_dh_alua(E) scsi_mod(E) efivarfs(E) autofs4(E) [last unloaded: zram]
    CPU: 6 PID: 12356 Comm: swapon Tainted: G E 4.13.0.g87b2c3f-default #194
    Hardware name: IBM System x3550 M3 -[7944K3G]-/69Y5698 , BIOS -[D6E150AUS-1.10]- 12/15/2010
    task: ffff880158d2c4c0 task.stack: ffffc90001680000
    RIP: 0010:zs_map_object+0xb9/0x260
    Call Trace:
    zram_bvec_rw.isra.26+0xe8/0x780 [zram]
    zram_rw_page+0x6e/0xa0 [zram]
    bdev_read_page+0x81/0xb0
    do_mpage_readpage+0x51a/0x710
    mpage_readpages+0x122/0x1a0
    blkdev_readpages+0x1d/0x20
    __do_page_cache_readahead+0x1b2/0x270
    ondemand_readahead+0x180/0x2c0
    page_cache_sync_readahead+0x31/0x50
    generic_file_read_iter+0x7e7/0xaf0
    blkdev_read_iter+0x37/0x40
    __vfs_read+0xce/0x140
    vfs_read+0x9e/0x150
    SyS_read+0x46/0xa0
    entry_SYSCALL_64_fastpath+0x1a/0xa5
    Code: 81 e6 00 c0 3f 00 81 fe 00 00 16 00 0f 85 9f 01 00 00 0f b7 13 65 ff 05 5e 07 dc 7e 66 c1 ea 02 81 e2 ff 01 00 00 49 8b 54 d4 08 4a 48 41 0f af ce 81 e1 ff 0f 00 00 41 89 c9 48 c7 c3 a0 70
    RIP: zs_map_object+0xb9/0x260 RSP: ffffc90001683988
    CR2: 0000306d61727a77

    He bisected the problem to [1].

    After commit cf8e0fedf078 ("mm/zsmalloc: simplify zs_max_alloc_size
    handling"), zs_create_pool no longer uses a double pointer for
    pool->size_class, so its counterpart zs_destroy_pool doesn't need to
    free it either.

    Otherwise, it kfrees a wrong address and the kernel oopses.
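    The layout change behind the bug can be sketched in userspace. The
    struct names below are illustrative stand-ins, not the real zsmalloc
    definitions: before the simplification each size class was an
    individually allocated object freed one by one; afterwards the classes
    live inside one flat allocation, so the old per-element free path would
    pass interior pointers to the allocator.

    ```c
    #include <stdio.h>
    #include <stdlib.h>

    #define NR_CLASSES 4

    struct size_class { int size; };

    /* Old layout: an array of pointers, each class its own allocation. */
    struct pool_old { struct size_class *size_class[NR_CLASSES]; };

    /* New layout: the classes are embedded in one flat allocation. */
    struct pool_new { struct size_class size_class[NR_CLASSES]; };

    int main(void)
    {
        /* Old scheme: allocate and free each class separately. */
        struct pool_old old;
        for (int i = 0; i < NR_CLASSES; i++)
            old.size_class[i] = malloc(sizeof(struct size_class));
        for (int i = 0; i < NR_CLASSES; i++)
            free(old.size_class[i]);   /* correct for the old layout */

        /* New scheme: one allocation holds everything. Freeing
         * &pool->size_class[i] per element, as a stale destroy path
         * would, hands free() interior addresses it never returned,
         * which is exactly the "kfree wrong address" failure mode. */
        struct pool_new *pool = malloc(sizeof(*pool));
        free(pool);                    /* the only valid free */

        puts("ok");
        return 0;
    }
    ```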

    Link: http://lkml.kernel.org/r/20170725062650.GA12134@bbox
    Fixes: cf8e0fedf078 ("mm/zsmalloc: simplify zs_max_alloc_size handling")
    Signed-off-by: Minchan Kim
    Reported-by: Mike Galbraith
    Tested-by: Mike Galbraith
    Reviewed-by: Sergey Senozhatsky
    Cc: Jerome Marchand
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Minchan Kim