08 Apr, 2020

1 commit

  • The compressed cache for swap pages (zswap) currently needs one to three
    extra kernel command line parameters in order to work: it has to be
    enabled by adding a "zswap.enabled=1" command line parameter, and if one
    wants a different compressor or pool allocator than the default lzo / zbud
    combination, these choices also have to be specified as additional
    parameters on the kernel command line.

    Using a different compressor and allocator for zswap is actually pretty
    common, as guides often recommend the lz4 / z3fold pair instead of the
    default one. In that case it is also necessary to remember to enable the
    appropriate compression algorithm and pool allocator in the kernel config
    manually.

    Let's avoid the need for these kernel command line parameters and
    automatically pull in the dependencies for the selected compressor
    algorithm and pool allocator by adding appropriate default switches to
    Kconfig.

    The default values for these options match what the code was using
    previously as its defaults.
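
    As an illustration, here is a minimal sketch of how such build-time
    defaults could be consumed in mm/zswap.c; the CONFIG_* symbol names below
    are assumptions for illustration, not necessarily the exact options the
    patch introduces:

        #include <linux/module.h>

        /* Fall back to Kconfig-selected defaults instead of the hard-coded
         * "lzo"/"zbud" strings, so no extra boot parameters are needed. */
        static bool zswap_enabled = IS_ENABLED(CONFIG_ZSWAP_DEFAULT_ON);
        module_param_named(enabled, zswap_enabled, bool, 0644);

        static char *zswap_compressor = CONFIG_ZSWAP_COMPRESSOR_DEFAULT;
        module_param_named(compressor, zswap_compressor, charp, 0644);

        static char *zswap_zpool_type = CONFIG_ZSWAP_ZPOOL_DEFAULT;
        module_param_named(zpool, zswap_zpool_type, charp, 0644);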

    Signed-off-by: Maciej S. Szmigiero
    Signed-off-by: Andrew Morton
    Reviewed-by: Vitaly Wool
    Link: http://lkml.kernel.org/r/20200202000112.456103-1-mail@maciej.szmigiero.name
    Signed-off-by: Linus Torvalds

    Maciej S. Szmigiero
     

01 Feb, 2020

2 commits

  • The "pool" pointer can be NULL at the end of the init_zswap(). (We
    would allocate a new pool later in that situation)

    So in the error handling path we need to make sure pool is a valid
    pointer before calling zswap_pool_destroy(pool), because that function
    dereferences its argument.
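
    A minimal sketch of the shape of the fix, assuming the local variable is
    called "pool" as in the description above:

        /* error path of init_zswap(): pool may legitimately be NULL here */
        if (pool)
            zswap_pool_destroy(pool);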

    Link: http://lkml.kernel.org/r/20200114050902.og32fkllkod5ycf5@kili.mountain
    Fixes: 93d4dfa9fbd0 ("mm/zswap.c: add allocation hysteresis if pool limit is hit")
    Signed-off-by: Dan Carpenter
    Cc: Vitaly Wool
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Dan Carpenter
     
  • zswap will always try to shrink the pool when it is full. If there is
    high pressure on zswap, this results in flipping pages in and out of the
    zswap pool without any real benefit, and the overall system performance
    drops. The previous discussion on this subject [1] ended up with a
    suggestion to implement a form of hysteresis: once the limit has been
    hit, refuse to take pages into the zswap pool until it has sufficient
    space again. This is my take on this.

    Hysteresis is controlled with a sysfs-configurable parameter (namely,
    /sys/kernel/debug/zswap/accept_threshold_percent). It specifies the
    threshold at which zswap starts accepting pages again after it has
    become full. Setting this parameter to 100 disables the hysteresis and
    restores the pre-hysteresis zswap behavior.
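
    A rough sketch of what such an acceptance check can look like; the helper
    and variable names here are illustrative, not the exact ones from the
    patch:

        /* Once the pool has filled up, refuse new stores until usage drops
         * below accept_threshold_percent of the pool limit. */
        static bool zswap_can_accept(void)
        {
            unsigned long limit = zswap_max_pool_pages();     /* pool limit */
            unsigned long used  = zswap_pool_current_pages(); /* pages held */

            return used * 100 < zswap_accept_thr_percent * limit;
        }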

    [1] https://lkml.org/lkml/2019/11/8/949

    Link: http://lkml.kernel.org/r/20200108200118.15563-1-vitaly.wool@konsulko.com
    Signed-off-by: Vitaly Wool
    Cc: Dan Streetman
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Vitaly Wool
     

25 Sep, 2019

2 commits

  • zswap_writeback_entry() maps a handle to read swpentry first, and
    then in the most common case it would map the same handle again.
    This is ok when zbud is the backend since its mapping callback is
    plain and simple, but it slows things down for z3fold.

    Since there's hardly a point in unmapping a handle _that_ fast as
    zswap_writeback_entry() does when it reads swpentry, the
    suggestion is to keep the handle mapped till the end.
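
    A sketch of the idea using the generic zpool mapping API; the exact code
    in zswap_writeback_entry() differs in its details:

        /* Map the handle once, read the swap entry from the zswap header,
         * keep the mapping for the subsequent copy, and unmap at the end. */
        src = zpool_map_handle(pool, handle, ZPOOL_MM_RO);
        swpentry = ((struct zswap_header *)src)->swpentry;
        /* ... look up the entry and copy/decompress the page while the
         * handle is still mapped ... */
        zpool_unmap_handle(pool, handle);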

    Link: http://lkml.kernel.org/r/20190916004640.b453167d3556c4093af4cf7d@gmail.com
    Signed-off-by: Vitaly Wool
    Reviewed-by: Dan Streetman
    Cc: Shakeel Butt
    Cc: Minchan Kim
    Cc: Sergey Senozhatsky
    Cc: Seth Jennings
    Cc: Vitaly Wool
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Vitaly Wool
     
  • This is the third version that was updated according to the comments from
    Sergey Senozhatsky https://lkml.org/lkml/2019/5/29/73 and Shakeel Butt
    https://lkml.org/lkml/2019/6/4/973

    zswap compresses swap pages into a dynamically allocated RAM-based memory
    pool. The memory pool can be zbud, z3fold or zsmalloc. All of them
    allocate unmovable pages, which increases the number of unmovable page
    blocks and is bad for anti-fragmentation.

    zsmalloc supports page migration if movable pages are requested:

        handle = zs_malloc(zram->mem_pool, comp_len,
                           GFP_NOIO | __GFP_HIGHMEM |
                           __GFP_MOVABLE);

    And commit "zpool: Add malloc_support_movable to zpool_driver" add
    zpool_malloc_support_movable check malloc_support_movable to make sure if
    a zpool support allocate movable memory.

    This commit lets zswap allocate blocks with __GFP_HIGHMEM | __GFP_MOVABLE
    if the zpool supports allocating movable memory.
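
    A sketch of the resulting allocation path in zswap_frontswap_store();
    the local variable and field names (entry->pool->zpool, hlen, dlen) are
    assumptions here:

        gfp_t gfp = __GFP_NORETRY | __GFP_NOWARN | __GFP_KSWAPD_RECLAIM;

        /* only ask for movable highmem pages if the zpool can honour it */
        if (zpool_malloc_support_movable(entry->pool->zpool))
            gfp |= __GFP_HIGHMEM | __GFP_MOVABLE;
        ret = zpool_malloc(entry->pool->zpool, hlen + dlen, gfp, &handle);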

    The following is a test log from a PC with 8G memory and 2G swap.

    Without this commit:
    ~# echo lz4 > /sys/module/zswap/parameters/compressor
    ~# echo zsmalloc > /sys/module/zswap/parameters/zpool
    ~# echo 1 > /sys/module/zswap/parameters/enabled
    ~# swapon /swapfile
    ~# cd /home/teawater/kernel/vm-scalability/
    /home/teawater/kernel/vm-scalability# export unit_size=$((9 * 1024 * 1024 * 1024))
    /home/teawater/kernel/vm-scalability# ./case-anon-w-seq
    2717908992 bytes / 4826062 usecs = 549973 KB/s
    2717908992 bytes / 4864201 usecs = 545661 KB/s
    2717908992 bytes / 4867015 usecs = 545346 KB/s
    2717908992 bytes / 4915485 usecs = 539968 KB/s
    397853 usecs to free memory
    357820 usecs to free memory
    421333 usecs to free memory
    420454 usecs to free memory
    /home/teawater/kernel/vm-scalability# cat /proc/pagetypeinfo
    Page block order: 9
    Pages per block: 512

    Free pages count per migrate type at order 0 1 2 3 4 5 6 7 8 9 10
    Node 0, zone DMA, type Unmovable 1 1 1 0 2 1 1 0 1 0 0
    Node 0, zone DMA, type Movable 0 0 0 0 0 0 0 0 0 1 3
    Node 0, zone DMA, type Reclaimable 0 0 0 0 0 0 0 0 0 0 0
    Node 0, zone DMA, type HighAtomic 0 0 0 0 0 0 0 0 0 0 0
    Node 0, zone DMA, type CMA 0 0 0 0 0 0 0 0 0 0 0
    Node 0, zone DMA, type Isolate 0 0 0 0 0 0 0 0 0 0 0
    Node 0, zone DMA32, type Unmovable 6 5 8 6 6 5 4 1 1 1 0
    Node 0, zone DMA32, type Movable 25 20 20 19 22 15 14 11 11 5 767
    Node 0, zone DMA32, type Reclaimable 0 0 0 0 0 0 0 0 0 0 0
    Node 0, zone DMA32, type HighAtomic 0 0 0 0 0 0 0 0 0 0 0
    Node 0, zone DMA32, type CMA 0 0 0 0 0 0 0 0 0 0 0
    Node 0, zone DMA32, type Isolate 0 0 0 0 0 0 0 0 0 0 0
    Node 0, zone Normal, type Unmovable 4753 5588 5159 4613 3712 2520 1448 594 188 11 0
    Node 0, zone Normal, type Movable 16 3 457 2648 2143 1435 860 459 223 224 296
    Node 0, zone Normal, type Reclaimable 0 0 44 38 11 2 0 0 0 0 0
    Node 0, zone Normal, type HighAtomic 0 0 0 0 0 0 0 0 0 0 0
    Node 0, zone Normal, type CMA 0 0 0 0 0 0 0 0 0 0 0
    Node 0, zone Normal, type Isolate 0 0 0 0 0 0 0 0 0 0 0

    Number of blocks type Unmovable Movable Reclaimable HighAtomic CMA Isolate
    Node 0, zone DMA 1 7 0 0 0 0
    Node 0, zone DMA32 4 1652 0 0 0 0
    Node 0, zone Normal 931 1485 15 0 0 0

    With this commit:
    ~# echo lz4 > /sys/module/zswap/parameters/compressor
    ~# echo zsmalloc > /sys/module/zswap/parameters/zpool
    ~# echo 1 > /sys/module/zswap/parameters/enabled
    ~# swapon /swapfile
    ~# cd /home/teawater/kernel/vm-scalability/
    /home/teawater/kernel/vm-scalability# export unit_size=$((9 * 1024 * 1024 * 1024))
    /home/teawater/kernel/vm-scalability# ./case-anon-w-seq
    2717908992 bytes / 4689240 usecs = 566020 KB/s
    2717908992 bytes / 4760605 usecs = 557535 KB/s
    2717908992 bytes / 4803621 usecs = 552543 KB/s
    2717908992 bytes / 5069828 usecs = 523530 KB/s
    431546 usecs to free memory
    383397 usecs to free memory
    456454 usecs to free memory
    224487 usecs to free memory
    /home/teawater/kernel/vm-scalability# cat /proc/pagetypeinfo
    Page block order: 9
    Pages per block: 512

    Free pages count per migrate type at order 0 1 2 3 4 5 6 7 8 9 10
    Node 0, zone DMA, type Unmovable 1 1 1 0 2 1 1 0 1 0 0
    Node 0, zone DMA, type Movable 0 0 0 0 0 0 0 0 0 1 3
    Node 0, zone DMA, type Reclaimable 0 0 0 0 0 0 0 0 0 0 0
    Node 0, zone DMA, type HighAtomic 0 0 0 0 0 0 0 0 0 0 0
    Node 0, zone DMA, type CMA 0 0 0 0 0 0 0 0 0 0 0
    Node 0, zone DMA, type Isolate 0 0 0 0 0 0 0 0 0 0 0
    Node 0, zone DMA32, type Unmovable 10 8 10 9 10 4 3 2 3 0 0
    Node 0, zone DMA32, type Movable 18 12 14 16 16 11 9 5 5 6 775
    Node 0, zone DMA32, type Reclaimable 0 0 0 0 0 0 0 0 0 0 1
    Node 0, zone DMA32, type HighAtomic 0 0 0 0 0 0 0 0 0 0 0
    Node 0, zone DMA32, type CMA 0 0 0 0 0 0 0 0 0 0 0
    Node 0, zone DMA32, type Isolate 0 0 0 0 0 0 0 0 0 0 0
    Node 0, zone Normal, type Unmovable 2669 1236 452 118 37 14 4 1 2 3 0
    Node 0, zone Normal, type Movable 3850 6086 5274 4327 3510 2494 1520 934 438 220 470
    Node 0, zone Normal, type Reclaimable 56 93 155 124 47 31 17 7 3 0 0
    Node 0, zone Normal, type HighAtomic 0 0 0 0 0 0 0 0 0 0 0
    Node 0, zone Normal, type CMA 0 0 0 0 0 0 0 0 0 0 0
    Node 0, zone Normal, type Isolate 0 0 0 0 0 0 0 0 0 0 0

    Number of blocks type Unmovable Movable Reclaimable HighAtomic CMA Isolate
    Node 0, zone DMA 1 7 0 0 0 0
    Node 0, zone DMA32 4 1650 2 0 0 0
    Node 0, zone Normal 79 2326 26 0 0 0

    You can see that the number of unmovable page blocks decreases when the
    kernel has this commit applied.

    Link: http://lkml.kernel.org/r/20190605100630.13293-2-teawaterz@linux.alibaba.com
    Signed-off-by: Hui Zhu
    Reviewed-by: Shakeel Butt
    Cc: Dan Streetman
    Cc: Minchan Kim
    Cc: Nitin Gupta
    Cc: Sergey Senozhatsky
    Cc: Seth Jennings
    Cc: Vitaly Wool
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Hui Zhu
     

03 Jun, 2019

1 commit


31 May, 2019

1 commit

  • Based on 3 normalized pattern(s):

    this program is free software you can redistribute it and or modify
    it under the terms of the gnu general public license as published by
    the free software foundation either version 2 of the license or at
    your option any later version this program is distributed in the
    hope that it will be useful but without any warranty without even
    the implied warranty of merchantability or fitness for a particular
    purpose see the gnu general public license for more details

    this program is free software you can redistribute it and or modify
    it under the terms of the gnu general public license as published by
    the free software foundation either version 2 of the license or at
    your option any later version [author] [kishon] [vijay] [abraham]
    [i] [kishon]@[ti] [com] this program is distributed in the hope that
    it will be useful but without any warranty without even the implied
    warranty of merchantability or fitness for a particular purpose see
    the gnu general public license for more details

    this program is free software you can redistribute it and or modify
    it under the terms of the gnu general public license as published by
    the free software foundation either version 2 of the license or at
    your option any later version [author] [graeme] [gregory]
    [gg]@[slimlogic] [co] [uk] [author] [kishon] [vijay] [abraham] [i]
    [kishon]@[ti] [com] [based] [on] [twl6030]_[usb] [c] [author] [hema]
    [hk] [hemahk]@[ti] [com] this program is distributed in the hope
    that it will be useful but without any warranty without even the
    implied warranty of merchantability or fitness for a particular
    purpose see the gnu general public license for more details

    extracted by the scancode license scanner the SPDX license identifier

    GPL-2.0-or-later

    has been chosen to replace the boilerplate/reference in 1105 file(s).

    Signed-off-by: Thomas Gleixner
    Reviewed-by: Allison Randal
    Reviewed-by: Richard Fontana
    Reviewed-by: Kate Stewart
    Cc: linux-spdx@vger.kernel.org
    Link: https://lkml.kernel.org/r/20190527070033.202006027@linutronix.de
    Signed-off-by: Greg Kroah-Hartman

    Thomas Gleixner
     

29 Dec, 2018

1 commit

  • totalram_pages and totalhigh_pages are made static inline functions.

    The main motivation was that handling of managed_page_count_lock was
    complicating things. It was discussed at length here,
    https://lore.kernel.org/patchwork/patch/995739/#1181785 So it seemed
    better to remove the lock and convert the variables to atomics, with
    prevention of potential store-to-read tearing as a bonus.
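
    For reference, a sketch of the accessor shape this describes (the real
    helper lives in include/linux/mm.h):

        extern atomic_long_t _totalram_pages;

        static inline unsigned long totalram_pages(void)
        {
            /* atomic read avoids store-to-read tearing without a lock */
            return (unsigned long)atomic_long_read(&_totalram_pages);
        }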

    [akpm@linux-foundation.org: coding style fixes]
    Link: http://lkml.kernel.org/r/1542090790-21750-4-git-send-email-arunks@codeaurora.org
    Signed-off-by: Arun KS
    Suggested-by: Michal Hocko
    Suggested-by: Vlastimil Babka
    Reviewed-by: Konstantin Khlebnikov
    Reviewed-by: Pavel Tatashin
    Acked-by: Michal Hocko
    Acked-by: Vlastimil Babka
    Cc: David Hildenbrand
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Arun KS
     

27 Jul, 2018

1 commit

  • /sys/../zswap/stored_pages keeps rising in a zswap test with the
    "zswap.max_pool_percent=0" parameter. But zswap should not compress or
    store any more pages, since there is no space in the compressed pool.

    Reproduce steps:
    1. Boot kernel with "zswap.enabled=1"
    2. Set the max_pool_percent to 0
    # echo 0 > /sys/module/zswap/parameters/max_pool_percent
    3. Do memory stress test to see if some pages have been compressed
    # stress --vm 1 --vm-bytes $mem_available"M" --timeout 60s
    4. Watch whether the 'stored_pages' number keeps increasing

    The root cause is:

    When zswap_max_pool_percent is set to 0 via kernel parameter,
    zswap_is_full() will always return true due to zswap_shrink(). But if
    the shrinking is able to reclaim a page successfully, the code then
    proceeds to compressing/storing another page, so the value of
    stored_pages will keep changing.

    To solve the issue, this patch adds a zswap_is_full() check again after
    zswap_shrink() to make sure it's now under the max_pool_percent, and to
    not compress/store if we reached the limit.
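
    A sketch of the resulting control flow in zswap_frontswap_store(); the
    error label and counter names are illustrative:

        if (zswap_is_full()) {
            zswap_pool_limit_hit++;
            if (zswap_shrink()) {
                zswap_reject_reclaim_fail++;
                ret = -ENOMEM;
                goto reject;
            }
            /* shrinking reclaimed something, but re-check the limit before
             * accepting this page */
            if (zswap_is_full()) {
                ret = -ENOMEM;
                goto reject;
            }
        }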

    Link: http://lkml.kernel.org/r/20180530103936.17812-1-liwang@redhat.com
    Signed-off-by: Li Wang
    Acked-by: Dan Streetman
    Cc: Seth Jennings
    Cc: Huang Ying
    Cc: Yu Zhao
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Li Wang
     

15 Jun, 2018

1 commit

  • mm/*.c files use symbolic and octal styles for permissions.

    Using octal and not symbolic permissions is preferred by many as more
    readable.

    https://lkml.org/lkml/2016/8/2/1945

    Prefer the direct use of octal for permissions.

    Done using
    $ scripts/checkpatch.pl -f --types=SYMBOLIC_PERMS --fix-inplace mm/*.c
    and some typing.

    Before: $ git grep -P -w "0[0-7]{3,3}" mm | wc -l
    44
    After: $ git grep -P -w "0[0-7]{3,3}" mm | wc -l
    86

    Miscellanea:

    o Whitespace neatening around these conversions.

    Link: http://lkml.kernel.org/r/2e032ef111eebcd4c5952bae86763b541d373469.1522102887.git.joe@perches.com
    Signed-off-by: Joe Perches
    Acked-by: David Rientjes
    Acked-by: Michal Hocko
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Joe Perches
     

22 Feb, 2018

1 commit

  • It was reported by Sergey Senozhatsky that if THP (Transparent Huge
    Page) and frontswap (via zswap) are both enabled, then when memory goes
    low enough that swap is triggered, segfaults and memory corruption occur
    in random user space applications, as follows:

    kernel: urxvt[338]: segfault at 20 ip 00007fc08889ae0d sp 00007ffc73a7fc40 error 6 in libc-2.26.so[7fc08881a000+1ae000]
    #0 0x00007fc08889ae0d _int_malloc (libc.so.6)
    #1 0x00007fc08889c2f3 malloc (libc.so.6)
    #2 0x0000560e6004bff7 _Z14rxvt_wcstoutf8PKwi (urxvt)
    #3 0x0000560e6005e75c n/a (urxvt)
    #4 0x0000560e6007d9f1 _ZN16rxvt_perl_interp6invokeEP9rxvt_term9hook_typez (urxvt)
    #5 0x0000560e6003d988 _ZN9rxvt_term9cmd_parseEv (urxvt)
    #6 0x0000560e60042804 _ZN9rxvt_term6pty_cbERN2ev2ioEi (urxvt)
    #7 0x0000560e6005c10f _Z17ev_invoke_pendingv (urxvt)
    #8 0x0000560e6005cb55 ev_run (urxvt)
    #9 0x0000560e6003b9b9 main (urxvt)
    #10 0x00007fc08883af4a __libc_start_main (libc.so.6)
    #11 0x0000560e6003f9da _start (urxvt)

    After bisection, it was found the first bad commit is bd4c82c22c36 ("mm,
    THP, swap: delay splitting THP after swapped out").

    The root cause is as follows:

    When pages are written to the swap device during swap-out in
    swap_writepage(), zswap (via frontswap) tries to compress them to
    improve performance. But zswap (frontswap) treats a THP as a normal
    page, so only the head page is saved. After swapping in, the tail pages
    are not restored to their original contents, causing memory corruption
    in the applications.

    This is fixed by refusing to save the page in the frontswap store
    functions if it is a THP, so that the THP is swapped out to the swap
    device instead.
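
    A minimal sketch of such a check at the top of the frontswap store
    function (the exact return value is an assumption):

        /* A THP would only have its head page stored; refuse it here so the
         * whole THP falls back to the regular swap device. */
        if (PageTransHuge(page))
            return -EINVAL;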

    Another choice would be to split the THP if frontswap is enabled. But
    frontswap enabling turns out not to be flexible enough: for example, if
    CONFIG_ZSWAP=y (it cannot be a module), frontswap will be enabled even
    if zswap itself isn't enabled.

    Frontswap has multiple backends; to make it easy for one backend to
    enable THP support, the THP check is put in the backends' frontswap
    store functions instead of in the general interfaces.

    Link: http://lkml.kernel.org/r/20180209084947.22749-1-ying.huang@intel.com
    Fixes: bd4c82c22c367e068 ("mm, THP, swap: delay splitting THP after swapped out")
    Signed-off-by: "Huang, Ying"
    Reported-by: Sergey Senozhatsky
    Tested-by: Sergey Senozhatsky
    Suggested-by: Minchan Kim [put THP checking in backend]
    Cc: Konrad Rzeszutek Wilk
    Cc: Dan Streetman
    Cc: Seth Jennings
    Cc: Tetsuo Handa
    Cc: Shaohua Li
    Cc: Michal Hocko
    Cc: Johannes Weiner
    Cc: Mel Gorman
    Cc: Shakeel Butt
    Cc: Boris Ostrovsky
    Cc: Juergen Gross
    Cc: [4.14]
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Huang Ying
     

01 Feb, 2018

2 commits

  • We waste sizeof(swp_entry_t) on the zswap header when using zsmalloc as
    the zpool driver, because zsmalloc doesn't support eviction.

    Add zpool_evictable() to detect whether a zpool is potentially evictable,
    and use it in zswap to avoid wasting memory on the zswap header.
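
    A sketch of how the store path can use the new helper; local variable
    and field names are assumptions:

        /* Only prepend the swp_entry_t header when the backend can evict,
         * i.e. when writeback may later need to find the swap entry. */
        hlen = zpool_evictable(entry->pool->zpool) ? sizeof(zhdr) : 0;
        ret = zpool_malloc(entry->pool->zpool, hlen + dlen, gfp, &handle);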

    [yuzhao@google.com: the "zpool->" prefix is a result of copy & paste]
    Link: http://lkml.kernel.org/r/20180110225626.110330-1-yuzhao@google.com
    Link: http://lkml.kernel.org/r/20180110224741.83751-1-yuzhao@google.com
    Signed-off-by: Yu Zhao
    Acked-by: Dan Streetman
    Reviewed-by: Sergey Senozhatsky
    Cc: Seth Jennings
    Cc: Minchan Kim
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Yu Zhao
     
  • Zswap is a cache which compresses the pages that are being swapped out
    and stores them into a dynamically allocated RAM-based memory pool.
    Experiments have shown that around 10-20% of the pages stored in zswap
    are same-filled (i.e. the contents of the page are all the same), but
    these pages are handled as normal pages, by compressing them and
    allocating memory in the pool.

    This patch adds a check in zswap_frontswap_store() to identify a
    same-filled page before compressing it. If the page is same-filled, set
    zswap_entry.length to zero, save the fill value, and skip both the
    compression of the page and the allocation of memory in the zpool. In
    zswap_frontswap_load(), check whether zswap_entry.length is zero for the
    page to be loaded; if it is, fill the page with the saved value. This
    saves the decompression time during load.
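
    A sketch of the same-value detection (the helper name is illustrative):

        static bool zswap_is_page_same_filled(void *ptr, unsigned long *value)
        {
            unsigned long *page = ptr;
            int pos;

            /* compare every word of the page against the first word */
            for (pos = 1; pos < PAGE_SIZE / sizeof(*page); pos++) {
                if (page[pos] != page[0])
                    return false;
            }
            *value = page[0];
            return true;
        }

    On load, the saved value can be written back with memset_l(), as noted
    in the bracketed remark below.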

    On an ARM quad-core 32-bit device with 1.5GB RAM, launching and
    relaunching different applications, out of ~64000 pages stored in zswap,
    ~11000 pages were same-value filled pages (including zero-filled pages)
    and ~9000 pages were zero-filled pages.

    An average of 17% of the pages in zswap (including zero-filled pages)
    are same-value-filled pages and 14% are zero-filled pages. An average
    of 3% of pages are same-filled non-zero pages.

    The below table shows the execution time profiling with the patch.

                                 Baseline    With patch    % Improvement
    -------------------------------------------------------------------
    *Zswap Store Time             26.5ms        18ms             32%
     (of same value pages)
    *Zswap Load Time              25.5ms        13ms             49%
     (of same value pages)
    -------------------------------------------------------------------

    On an Ubuntu PC with 2GB RAM, while executing a kernel build, other
    test scripts and multimedia applications, out of 360000 pages stored in
    zswap, 78000 (~22%) were found to be same-value-filled pages (including
    zero-filled pages) and 64000 (~17%) were zero-filled pages. So an
    average of 5% of pages are same-filled non-zero pages.

    The below table shows the execution time profiling with the patch.

                                 Baseline    With patch    % Improvement
    -------------------------------------------------------------------
    *Zswap Store Time               91ms        74ms             19%
     (of same value pages)
    *Zswap Load Time                50ms         7.5ms           85%
     (of same value pages)
    -------------------------------------------------------------------

    *The execution times may vary with test device used.

    Dan said:

    : I did test this patch out this week, and I added some instrumentation to
    : check the performance impact, and tested with a small program to try to
    : check the best and worst cases.
    :
    : When doing a lot of swap where all (or almost all) pages are same-value, I
    : found this patch does save both time and space, significantly. The exact
    : improvement in time and space depends on which compressor is being used,
    : but roughly agrees with the numbers you listed.
    :
    : In the worst case situation, where all (or almost all) pages have the
    : same-value *except* the final long (meaning, zswap will check each long on
    : the entire page but then still have to pass the page to the compressor),
    : the same-value check is around 10-15% of the total time spent in
    : zswap_frontswap_store(). That's a not-insignificant amount of time, but
    : it's not huge. Considering that most systems will probably be swapping
    : pages that aren't similar to the worst case (although I don't have any
    : data to know that), I'd say the improvement is worth the possible
    : worst-case performance impact.

    [srividya.dr@samsung.com: add memset_l instead of for loop]
    Link: http://lkml.kernel.org/r/20171018104832epcms5p1b2232e2236258de3d03d1344dde9fce0@epcms5p1
    Signed-off-by: Srividya Desireddy
    Acked-by: Dan Streetman
    Cc: Seth Jennings
    Cc: Pekka Enberg
    Cc: Dinakar Reddy Pathireddy
    Cc: SHARAN ALLUR
    Cc: RAJIB BASU
    Cc: JUHUN KIM
    Cc: Matthew Wilcox
    Cc: Timofey Titovets
    Cc: Andi Kleen
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Srividya Desireddy
     

07 Jul, 2017

3 commits

  • Omit an extra message for a memory allocation failure in this function.

    This issue was detected by using the Coccinelle software.

    Link: http://events.linuxfoundation.org/sites/events/files/slides/LCJ16-Refactor_Strings-WSang_0.pdf
    Link: http://lkml.kernel.org/r/bae25b04-2ce2-7137-a71c-50d7b4f06431@users.sourceforge.net
    Signed-off-by: Markus Elfring
    Cc: Dan Streetman
    Cc: Seth Jennings
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Markus Elfring
     
  • Replace the specification of a data structure by a pointer dereference
    as the parameter for the operator "sizeof" to make the corresponding
    size determination a bit safer according to the Linux coding style
    convention.

    Link: http://lkml.kernel.org/r/19f9da22-092b-f867-bdf6-f4dbad7ccf1f@users.sourceforge.net
    Signed-off-by: Markus Elfring
    Cc: Dan Streetman
    Cc: Seth Jennings
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Markus Elfring
     
  • Omit an extra message for a memory allocation failure in this function.

    This issue was detected by using the Coccinelle software.

    Link: http://events.linuxfoundation.org/sites/events/files/slides/LCJ16-Refactor_Strings-WSang_0.pdf
    Link: http://lkml.kernel.org/r/2345aabc-ae98-1d31-afba-40a02c5baf3d@users.sourceforge.net
    Signed-off-by: Markus Elfring
    Cc: Dan Streetman
    Cc: Seth Jennings
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Markus Elfring
     

28 Feb, 2017

3 commits

  • Change the zpool/compressor param callback function to release the
    zswap_pools_lock spinlock before calling param_set_charp, since that
    function may sleep when it calls kmalloc with GFP_KERNEL.
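
    A sketch of the reordering in the param-set callback (simplified):

        /* param_set_charp() may kmalloc(GFP_KERNEL) and thus sleep, so it
         * must not be called with the spinlock held */
        spin_unlock(&zswap_pools_lock);
        ret = param_set_charp(s, kp);
        spin_lock(&zswap_pools_lock);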

    While this problem has existed for a while, I wasn't able to trigger it
    using a tight loop changing either/both the zpool and compressor params; I
    think it's very unlikely to be an issue on the stable kernels, especially
    since most zswap users will change the compressor and/or zpool from sysfs
    only one time each boot - or zero times, if they add the params to the
    kernel boot.

    Fixes: c99b42c3529e ("zswap: use charp for zswap param strings")
    Link: http://lkml.kernel.org/r/20170126155821.4545-1-ddstreet@ieee.org
    Signed-off-by: Dan Streetman
    Reported-by: Sergey Senozhatsky
    Cc: Michal Hocko
    Cc: Minchan Kim
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Dan Streetman
     
  • If either the compressor and/or zpool param are invalid at boot, and
    their default value is also invalid, set the param to the empty string
    to indicate there is no compressor and/or zpool configured. This allows
    users to check the sysfs interface to see which param needs changing.

    Link: http://lkml.kernel.org/r/20170124200259.16191-4-ddstreet@ieee.org
    Signed-off-by: Dan Streetman
    Cc: Seth Jennings
    Cc: Michal Hocko
    Cc: Sergey Senozhatsky
    Cc: Minchan Kim
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Dan Streetman
     
  • Allow zswap to initialize at boot even if it can't create its pool due
    to a failure to create a zpool and/or compressor. Allow those to be
    created later, from the sysfs module param interface.

    Link: http://lkml.kernel.org/r/20170124200259.16191-3-ddstreet@ieee.org
    Signed-off-by: Dan Streetman
    Cc: Seth Jennings
    Cc: Michal Hocko
    Cc: Sergey Senozhatsky
    Cc: Minchan Kim
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Dan Streetman
     

04 Feb, 2017

1 commit

  • Add a zswap_init_failed bool that prevents changing any of the module
    params if init_zswap() fails, and set zswap_enabled to false. Change the
    'enabled' param to a callback, and check zswap_init_failed before
    allowing any change to the 'enabled', 'zpool', or 'compressor' params.

    Any driver that is built into the kernel will not be unloaded if its
    init function returns an error, and its module params remain accessible
    for users to change via sysfs. Since zswap's param callbacks assume that
    zswap has been initialized, changing the zswap params after a failed
    initialization will result in a WARNING because the callbacks expect a
    pool to already exist. This patch prevents that by immediately exiting
    any of the param callbacks if initialization failed.
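
    A sketch of the guard at the top of each param callback (the message
    text and return value are illustrative):

        if (zswap_init_failed) {
            pr_err("can't set param, initialization failed\n");
            return -ENODEV;
        }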

    This was reported here:
    https://marc.info/?l=linux-mm&m=147004228125528&w=4

    And fixes this WARNING:
    [ 429.723476] WARNING: CPU: 0 PID: 5140 at mm/zswap.c:503 __zswap_pool_current+0x56/0x60

    The warning is just noise, and not serious. However, when init fails,
    zswap frees all its percpu dstmem pages and its kmem cache. The kmem
    cache might be serious, if kmem_cache_alloc(NULL, gfp) has problems; but
    the percpu dstmem pages are definitely a problem, as they're used as
    temporary buffer for compressed pages before copying into place in the
    zpool.

    If the user does get zswap enabled after an init failure, then zswap
    will likely Oops on the first page it tries to compress (or worse, start
    corrupting memory).

    Fixes: 90b0fc26d5db ("zswap: change zpool/compressor at runtime")
    Link: http://lkml.kernel.org/r/20170124200259.16191-2-ddstreet@ieee.org
    Signed-off-by: Dan Streetman
    Reported-by: Marcin Miroslaw
    Cc: Seth Jennings
    Cc: Michal Hocko
    Cc: Sergey Senozhatsky
    Cc: Minchan Kim
    Cc:
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Dan Streetman
     

02 Dec, 2016

2 commits

  • Install the callbacks via the state machine. A multi-instance state is
    used to handle the per-pool notifier. Upon adding an instance, the
    callback is invoked for all online CPUs, so the manual init can go.
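
    A sketch of the multi-instance hotplug registration this refers to; the
    state constant and callback names are assumptions:

        /* one global state; each zswap pool adds itself as an instance */
        ret = cpuhp_setup_state_multi(CPUHP_MM_ZSWP_POOL_PREPARE,
                                      "mm/zswap_pool:prepare",
                                      zswap_cpu_comp_prepare,
                                      zswap_cpu_comp_dead);

        /* per pool: the core invokes the callback for all online CPUs */
        cpuhp_state_add_instance(CPUHP_MM_ZSWP_POOL_PREPARE, &pool->node);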

    Signed-off-by: Sebastian Andrzej Siewior
    Cc: linux-mm@kvack.org
    Cc: Seth Jennings
    Cc: rt@linutronix.de
    Link: http://lkml.kernel.org/r/20161126231350.10321-13-bigeasy@linutronix.de
    Signed-off-by: Thomas Gleixner

    Sebastian Andrzej Siewior
     
  • Install the callbacks via the state machine and let the core invoke
    the callbacks on the already online CPUs.

    Signed-off-by: Sebastian Andrzej Siewior
    Cc: linux-mm@kvack.org
    Cc: Seth Jennings
    Cc: rt@linutronix.de
    Link: http://lkml.kernel.org/r/20161126231350.10321-12-bigeasy@linutronix.de
    Signed-off-by: Thomas Gleixner

    Sebastian Andrzej Siewior
     

21 May, 2016

1 commit

  • Add a work_struct to struct zswap_pool, and change __zswap_pool_empty to
    use the workqueue instead of using call_rcu().

    When zswap destroys a pool no longer in use, it uses call_rcu() to
    perform the destruction/freeing. Since that executes in softirq
    context, it must not sleep. However, actually destroying the pool
    involves freeing the per-cpu compressors (which requires locking the
    cpu_add_remove_lock mutex) and freeing the zpool, for which the
    implementation may sleep (e.g. zsmalloc calls kmem_cache_destroy, which
    locks the slab_mutex). So if either mutex is currently taken, or any
    other part of the compressor or zpool implementation sleeps, it will
    result in a BUG().
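
    A sketch of the resulting shape; the function and field names are
    assumed from the description:

        static void __zswap_pool_release(struct work_struct *work)
        {
            struct zswap_pool *pool = container_of(work, struct zswap_pool, work);

            /* now in process context: freeing the per-cpu compressors and
             * the zpool may take sleeping locks safely */
            zswap_pool_destroy(pool);
        }

        static void __zswap_pool_empty(struct kref *kref)
        {
            struct zswap_pool *pool = container_of(kref, struct zswap_pool, kref);

            INIT_WORK(&pool->work, __zswap_pool_release);
            schedule_work(&pool->work);
        }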

    It's not easy to reproduce this when changing zswap's params normally.
    In testing with a loaded system, this does not fail:

    $ cd /sys/module/zswap/parameters
    $ echo lz4 > compressor ; echo zsmalloc > zpool

    nor does this:

    $ while true ; do
    > echo lzo > compressor ; echo zbud > zpool
    > sleep 1
    > echo lz4 > compressor ; echo zsmalloc > zpool
    > sleep 1
    > done

    although it's still possible either of those might fail, depending on
    whether anything else besides zswap has locked the mutexes.

    However, changing a parameter with no delay immediately causes the
    "scheduling while atomic" BUG:

    $ while true ; do
    > echo lzo > compressor ; echo lz4 > compressor
    > done

    This is essentially the same as Yu Zhao's proposed patch to zsmalloc,
    but moved to zswap, to cover compressor and zpool freeing.

    Fixes: f1c54846ee45 ("zswap: dynamic pool creation")
    Signed-off-by: Dan Streetman
    Reported-by: Yu Zhao
    Reviewed-by: Sergey Senozhatsky
    Cc: Minchan Kim
    Cc: Dan Streetman
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Dan Streetman
     

06 May, 2016

1 commit

  • Instead of using "zswap" as the name for all zpools created, add an
    atomic counter and use "zswap%x" with the counter number for each zpool
    created, to provide a unique name for each new zpool.

    As zsmalloc, one of the zpool implementations, requires/expects a unique
    name for each pool created, zswap should provide a unique name. The
    zsmalloc pool creation does not fail if a new pool with a conflicting
    name is created, unless CONFIG_ZSMALLOC_STAT is enabled; in that case,
    zsmalloc pool creation fails with -ENOMEM. Then zswap will be unable to
    change its compressor parameter if its zpool is zsmalloc; it also will
    be unable to change its zpool parameter back to zsmalloc, if it has any
    existing old zpool using zsmalloc with page(s) in it. Attempts to
    change the parameters will result in failure to create the zpool. This
    changes zswap to provide a unique name for each zpool creation.
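
    A sketch of the naming scheme described above:

        static atomic_t zswap_pools_count = ATOMIC_INIT(0);

        /* in zswap_pool_create(): a unique name per zpool, e.g. "zswap2" */
        char name[38]; /* enough for "zswap" plus a hex counter */

        snprintf(name, sizeof(name), "zswap%x",
                 atomic_inc_return(&zswap_pools_count));
        pool->zpool = zpool_create_pool(type, name, gfp, &zswap_zpool_ops);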

    Fixes: f1c54846ee45 ("zswap: dynamic pool creation")
    Signed-off-by: Dan Streetman
    Reported-by: Sergey Senozhatsky
    Reviewed-by: Sergey Senozhatsky
    Cc: Dan Streetman
    Cc: Minchan Kim
    Cc:
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Dan Streetman
     

05 Apr, 2016

1 commit

  • PAGE_CACHE_{SIZE,SHIFT,MASK,ALIGN} macros were introduced a *long* time
    ago with the promise that one day it would be possible to implement the
    page cache with bigger chunks than PAGE_SIZE.

    That promise never materialized, and it is unlikely it ever will.

    We have many places where PAGE_CACHE_SIZE is assumed to be equal to
    PAGE_SIZE, and it's a constant source of confusion whether the
    PAGE_CACHE_* or PAGE_* constant should be used in a particular case,
    especially on the border between fs and mm.

    Globally switching to PAGE_CACHE_SIZE != PAGE_SIZE would cause too much
    breakage to be doable.

    Let's stop pretending that pages in page cache are special. They are
    not.

    The changes are pretty straight-forward:

    - << (PAGE_CACHE_SHIFT - PAGE_SHIFT) -> <nothing>;

    - >> (PAGE_CACHE_SHIFT - PAGE_SHIFT) -> <nothing>;

    - PAGE_CACHE_{SIZE,SHIFT,MASK,ALIGN} -> PAGE_{SIZE,SHIFT,MASK,ALIGN};

    - page_cache_get() -> get_page();

    - page_cache_release() -> put_page();

    This patch contains automated changes generated with coccinelle using
    script below. For some reason, coccinelle doesn't patch header files.
    I've called spatch for them manually.

    The only adjustment after coccinelle is a revert of the changes to the
    PAGE_CACHE_ALIGN definition: we are going to drop it later.

    There are a few places in the code that coccinelle didn't reach. I'll
    fix them manually in a separate patch. Comments and documentation will
    also be addressed in a separate patch.

    virtual patch

    @@
    expression E;
    @@
    - E << (PAGE_CACHE_SHIFT - PAGE_SHIFT)
    + E

    @@
    expression E;
    @@
    - E >> (PAGE_CACHE_SHIFT - PAGE_SHIFT)
    + E

    @@
    @@
    - PAGE_CACHE_SHIFT
    + PAGE_SHIFT

    @@
    @@
    - PAGE_CACHE_SIZE
    + PAGE_SIZE

    @@
    @@
    - PAGE_CACHE_MASK
    + PAGE_MASK

    @@
    expression E;
    @@
    - PAGE_CACHE_ALIGN(E)
    + PAGE_ALIGN(E)

    @@
    expression E;
    @@
    - page_cache_get(E)
    + get_page(E)

    @@
    expression E;
    @@
    - page_cache_release(E)
    + put_page(E)

    Signed-off-by: Kirill A. Shutemov
    Acked-by: Michal Hocko
    Signed-off-by: Linus Torvalds

    Kirill A. Shutemov
     

19 Dec, 2015

1 commit

  • Change the use of strncmp in zswap_pool_find_get() to strcmp.

    The use of strncmp is no longer correct, now that zswap_zpool_type is
    not an array; sizeof() will return the size of a pointer, which isn't
    the right length to compare. We don't need to use strncmp anyway,
    because the existing params and the passed in params are all guaranteed
    to be null terminated, so strcmp should be used.
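
    A sketch of the corrected comparison in zswap_pool_find_get():

        /* both strings are NUL-terminated; strncmp with sizeof(a pointer)
         * would only have compared the first few bytes */
        if (strcmp(zpool_get_type(pool->zpool), type))
            continue;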

    Signed-off-by: Dan Streetman
    Reported-by: Weijie Yang
    Cc: Seth Jennings
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Dan Streetman
     

07 Nov, 2015

3 commits

  • Instead of using a fixed-length string for the zswap params, use charp.
    This simplifies the code and uses less memory, as most zswap param strings
    will be less than the current maximum length.

    Signed-off-by: Dan Streetman
    Cc: Rusty Russell
    Cc: Seth Jennings
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Dan Streetman
     
  • On the next line the entry variable will be re-initialized, so there is
    no need to initialize it to NULL.

    Signed-off-by: Alexey Klimov
    Cc: Seth Jennings
    Cc: Dan Streetman
    Cc: Minchan Kim
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Alexey Klimov
     
  • …d avoiding waking kswapd

    __GFP_WAIT has been used to identify atomic context in callers that hold
    spinlocks or are in interrupts. They are expected to be high priority and
    have access to one of two watermarks lower than "min", which can be referred
    to as the "atomic reserve". __GFP_HIGH users get access to the first
    lower watermark and can be called the "high priority reserve".

    Over time, callers had a requirement to not block when fallback options
    were available. Some have abused __GFP_WAIT leading to a situation where
    an optimistic allocation with a fallback option can access atomic
    reserves.

    This patch uses __GFP_ATOMIC to identify callers that are truly atomic,
    cannot sleep and have no alternative. High priority users continue to use
    __GFP_HIGH. __GFP_DIRECT_RECLAIM identifies callers that can sleep and
    are willing to enter direct reclaim. __GFP_KSWAPD_RECLAIM identifies
    callers that want to wake kswapd for background reclaim. __GFP_WAIT is
    redefined as a caller that is willing to enter direct reclaim and wake
    kswapd for background reclaim.

    This patch then converts a number of sites

    o __GFP_ATOMIC is used by callers that are high priority and have memory
    pools for those requests. GFP_ATOMIC uses this flag.

    o Callers that have a limited mempool to guarantee forward progress clear
    __GFP_DIRECT_RECLAIM but keep __GFP_KSWAPD_RECLAIM. bio allocations fall
    into this category where kswapd will still be woken but atomic reserves
    are not used as there is a one-entry mempool to guarantee progress.

    o Callers that are checking if they are non-blocking should use the
    helper gfpflags_allow_blocking() where possible. This is because
    checking for __GFP_WAIT as was done historically now can trigger false
    positives. Some exceptions like dm-crypt.c exist where the code intent
    is clearer if __GFP_DIRECT_RECLAIM is used instead of the helper due to
    flag manipulations.

    o Callers that built their own GFP flags instead of starting with GFP_KERNEL
    and friends now also need to specify __GFP_KSWAPD_RECLAIM.

    The first key hazard to watch out for is callers that removed __GFP_WAIT
    and were depending on access to atomic reserves for inconspicuous reasons.
    In some cases it may be appropriate for them to use __GFP_HIGH.

    The second key hazard is callers that assembled their own combination of
    GFP flags instead of starting with something like GFP_KERNEL. They may
    now wish to specify __GFP_KSWAPD_RECLAIM. It's almost certainly harmless
    if it's missed in most cases as other activity will wake kswapd.
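
    In zswap terms, this is the kind of conversion implied for a caller that
    assembles its own GFP flags (a sketch, not the exact hunk):

        /* explicitly keep the kswapd wakeup now that __GFP_WAIT is gone */
        ret = zpool_malloc(entry->pool->zpool, dlen,
                           __GFP_NORETRY | __GFP_NOWARN | __GFP_KSWAPD_RECLAIM,
                           &handle);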

    Signed-off-by: Mel Gorman <mgorman@techsingularity.net>
    Acked-by: Vlastimil Babka <vbabka@suse.cz>
    Acked-by: Michal Hocko <mhocko@suse.com>
    Acked-by: Johannes Weiner <hannes@cmpxchg.org>
    Cc: Christoph Lameter <cl@linux.com>
    Cc: David Rientjes <rientjes@google.com>
    Cc: Vitaly Wool <vitalywool@gmail.com>
    Cc: Rik van Riel <riel@redhat.com>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

    Mel Gorman
     

11 Sep, 2015

2 commits

  • Update the zpool and compressor parameters to be changeable at runtime.
    When changed, a new pool is created with the requested zpool/compressor,
    and added as the current pool at the front of the pool list. Previous
    pools remain in the list only to remove existing compressed pages from.
    The old pool(s) are removed once they become empty.
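
    A sketch of how the current pool is then looked up; the names here are
    assumed:

        /* the active pool is the first entry of an RCU-protected list;
         * older pools stay behind it until they drain and are destroyed */
        static struct zswap_pool *__zswap_pool_current(void)
        {
            return list_first_or_null_rcu(&zswap_pools, struct zswap_pool, list);
        }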

    Signed-off-by: Dan Streetman
    Acked-by: Seth Jennings
    Cc: Sergey Senozhatsky
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Dan Streetman
     
  • Add dynamic creation of pools. Move the static crypto compression
    per-cpu transforms into each pool. Add a pointer in zswap_entry to the
    pool it belongs to.

    This is required by the following patch which enables changing the zswap
    zpool and compressor params at runtime.

    [akpm@linux-foundation.org: fix merge snafus]
    Signed-off-by: Dan Streetman
    Acked-by: Seth Jennings
    Cc: Sergey Senozhatsky
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Dan Streetman
     

09 Sep, 2015

2 commits

  • The structure zpool_ops is not modified so make the pointer to it a
    pointer to const.

    Signed-off-by: Krzysztof Kozlowski
    Acked-by: Dan Streetman
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Krzysztof Kozlowski
     
  • zswap_get_swap_cache_page and read_swap_cache_async have pretty much the
    same code, with the only significant differences being the return value
    and the usage of swap_readpage.

    Add a helper __read_swap_cache_async() with the common code. Behavior
    change: zswap_get_swap_cache_page will now use radix_tree_maybe_preload
    instead of radix_tree_preload. It looks like this difference existed
    only because of the code duplication.

    Signed-off-by: Dmitry Safonov
    Cc: Johannes Weiner
    Cc: Vladimir Davydov
    Cc: Michal Hocko
    Cc: Hugh Dickins
    Cc: Minchan Kim
    Cc: Tejun Heo
    Cc: Jens Axboe
    Cc: Christoph Hellwig
    Cc: David Herrmann
    Cc: Seth Jennings
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Dmitry Safonov
     

26 Jun, 2015

1 commit

  • Change the "enabled" parameter to be configurable at runtime. Remove the
    enabled check from init(), and move it to the frontswap store() function;
    when enabled, pages will be stored, and when disabled, pages won't be
    stored.
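
    A sketch of the two pieces involved (the parameter permissions and
    return value are assumptions):

        /* writable at runtime via /sys/module/zswap/parameters/enabled */
        static bool zswap_enabled;
        module_param_named(enabled, zswap_enabled, bool, 0644);

        /* in zswap_frontswap_store(): refuse pages while disabled */
        if (!zswap_enabled)
            return -ENODEV;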

    This is almost identical to Seth's patch from 2 years ago:
    http://lkml.iu.edu/hypermail/linux/kernel/1307.2/04289.html

    [akpm@linux-foundation.org: tweak documentation]
    Signed-off-by: Dan Streetman
    Suggested-by: Seth Jennings
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Dan Streetman
     

13 Feb, 2015

1 commit

  • Currently the zpool backends (zsmalloc/zbud) do not know who created
    them; there is no way for zsmalloc/zbud to find out which caller they
    belong to.

    Now we want to add statistics collection to zsmalloc, and we need to
    name the debugfs dir for each pool created. The approach suggested by
    Minchan Kim is to use a name passed by the caller (such as zram) to
    create the zsmalloc pool:

    /sys/kernel/debug/zsmalloc/zram0

    This patch adds an argument `name' to zs_create_pool() and other related
    functions.

    Signed-off-by: Ganesh Mahendran
    Acked-by: Minchan Kim
    Cc: Seth Jennings
    Cc: Nitin Gupta
    Cc: Dan Streetman
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Ganesh Mahendran
     

09 Aug, 2014

1 commit