19 Dec, 2014

1 commit

  • Currently the functions in zsmalloc.c are not arranged in a readable and
    reasonable sequence. As more and more functions are added, we may run
    into the inconvenience below. For example:

    Current functions:

    void zs_init()
    {
    }

    static void get_maxobj_per_zspage()
    {
    }

    Then I want to add a func_1() which is called from zs_init(), and this
    newly added function func_1() will use get_maxobj_per_zspage(), which is
    defined below zs_init().

    void func_1()
    {
    get_maxobj_per_zspage();
    }

    void zs_init()
    {
    func_1();
    }

    static void get_maxobj_per_zspage()
    {
    }

    This will cause a compile error. So we must add a declaration:

    static void get_maxobj_per_zspage();

    before func_1() if we do not put get_maxobj_per_zspage() before
    func_1().

    In addition, putting the module_[init|exit] functions at the bottom of the
    file conforms to our habit.

    So, this patch adjusts the function sequence as:

    /* helper functions */
    ...
    obj_location_to_handle()
    ...

    /* Some exported functions */
    ...

    zs_map_object()
    zs_unmap_object()

    zs_malloc()
    zs_free()

    zs_init()
    zs_exit()

    Signed-off-by: Ganesh Mahendran
    Cc: Nitin Gupta
    Acked-by: Minchan Kim
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Ganesh Mahendran
     

14 Dec, 2014

6 commits

  • In zs_create_pool(), we allocate more memory than sizeof(struct zs_pool):
    ovhd_size = roundup(sizeof(*pool), PAGE_SIZE);

    This patch allocates memory of exactly the needed size.
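
    A minimal userspace sketch of the difference in requested size. The
    4096-byte page size, struct demo_pool and the roundup_to() helper are
    assumptions made for illustration; they are not the kernel definitions.

```c
#include <assert.h>
#include <stddef.h>

#define DEMO_PAGE_SIZE 4096u   /* assumed page size for this sketch */

/* Hypothetical stand-in for struct zs_pool; the real layout differs. */
struct demo_pool {
    void *size_class[255];
    unsigned long flags;
};

/* Round x up to the next multiple of step (step must be non-zero). */
static size_t roundup_to(size_t x, size_t step)
{
    assert(step != 0);
    return ((x + step - 1) / step) * step;
}

/* Size requested before the patch: a whole number of pages. */
static size_t old_alloc_size(void)
{
    return roundup_to(sizeof(struct demo_pool), DEMO_PAGE_SIZE);
}

/* Size requested after the patch: exactly the structure size. */
static size_t new_alloc_size(void)
{
    return sizeof(struct demo_pool);
}
```

    The gap between old_alloc_size() and new_alloc_size() is the memory
    that was requested per pool without being needed.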

    Signed-off-by: Ganesh Mahendran
    Acked-by: Minchan Kim
    Cc: Nitin Gupta
    Cc: Dan Streetman
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Ganesh Mahendran
     
  • In zs_create_pool(), prev_class is assigned (ZS_SIZE_CLASSES - 1) times.
    But prev_class only refers to the previous size_class, so the repeated
    assignment is unnecessary.

    This patch assigns *prev_class* when a new size_class structure is
    allocated and uses prev_class to check whether the first class has been
    allocated.

    [akpm@linux-foundation.org: remove now-unused ZS_SIZE_CLASSES]
    Signed-off-by: Ganesh Mahendran
    Cc: Minchan Kim
    Cc: Nitin Gupta
    Reviewed-by: Dan Streetman
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Ganesh Mahendran
     
  • I sent a patch [1] for an unnecessary check in zsmalloc. And Minchan Kim
    found that zsmalloc does not even support allocating an obj with the size
    of ZS_MAX_ALLOC_SIZE in some situations.

    For example:
    In a system with 64KB PAGE_SIZE and 32-bit physical addresses:
    ZS_MIN_ALLOC_SIZE is 32 bytes which is calculated by:
    MAX(32, (ZS_MAX_PAGES_PER_ZSPAGE << PAGE_SHIFT >> OBJ_INDEX_BITS))
    ZS_MAX_ALLOC_SIZE is 64KB(in current code, is PAGE_SIZE)
    ZS_SIZE_CLASS_DELTA is 256 bytes
    So, ZS_SIZE_CLASSES = (ZS_MAX_ALLOC_SIZE - ZS_MIN_ALLOC_SIZE) /
    ZS_SIZE_CLASS_DELTA + 1
    = 256

    In zs_create_pool(), the max size obj which can be allocated will be:
    ZS_MIN_ALLOC_SIZE + i * ZS_SIZE_CLASS_DELTA = 32 + 255*256 = 65312

    We can see that 65312 < 65536 (ZS_MAX_ALLOC_SIZE). So we can NOT
    allocate objs with size ZS_MAX_ALLOC_SIZE (65536), which we promise
    upper-layer users we can do.

    [1] http://lkml.iu.edu/hypermail/linux/kernel/1411.2/03835.html
    [2] http://lkml.iu.edu/hypermail/linux/kernel/1411.2/04534.html

    This patch fixes this issue by dynamically calculating zs_size_classes
    when the module is loaded, and allocates a buffer with size
    ZS_MAX_ALLOC_SIZE. Then the max obj (size is ZS_MAX_ALLOC_SIZE) can be
    stored in it.
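
    The arithmetic above can be checked with a small userspace sketch. The
    constants are the values from the 64KB-page example; rounding the class
    count up and clamping the last class to the maximum is one way to make
    the largest class reach ZS_MAX_ALLOC_SIZE and is an assumption here,
    not necessarily the exact form used in the patch.

```c
#include <assert.h>

#define MIN_ALLOC   32u      /* ZS_MIN_ALLOC_SIZE in the example */
#define MAX_ALLOC   65536u   /* ZS_MAX_ALLOC_SIZE with 64KB pages */
#define CLASS_DELTA 256u     /* ZS_SIZE_CLASS_DELTA in the example */

/* Class count as computed at compile time: the division truncates. */
static unsigned int classes_truncated(void)
{
    return (MAX_ALLOC - MIN_ALLOC) / CLASS_DELTA + 1;
}

/* Class count with the division rounded up instead. */
static unsigned int classes_rounded_up(void)
{
    return (MAX_ALLOC - MIN_ALLOC + CLASS_DELTA - 1) / CLASS_DELTA + 1;
}

/* Object size of class i, clamped so it never exceeds MAX_ALLOC. */
static unsigned int class_size(unsigned int i)
{
    unsigned int size = MIN_ALLOC + i * CLASS_DELTA;

    return size > MAX_ALLOC ? MAX_ALLOC : size;
}

/* Largest object size served when there are nr classes (nr >= 1). */
static unsigned int largest_size(unsigned int nr)
{
    assert(nr >= 1);
    return class_size(nr - 1);
}
```

    With the truncated count the largest class is 65312 bytes; with the
    rounded-up count there is one more class and it covers 65536 bytes.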

    [akpm@linux-foundation.org: restore ZS_SIZE_CLASSES to fix bisectability]
    Signed-off-by: Mahendran Ganesh
    Suggested-by: Minchan Kim
    Cc: Nitin Gupta
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Mahendran Ganesh
     
  • kunmap_atomic should be given the virtual address returned by
    kmap_atomic. However, some pieces of code in zsmalloc pass a modified
    address to kunmap_atomic, not the one returned by kmap_atomic.

    It happens to work because zsmalloc modifies the address within the
    PAGE_SIZE boundary, so it works with the current kmap_atomic
    implementation. But it's still fragile against potential changes to
    kmap_atomic, so let's correct it.

    I got a subtle bug when I implemented a new feature of zsmalloc
    (compaction) due to a link's mishandling (the link was over page
    boundary). Although it was totally my mistake, it took a while to find
    the cause because an unpredictable kmapped address was unmapped causing an
    almost random crash.
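
    The pattern can be illustrated in userspace: keep the address returned
    by the map call and use a separate cursor for accesses. demo_map() and
    demo_unmap() below are hypothetical stand-ins, not kmap_atomic() and
    kunmap_atomic(); the stand-in unmap is deliberately strict and rejects
    anything but the base address.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define DEMO_PAGE_SIZE 4096

static char demo_page[DEMO_PAGE_SIZE];
static void *demo_mapped;   /* base address currently mapped, or NULL */

/* Hypothetical stand-in for a map call: returns the mapping's base. */
static void *demo_map(void)
{
    assert(demo_mapped == NULL);
    demo_mapped = demo_page;
    return demo_mapped;
}

/* Hypothetical stand-in for an unmap call: accepts only the base address. */
static bool demo_unmap(void *addr)
{
    if (addr != demo_mapped)
        return false;
    demo_mapped = NULL;
    return true;
}

/* Write one byte at offset off, keeping the base pointer for the unmap. */
static bool write_at(size_t off, char value)
{
    char *base;
    char *cursor;

    assert(off < DEMO_PAGE_SIZE);
    base = demo_map();
    cursor = base + off;       /* modified address, used only for access */
    *cursor = value;
    return demo_unmap(base);   /* unmap with the address the map returned */
}
```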

    Signed-off-by: Minchan Kim
    Cc: Nitin Gupta
    Cc: Sergey Senozhatsky
    Cc: Dan Streetman
    Cc: Seth Jennings
    Cc: Jerome Marchand
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Minchan Kim
     
  • Mahendran Ganesh reported that zpool-enabled zsmalloc should not call
    zpool_unregister_driver() from zs_init() if cpu notifier registration has
    failed, because error handling is performed before we register the driver
    via zpool_register_driver() call.

    Factor out cpu notifier registration and unregistration code and fix
    zs_init() error handling.

    link: http://lkml.iu.edu//hypermail/linux/kernel/1411.1/04156.html
    [akpm@linux-foundation.org: squash bogus gcc warning]
    [akpm@linux-foundation.org: use __init and __exit]
    Signed-off-by: Sergey Senozhatsky
    Reported-by: Mahendran Ganesh
    Cc: Minchan Kim
    Cc: Nitin Gupta
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Sergey Senozhatsky
     
  • zsmalloc has many size_classes to reduce fragmentation, and they are in
    16-byte units, for example, 16, 32, 48, etc., if PAGE_SIZE is 4096. Also,
    zsmalloc has the constraint that each zspage has 4 pages at maximum.

    In this situation, we can see an interesting aspect. Let's think about
    the size_classes for 1488, 1472, ..., 1376. To prevent external
    fragmentation, they use 4 pages per zspage, and so each of them can
    contain 11 objects at maximum.

    16384 (4096 * 4) = 1488 * 11 + remains
    16384 (4096 * 4) = 1472 * 11 + remains
    16384 (4096 * 4) = ...
    16384 (4096 * 4) = 1376 * 11 + remains

    It means that they have the same characteristics and a classification
    between them isn't needed. If we use one size_class for them, we can
    reduce fragmentation and save some memory. Both the 1488 and 1472 sized
    classes can only fit 11 objects into 4 pages, and an object that's 1472
    bytes can fit into an object that's 1488 bytes, so merging these classes
    to always use objects that are 1488 bytes will reduce the total number of
    size classes. And reducing the total number of size classes reduces
    overall fragmentation, because a wider range of compressed pages can fit
    into a single size class, leaving fewer unused objects in each size class.

    For this purpose, this patch implements size_class merging. If a
    size_class has the same pages_per_zspage and the same number of objects
    per zspage as the previous size_class, we don't create a new size_class.
    Instead, we use the previous size_class with the same characteristics.
    This way, the above example sizes (1488, 1472, ..., 1376) use just one
    size_class, so we can get much better memory utilization.
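
    The merge condition can be sketched in userspace as follows. The page
    size and the 4-page limit come from the description above; the way
    pages_per_zspage() picks a page count (highest used percentage, smallest
    count on ties) is an assumption for this sketch, as are all the demo
    names.

```c
#include <assert.h>
#include <stdbool.h>

#define DEMO_PAGE_SIZE            4096
#define DEMO_MAX_PAGES_PER_ZSPAGE 4

/* Pick the page count (1..4) that wastes the smallest share of the zspage. */
static int pages_per_zspage(int class_size)
{
    int best_pages = 1;
    int best_used = 0;

    assert(class_size > 0 && class_size <= DEMO_PAGE_SIZE);
    for (int i = 1; i <= DEMO_MAX_PAGES_PER_ZSPAGE; i++) {
        int zspage_size = i * DEMO_PAGE_SIZE;
        int waste = zspage_size % class_size;
        int used = (zspage_size - waste) * 100 / zspage_size;

        if (used > best_used) {
            best_used = used;
            best_pages = i;
        }
    }
    return best_pages;
}

/* Number of whole objects that fit into one zspage of this class. */
static int objs_per_zspage(int class_size)
{
    return pages_per_zspage(class_size) * DEMO_PAGE_SIZE / class_size;
}

/* Two classes can share one size_class if both characteristics match. */
static bool can_merge(int size_a, int size_b)
{
    return pages_per_zspage(size_a) == pages_per_zspage(size_b) &&
           objs_per_zspage(size_a) == objs_per_zspage(size_b);
}
```

    Under these assumptions every size from 1376 to 1488 yields 4 pages and
    11 objects, so can_merge() holds across that whole range, while
    neighbouring sizes such as 1504 or 1360 fall into different classes.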

    Below is the result of my simple test.

    TEST ENV: EXT4 on zram, mount with discard option
    WORKLOAD: untar kernel source code, remove directory in descending order
    in size. (drivers arch fs sound include net Documentation firmware kernel
    tools)

    Each line represents orig_data_size, compr_data_size, mem_used_total,
    fragmentation overhead (mem_used - compr_data_size) and overhead ratio
    (overhead to compr_data_size), respectively, after the untar and remove
    operations are executed.

    * untar-nomerge.out

    orig_size  compr_size  used_size  overhead  overhead_ratio
    525.88MB   199.16MB    210.23MB   11.08MB   5.56%
    288.32MB   97.43MB     105.63MB   8.20MB    8.41%
    177.32MB   61.12MB     69.40MB    8.28MB    13.55%
    146.47MB   47.32MB     56.10MB    8.78MB    18.55%
    124.16MB   38.85MB     48.41MB    9.55MB    24.58%
    103.93MB   31.68MB     40.93MB    9.25MB    29.21%
    84.34MB    22.86MB     32.72MB    9.86MB    43.13%
    66.87MB    14.83MB     23.83MB    9.00MB    60.70%
    60.67MB    11.11MB     18.60MB    7.49MB    67.48%
    55.86MB    8.83MB      16.61MB    7.77MB    88.03%
    53.32MB    8.01MB      15.32MB    7.31MB    91.24%

    * untar-merge.out

    orig_size  compr_size  used_size  overhead  overhead_ratio
    526.23MB   199.18MB    209.81MB   10.64MB   5.34%
    288.68MB   97.45MB     104.08MB   6.63MB    6.80%
    177.68MB   61.14MB     66.93MB    5.79MB    9.47%
    146.83MB   47.34MB     52.79MB    5.45MB    11.51%
    124.52MB   38.87MB     44.30MB    5.43MB    13.96%
    104.29MB   31.70MB     36.83MB    5.13MB    16.19%
    84.70MB    22.88MB     27.92MB    5.04MB    22.04%
    67.11MB    14.83MB     19.26MB    4.43MB    29.86%
    60.82MB    11.10MB     14.90MB    3.79MB    34.17%
    55.90MB    8.82MB      12.61MB    3.79MB    42.97%
    53.32MB    8.01MB      11.73MB    3.73MB    46.53%

    As you can see in the above results, the merged one has better
    utilization (overhead ratio, 5th column) and uses less memory
    (mem_used_total, 3rd column).

    Signed-off-by: Joonsoo Kim
    Cc: Minchan Kim
    Cc: Nitin Gupta
    Cc: Jerome Marchand
    Cc: Sergey Senozhatsky
    Reviewed-by: Dan Streetman
    Cc: Luigi Semenzato
    Cc: "seungho1.park"
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Joonsoo Kim
     

10 Oct, 2014

4 commits

  • Change zsmalloc init_zspage() logic to iterate through each object on each
    of its pages, checking the offset to verify the object is on the current
    page before linking it into the zspage.

    The current zsmalloc init_zspage free object linking code has logic that
    relies on there only being one page per zspage when PAGE_SIZE is a
    multiple of class->size. It calculates the number of objects for the
    current page, and iterates through all of them plus one, to account for
    the assumed partial object at the end of the page. While this currently
    works, the logic can be simplified to just link the object at each
    successive offset until the offset is larger than PAGE_SIZE, which does
    not rely on PAGE_SIZE being a multiple of class->size.

    Signed-off-by: Dan Streetman
    Acked-by: Minchan Kim
    Cc: Sergey Senozhatsky
    Cc: Nitin Gupta
    Cc: Seth Jennings
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Dan Streetman
     
  • The letter 'f' in "n
    Acked-by: Minchan Kim
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Wang Sheng-Hui
     
  • zs_get_total_size_bytes returns the amount of memory zsmalloc has
    consumed in *byte units*, but zsmalloc operates in *page units* rather
    than byte units. So let's change the API; the benefit is that we remove
    unnecessary overhead (i.e., converting page units to byte units) in
    zsmalloc.
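
    A caller that still needs a byte count can do the conversion itself. A
    minimal sketch, assuming 4096-byte pages; pages_to_bytes() and
    DEMO_PAGE_SHIFT are hypothetical names, not part of the zsmalloc API:

```c
#include <assert.h>
#include <stdint.h>

#define DEMO_PAGE_SHIFT 12   /* assumed: 4096-byte pages */

/* Convert a page count, as a page-based API would return it, to bytes. */
static uint64_t pages_to_bytes(unsigned long nr_pages)
{
    /* Guard against the shift overflowing a 64-bit byte count. */
    assert(nr_pages <= (UINT64_MAX >> DEMO_PAGE_SHIFT));
    return (uint64_t)nr_pages << DEMO_PAGE_SHIFT;
}
```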

    Since the return value is in pages, "zs_get_total_pages" is a better name
    than "zs_get_total_size_bytes".

    Signed-off-by: Minchan Kim
    Reviewed-by: Dan Streetman
    Cc: Sergey Senozhatsky
    Cc: Jerome Marchand
    Cc: Luigi Semenzato
    Cc: Nitin Gupta
    Cc: Seth Jennings
    Cc: David Horner
    Cc: Joonsoo Kim
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Minchan Kim
     
  • Currently, zram has no feature to limit memory so theoretically zram can
    deplete system memory. Users have asked for a limit several times as even
    without exhaustion zram makes it hard to control memory usage of the
    platform. This patchset adds the feature.

    Patch 1 makes zs_get_total_size_bytes faster because it would be used
    frequently in later patches for the new feature.

    Patch 2 changes zs_get_total_size_bytes's return unit from bytes to pages
    so that zsmalloc doesn't need an unnecessary operation (i.e.,
    << PAGE_SHIFT).

    Patch 3 adds the new feature. I added the feature into the zram layer,
    not zsmalloc, because the limitation is zram's requirement, not
    zsmalloc's, so any other user of zsmalloc (i.e., zpool) shouldn't be
    affected by an unnecessary branch in zsmalloc. In future, if every user
    of zsmalloc wants the feature, then we could move the feature from the
    client side to zsmalloc easily, but vice versa would be painful.

    Patch 4 adds a new facility to report the maximum memory usage of zram so
    that users can avoid polling /sys/block/zram0/mem_used_total frequently
    and transient maximums are not missed.

    This patch (of 4):

    pages_allocated has been counted in the size_class structure, and when a
    user of zsmalloc wants to see total_size_bytes, it has to gather the
    counts from each size_class to report the sum.

    That's not bad if the user doesn't read the value often, but if the user
    starts to read the value frequently, it is not a good deal from a
    performance point of view.

    This patch moves the count from size_class to zs_pool so it could also
    reduce memory footprint (from [255 * 8byte] to [sizeof(atomic_long_t)]).
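
    The two accounting schemes can be contrasted with a userspace sketch
    using C11 atomics in place of atomic_long_t. All demo_* types and
    function names are hypothetical and only model the idea: a per-class
    counter needs a walk over every class to produce a total, while a
    per-pool counter is a single load.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

#define DEMO_NR_CLASSES 255

/* Before: one counter per size class. */
struct demo_class {
    unsigned long pages_allocated;
};

struct demo_pool_old {
    struct demo_class classes[DEMO_NR_CLASSES];
};

/* After: one atomic counter for the whole pool. */
struct demo_pool_new {
    atomic_long pages_allocated;
};

/* Reading the total walks every class. */
static unsigned long total_pages_old(const struct demo_pool_old *pool)
{
    unsigned long sum = 0;

    assert(pool != NULL);
    for (int i = 0; i < DEMO_NR_CLASSES; i++)
        sum += pool->classes[i].pages_allocated;
    return sum;
}

/* Allocation and free paths adjust the single counter (nr may be negative). */
static void account_pages_new(struct demo_pool_new *pool, long nr)
{
    assert(pool != NULL);
    atomic_fetch_add(&pool->pages_allocated, nr);
}

/* Reading the total is one atomic load. */
static unsigned long total_pages_new(struct demo_pool_new *pool)
{
    assert(pool != NULL);
    return (unsigned long)atomic_load(&pool->pages_allocated);
}
```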

    Signed-off-by: Minchan Kim
    Reviewed-by: Dan Streetman
    Cc: Sergey Senozhatsky
    Cc: Jerome Marchand
    Cc: Luigi Semenzato
    Cc: Nitin Gupta
    Cc: Seth Jennings
    Reviewed-by: David Horner
    Cc: Joonsoo Kim
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Minchan Kim
     

30 Aug, 2014

1 commit

  • To avoid potential format string expansion via module parameters, do not
    use the zpool type directly in request_module() without a format string.
    Additionally, to avoid arbitrary modules being loaded via zpool API
    (e.g. via the zswap_zpool_type module parameter) add a "zpool-" prefix
    to the requested module, as well as module aliases for the existing
    zpool types (zbud and zsmalloc).
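
    The same idea in userspace terms: the caller-supplied type is only ever
    passed as an argument to a fixed format string, never used as the
    format itself. This sketch uses snprintf() rather than request_module(),
    and zpool_module_name() is a hypothetical helper written for
    illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/*
 * Build the prefixed module name for a zpool type.
 * Returns 0 on success, -1 if the name does not fit into buf.
 */
static int zpool_module_name(char *buf, size_t len, const char *type)
{
    int n;

    assert(buf != NULL && type != NULL);
    /* type is data for "%s"; any '%' it contains is copied literally. */
    n = snprintf(buf, len, "zpool-%s", type);
    return (n < 0 || (size_t)n >= len) ? -1 : 0;
}
```

    A type such as "%x%x" comes out as the literal name "zpool-%x%x"
    instead of being expanded, and every requested name carries the
    "zpool-" prefix.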

    Signed-off-by: Kees Cook
    Cc: Seth Jennings
    Cc: Minchan Kim
    Cc: Nitin Gupta
    Acked-by: Dan Streetman
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Kees Cook
     

07 Aug, 2014

3 commits

  • Update zbud and zsmalloc to implement the zpool api.

    [fengguang.wu@intel.com: make functions static]
    Signed-off-by: Dan Streetman
    Tested-by: Seth Jennings
    Cc: Minchan Kim
    Cc: Nitin Gupta
    Cc: Weijie Yang
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Dan Streetman
     
  • Add zpool api.

    zpool provides an interface for memory storage, typically of compressed
    memory. Users can select what backend to use; currently the only
    implementations are zbud, a low density implementation with up to two
    compressed pages per storage page, and zsmalloc, a higher density
    implementation with multiple compressed pages per storage page.

    Signed-off-by: Dan Streetman
    Tested-by: Seth Jennings
    Cc: Minchan Kim
    Cc: Nitin Gupta
    Cc: Weijie Yang
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Dan Streetman
     
  • Currently map_vm_area() takes (struct page *** pages) as third argument,
    and after mapping, it moves (*pages) to point to (*pages +
    nr_mappped_pages).

    It looks like this kind of increment is useless to its caller these
    days. The callers don't care about the increments and actually they're
    trying to avoid this by passing another copy to map_vm_area().

    The caller can always guarantee all the pages can be mapped into vm_area
    as specified in first argument and the caller only cares about whether
    map_vm_area() fails or not.

    This patch cleans up the pointer movement in map_vm_area() and updates
    its callers accordingly.

    Signed-off-by: WANG Chao
    Cc: Zhang Yanfei
    Acked-by: Greg Kroah-Hartman
    Cc: Minchan Kim
    Cc: Nitin Gupta
    Cc: Rusty Russell
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    WANG Chao
     

05 Jun, 2014

2 commits


20 Mar, 2014

1 commit

  • Subsystems that want to register CPU hotplug callbacks, as well as perform
    initialization for the CPUs that are already online, often do it as shown
    below:

    get_online_cpus();

    for_each_online_cpu(cpu)
    init_cpu(cpu);

    register_cpu_notifier(&foobar_cpu_notifier);

    put_online_cpus();

    This is wrong, since it is prone to ABBA deadlocks involving the
    cpu_add_remove_lock and the cpu_hotplug.lock (when running concurrently
    with CPU hotplug operations).

    Instead, the correct and race-free way of performing the callback
    registration is:

    cpu_notifier_register_begin();

    for_each_online_cpu(cpu)
    init_cpu(cpu);

    /* Note the use of the double underscored version of the API */
    __register_cpu_notifier(&foobar_cpu_notifier);

    cpu_notifier_register_done();

    Fix the zsmalloc code by using this latter form of callback registration.

    Cc: Nitin Gupta
    Cc: Ingo Molnar
    Signed-off-by: Srivatsa S. Bhat
    Acked-by: Minchan Kim
    Signed-off-by: Rafael J. Wysocki

    Srivatsa S. Bhat
     

31 Jan, 2014

2 commits

  • Add my copyright to the zsmalloc source code which I maintain.

    Signed-off-by: Minchan Kim
    Cc: Nitin Gupta
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Minchan Kim
     
  • This patch moves zsmalloc under mm directory.

    Before that, the description explains why we have needed a custom
    allocator.

    Zsmalloc is a new slab-based memory allocator for storing compressed
    pages. It is designed for low fragmentation and high allocation success
    rate on large object, but
    Acked-by: Nitin Gupta
    Reviewed-by: Konrad Rzeszutek Wilk
    Cc: Bob Liu
    Cc: Greg Kroah-Hartman
    Cc: Hugh Dickins
    Cc: Jens Axboe
    Cc: Luigi Semenzato
    Cc: Mel Gorman
    Cc: Pekka Enberg
    Cc: Rik van Riel
    Cc: Seth Jennings
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Minchan Kim