11 Jun, 2015

1 commit


16 Apr, 2015

13 commits

  • Do not perform cond_resched() before the busy compaction loop in
    __zs_compact(), because this loop does it when needed.

    Signed-off-by: Sergey Senozhatsky
    Acked-by: Minchan Kim
    Cc: Nitin Gupta
    Cc: Stephen Rothwell
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Sergey Senozhatsky
     
  • There is no point in overriding the size class below. It causes fatal
    corruption of the next chunk in the 3264-byte size class, which is the
    last size class that is not huge.

    For example, if the requested size is exactly 3264 bytes, current
    zsmalloc allocates and returns a chunk from the 3264-byte size class
    instead of the 4096-byte one. User access to that chunk may then
    overwrite the head of the next adjacent chunk (see the sketch after
    this entry).

    Here is the panic log captured when freelist was corrupted due to this:

    Kernel BUG at ffffffc00030659c [verbose debug info unavailable]
    Internal error: Oops - BUG: 96000006 [#1] PREEMPT SMP
    Modules linked in:
    exynos-snapshot: core register saved(CPU:5)
    CPUMERRSR: 0000000000000000, L2MERRSR: 0000000000000000
    exynos-snapshot: context saved(CPU:5)
    exynos-snapshot: item - log_kevents is disabled
    CPU: 5 PID: 898 Comm: kswapd0 Not tainted 3.10.61-4497415-eng #1
    task: ffffffc0b8783d80 ti: ffffffc0b71e8000 task.ti: ffffffc0b71e8000
    PC is at obj_idx_to_offset+0x0/0x1c
    LR is at obj_malloc+0x44/0xe8
    pc : [] lr : [] pstate: a0000045
    sp : ffffffc0b71eb790
    x29: ffffffc0b71eb790 x28: ffffffc00204c000
    x27: 000000000001d96f x26: 0000000000000000
    x25: ffffffc098cc3500 x24: ffffffc0a13f2810
    x23: ffffffc098cc3501 x22: ffffffc0a13f2800
    x21: 000011e1a02006e3 x20: ffffffc0a13f2800
    x19: ffffffbc02a7e000 x18: 0000000000000000
    x17: 0000000000000000 x16: 0000000000000feb
    x15: 0000000000000000 x14: 00000000a01003e3
    x13: 0000000000000020 x12: fffffffffffffff0
    x11: ffffffc08b264000 x10: 00000000e3a01004
    x9 : ffffffc08b263fea x8 : ffffffc0b1e611c0
    x7 : ffffffc000307d24 x6 : 0000000000000000
    x5 : 0000000000000038 x4 : 000000000000011e
    x3 : ffffffbc00003e90 x2 : 0000000000000cc0
    x1 : 00000000d0100371 x0 : ffffffbc00003e90

    Reported-by: Sooyong Suk
    Signed-off-by: Heesub Shin
    Tested-by: Sooyong Suk
    Acked-by: Minchan Kim
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Heesub Shin
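
    A small, self-contained demonstration of the overrun described above.
    ZS_HANDLE_SIZE is assumed to be sizeof(unsigned long); the numbers
    mirror the commit message, not the exact kernel code:

        #include <stdio.h>

        #define ZS_HANDLE_SIZE  sizeof(unsigned long)   /* 8 on 64-bit */

        int main(void)
        {
                unsigned long request = 3264;   /* user allocation size */
                unsigned long chunk   = 3264;   /* class picked after the override */
                unsigned long needed  = request + ZS_HANDLE_SIZE;

                /* the handle header plus 3264 bytes of user data need 3272
                 * bytes, so a 3264-byte chunk is overrun by the difference,
                 * clobbering the head of the next adjacent chunk */
                printf("needed %lu, chunk %lu, overrun %lu bytes\n",
                       needed, chunk, needed - chunk);
                return 0;
        }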
     
  • In putback_zspage(), we don't need to insert a zspage into the
    size_class's list of zspages again just to fix its fullness group. We
    can fix the fullness group directly, without reinsertion, and save some
    instructions.

    Reported-by: Heesub Shin
    Signed-off-by: Minchan Kim
    Cc: Nitin Gupta
    Cc: Sergey Senozhatsky
    Cc: Dan Streetman
    Cc: Seth Jennings
    Cc: Ganesh Mahendran
    Cc: Luigi Semenzato
    Cc: Gunho Lee
    Cc: Juneho Choi
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Minchan Kim
     
  • A micro-optimization. Avoid additional branching and reduce (a bit of)
    register pressure (e.g. "s_off += size; d_off += size;" may be
    calculated twice: first for the >= PAGE_SIZE check and later for the
    offset update in the "else" clause). An illustrative before/after
    sketch follows this entry.

    scripts/bloat-o-meter shows some improvement

    add/remove: 0/0 grow/shrink: 0/1 up/down: 0/-10 (-10)
    function old new delta
    zs_object_copy 550 540 -10

    Signed-off-by: Sergey Senozhatsky
    Acked-by: Minchan Kim
    Cc: Nitin Gupta
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Sergey Senozhatsky
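
    An illustrative before/after of the pattern described in this entry
    (simplified; not the exact kernel diff):

        /* before: the new offset is computed twice, once for the check
         * and once in the else branch */
        if (s_off + size >= PAGE_SIZE) {
                /* unmap, advance to the next source page */
                s_off = 0;
        } else
                s_off += size;

        /* after: compute the new offset once, then test it */
        s_off += size;
        if (s_off >= PAGE_SIZE) {
                /* unmap, advance to the next source page */
                s_off = 0;
        }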
     
  • Do not synchronize RCU in zs_compact(). Neither zsmalloc nor zram uses
    RCU.

    Signed-off-by: Sergey Senozhatsky
    Acked-by: Minchan Kim
    Cc: Nitin Gupta
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Sergey Senozhatsky
     
  • Signed-off-by: Yinghao Xie
    Suggested-by: Minchan Kim
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Yinghao Xie
     
  • Create a zsmalloc document which explains the design concept and the
    stat information.

    Signed-off-by: Minchan Kim
    Cc: Juneho Choi
    Cc: Gunho Lee
    Cc: Luigi Semenzato
    Cc: Dan Streetman
    Cc: Seth Jennings
    Cc: Nitin Gupta
    Cc: Jerome Marchand
    Cc: Sergey Senozhatsky
    Cc: Joonsoo Kim
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Minchan Kim
     
  • When investigating compaction, per-class fullness information is
    helpful for judging how well compaction works. With it, we can see more
    clearly how compaction behaves on each size class.

    Signed-off-by: Minchan Kim
    Cc: Juneho Choi
    Cc: Gunho Lee
    Cc: Luigi Semenzato
    Cc: Dan Streetman
    Cc: Seth Jennings
    Cc: Nitin Gupta
    Cc: Jerome Marchand
    Cc: Sergey Senozhatsky
    Cc: Joonsoo Kim
    Cc: Mel Gorman
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Minchan Kim
     
  • We store the handle in the header of each allocated object, so it
    increases the size of each object by sizeof(unsigned long).

    If zram stores 4096 bytes to zsmalloc (ie, bad compression), zsmalloc
    needs the 4104B class to make room for the handle.

    However, the 4104B class has 1-pages_per_zspage, so the size wasted by
    internal fragmentation is 8192 - 4104, which is terrible.

    So this patch records the handle in page->private for such huge
    objects (ie, pages_per_zspage == 1 && maxobj_per_zspage == 1) instead
    of in the header of each object, so we can use the 4096B class rather
    than the 4104B class. A simplified sketch follows this entry.

    Signed-off-by: Minchan Kim
    Cc: Juneho Choi
    Cc: Gunho Lee
    Cc: Luigi Semenzato
    Cc: Dan Streetman
    Cc: Seth Jennings
    Cc: Nitin Gupta
    Cc: Jerome Marchand
    Cc: Sergey Senozhatsky
    Cc: Joonsoo Kim
    Cc: Mel Gorman
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Minchan Kim
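
    A simplified sketch of the idea, using the kernel's page_private()
    helpers; the helper names and the huge-object condition are
    illustrative, not the exact zsmalloc code:

        /* huge object: the zspage holds a single object, so first_page's
         * private field is free to carry the handle */
        static void record_huge_handle(struct page *first_page,
                                       unsigned long handle)
        {
                set_page_private(first_page, handle);
        }

        static unsigned long fetch_huge_handle(struct page *first_page)
        {
                return page_private(first_page);
        }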
     
  • Currently, zsmalloc regards a zspage as ZS_ALMOST_EMPTY if the zspage
    has under 1/4 of its objects in use (ie, fullness_threshold_frac). That
    can result in loose packing, since zsmalloc only migrates
    ZS_ALMOST_EMPTY zspages out.

    This patch changes the rule so that zsmalloc marks a zspage with more
    than 3/4 of its objects in use as ZS_ALMOST_FULL, which allows tighter
    packing (an illustrative helper follows this entry).

    Signed-off-by: Minchan Kim
    Cc: Juneho Choi
    Cc: Gunho Lee
    Cc: Luigi Semenzato
    Cc: Dan Streetman
    Cc: Seth Jennings
    Cc: Nitin Gupta
    Cc: Jerome Marchand
    Cc: Sergey Senozhatsky
    Cc: Joonsoo Kim
    Cc: Mel Gorman
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Minchan Kim
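
    An illustrative helper for the new grouping rule; the enum values
    follow zsmalloc naming, but the helper itself is a sketch, not the
    kernel code:

        enum fullness_group {
                ZS_ALMOST_FULL,
                ZS_ALMOST_EMPTY,
                ZS_EMPTY,
                ZS_FULL
        };

        static enum fullness_group get_fullness(unsigned int inuse,
                                                unsigned int max_objects)
        {
                if (inuse == 0)
                        return ZS_EMPTY;
                if (inuse == max_objects)
                        return ZS_FULL;
                /* new rule: a zspage with more than 3/4 of its objects in
                 * use is ALMOST_FULL; any other partially used zspage is
                 * ALMOST_EMPTY and thus a compaction source */
                if (inuse * 4 > max_objects * 3)
                        return ZS_ALMOST_FULL;
                return ZS_ALMOST_EMPTY;
        }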
     
  • This patch provides the core functions for zsmalloc migration. The
    migration policy is simple, as follows:

    for each size class {
            while {
                    src_page = get zs_page from ZS_ALMOST_EMPTY
                    if (!src_page)
                            break;
                    dst_page = get zs_page from ZS_ALMOST_FULL
                    if (!dst_page)
                            dst_page = get zs_page from ZS_ALMOST_EMPTY
                    if (!dst_page)
                            break;
                    migrate(from src_page, to dst_page);
            }
    }

    For migration, we need to identify which objects in a zspage are
    allocated so that we can migrate them out. We could determine this by
    iterating over the free objects in a zspage, because the zspage's
    first_page keeps the free objects in a singly linked list, but that is
    not efficient. Instead, this patch adds a tag (ie, OBJ_ALLOCATED_TAG)
    in the header of each object (ie, the handle) so we can easily check
    whether an object is allocated.

    This patch also adds another status bit in the handle to synchronize
    between user access through zs_map_object and migration. During
    migration, we cannot move objects a user is currently accessing, to
    keep the data coherent between the old and the new object. A rough
    sketch of the tagging scheme follows this entry.

    [akpm@linux-foundation.org: zsmalloc.c needs sched.h for cond_resched()]
    Signed-off-by: Minchan Kim
    Cc: Juneho Choi
    Cc: Gunho Lee
    Cc: Luigi Semenzato
    Cc: Dan Streetman
    Cc: Seth Jennings
    Cc: Nitin Gupta
    Cc: Jerome Marchand
    Cc: Sergey Senozhatsky
    Cc: Joonsoo Kim
    Cc: Mel Gorman
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Minchan Kim
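
    A rough sketch of the tagging scheme described in this entry: the tag
    lives in the low bit of the handle value stored in the object header
    (the helper names are illustrative):

        #define OBJ_ALLOCATED_TAG       1UL     /* low bit: "allocated" */

        static unsigned long mark_allocated(unsigned long handle)
        {
                return handle | OBJ_ALLOCATED_TAG;
        }

        static int obj_allocated(unsigned long head)
        {
                return head & OBJ_ALLOCATED_TAG;
        }

        static unsigned long clear_tag(unsigned long head)
        {
                return head & ~OBJ_ALLOCATED_TAG;
        }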
     
  • A later patch needs some parts of zs_malloc() and zs_free() for
    migration, so this patch factors them out.

    Signed-off-by: Minchan Kim
    Cc: Juneho Choi
    Cc: Gunho Lee
    Cc: Luigi Semenzato
    Cc: Dan Streetman
    Cc: Seth Jennings
    Cc: Nitin Gupta
    Cc: Jerome Marchand
    Cc: Sergey Senozhatsky
    Cc: Joonsoo Kim
    Cc: Mel Gorman
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Minchan Kim
     
  • Recently, we started to use zram heavily and some issues popped up.

    1) external fragmentation

    I got a report from Juneho Choi that fork failed although there were
    plenty of free pages in the system. His investigation revealed that
    zram is one of the culprits behind heavy fragmentation, so that there
    was no contiguous 16K page left for a pgd on fork on ARM.

    2) non-movable pages

    Another problem with zram is that users inherently want to use it as
    swap on small-memory systems, so they use zram together with CMA to use
    memory efficiently. Unfortunately, that doesn't work well because zram
    cannot use CMA's movable pages unless it supports compaction. I got
    several reports of OOMs happening with zram although there was lots of
    swap space and free space in the CMA area.

    3) internal fragmentation

    zram has started to support a memory limitation feature to limit memory
    usage, and I sent a patchset (https://lkml.org/lkml/2014/9/21/148) for
    the VM to be harmonized with zram-swap, to stop anonymous page reclaim
    once zram has consumed memory up to the limit even though there is free
    space on the swap. One problem with that direction is that zram has no
    way to know about holes in the memory space zsmalloc allocated due to
    internal fragmentation, so zram would regard the swap as full although
    there is free space in zsmalloc. To solve this, zram wants to trigger
    compaction of zsmalloc before it decides whether it is full or not.

    This patchset is the first step towards addressing the issues above.
    For that, it adds an indirection layer between handle and object
    location and supports manual compaction to solve the third problem
    first of all.

    After this patchset is merged, the next step is to make the VM aware of
    zsmalloc compaction so that generic compaction will move
    zsmalloc-backed pages automatically at runtime.

    In my synthetic experiment (ie, high-compression-ratio data with heavy
    swap in/out on an 8G zram-swap), the data is as follows:

    Before =
    zram allocated object : 60212066 bytes
    zram total used: 140103680 bytes
    ratio: 42.98 percent
    MemFree: 840192 kB

    Compaction

    After =
    frag ratio after compaction
    zram allocated object : 60212066 bytes
    zram total used: 76185600 bytes
    ratio: 79.03 percent
    MemFree: 901932 kB

    Juneho reported the results below from his real platform with only a
    small amount of aging, so I think the benefit would be bigger on a real
    system aged for a long time.

    - frag_ratio increased 3% (ie, higher is better)
    - memfree increased by about 6MB
    - In buddyinfo, Normal 2^3: 4, 2^2: 1, 2^1 increased; HighMem: 2^1 21
      increased

    frag ratio after swap fragment
    used : 156677 kbytes
    total: 166092 kbytes
    frag_ratio : 94
    meminfo before compaction
    MemFree: 83724 kB
    Node 0, zone Normal 13642 1364 57 10 61 17 9 5 4 0 0
    Node 0, zone HighMem 425 29 1 0 0 0 0 0 0 0 0

    num_migrated : 23630
    compaction done

    frag ratio after compaction
    used : 156673 kbytes
    total: 160564 kbytes
    frag_ratio : 97
    meminfo after compaction
    MemFree: 89060 kB
    Node 0, zone Normal 14076 1544 67 14 61 17 9 5 4 0 0
    Node 0, zone HighMem 863 50 1 0 0 0 0 0 0 0 0

    This patchset adds more logic (about 480 lines) to zsmalloc, but when I
    tested a heavy swap-in/out program, the regression in swap-in/out speed
    was marginal because most of the overhead comes from
    compress/decompress and other MM reclaim work.

    This patch (of 7):

    Currently, a zsmalloc handle encodes the object's location directly,
    which makes supporting migration hard.

    This patch decouples handle and object by adding an indirection layer.
    For that, it allocates the handle dynamically and returns it to the
    user. The handle is an address obtained from a slab allocation, so it
    is unique, and we can keep the object's location in the memory space
    allocated for the handle.

    With this, we can change an object's position without changing the
    handle itself (a simplified sketch follows this entry).

    Signed-off-by: Minchan Kim
    Cc: Juneho Choi
    Cc: Gunho Lee
    Cc: Luigi Semenzato
    Cc: Dan Streetman
    Cc: Seth Jennings
    Cc: Nitin Gupta
    Cc: Jerome Marchand
    Cc: Sergey Senozhatsky
    Cc: Joonsoo Kim
    Cc: Mel Gorman
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Minchan Kim
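
    A simplified sketch of the indirection layer described in this entry:
    the handle is a small slab allocation that stores the encoded object
    location, so moving an object only requires rewriting the handle's
    contents (the cache setup and helper names are illustrative):

        static struct kmem_cache *handle_cachep;    /* created at pool init */

        static unsigned long alloc_handle(gfp_t gfp)
        {
                /* the handle handed to the user is simply the address of
                 * this small allocation, so it is unique */
                return (unsigned long)kmem_cache_alloc(handle_cachep, gfp);
        }

        static void record_obj(unsigned long handle, unsigned long location)
        {
                /* handle -> object location; migration only rewrites this */
                *(unsigned long *)handle = location;
        }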
     

13 Feb, 2015

2 commits

  • Keeping zsmalloc fragmentation at a low level is our target. But for
    now we still need to add debug code to zsmalloc to get quantitative
    data.

    This patch adds a new configuration option, CONFIG_ZSMALLOC_STAT, to
    enable statistics collection for developers. Currently only the
    per-class object statistics are collected. Users can get the
    information via debugfs:

    cat /sys/kernel/debug/zsmalloc/zram0/...

    For example:

    After I copied "jdk-8u25-linux-x64.tar.gz" to zram with ext4 filesystem:
    class size obj_allocated obj_used pages_used
    0 32 0 0 0
    1 48 256 12 3
    2 64 64 14 1
    3 80 51 7 1
    4 96 128 5 3
    5 112 73 5 2
    6 128 32 4 1
    7 144 0 0 0
    8 160 0 0 0
    9 176 0 0 0
    10 192 0 0 0
    11 208 0 0 0
    12 224 0 0 0
    13 240 0 0 0
    14 256 16 1 1
    15 272 15 9 1
    16 288 0 0 0
    17 304 0 0 0
    18 320 0 0 0
    19 336 0 0 0
    20 352 0 0 0
    21 368 0 0 0
    22 384 0 0 0
    23 400 0 0 0
    24 416 0 0 0
    25 432 0 0 0
    26 448 0 0 0
    27 464 0 0 0
    28 480 0 0 0
    29 496 33 1 4
    30 512 0 0 0
    31 528 0 0 0
    32 544 0 0 0
    33 560 0 0 0
    34 576 0 0 0
    35 592 0 0 0
    36 608 0 0 0
    37 624 0 0 0
    38 640 0 0 0
    40 672 0 0 0
    42 704 0 0 0
    43 720 17 1 3
    44 736 0 0 0
    46 768 0 0 0
    49 816 0 0 0
    51 848 0 0 0
    52 864 14 1 3
    54 896 0 0 0
    57 944 13 1 3
    58 960 0 0 0
    62 1024 4 1 1
    66 1088 15 2 4
    67 1104 0 0 0
    71 1168 0 0 0
    74 1216 0 0 0
    76 1248 0 0 0
    83 1360 3 1 1
    91 1488 11 1 4
    94 1536 0 0 0
    100 1632 5 1 2
    107 1744 0 0 0
    111 1808 9 1 4
    126 2048 4 4 2
    144 2336 7 3 4
    151 2448 0 0 0
    168 2720 15 15 10
    190 3072 28 27 21
    202 3264 0 0 0
    254 4096 36209 36209 36209

    Total 37022 36326 36288

    We can calculate the overall fragmentation from the last line:
    Total 37022 36326 36288
    (37022 - 36326) / 37022 = 1.87%

    Also, by analysing the objects allocated in every class, we can see why
    we got such low fragmentation: most of the allocated objects are in
    class 254, and a class-254 zspage consists of only 1 page, so no
    fragmentation is introduced by allocating objects in class 254.

    And in the future, we can collect other zsmalloc statistics as needed
    and analyse them.

    Signed-off-by: Ganesh Mahendran
    Suggested-by: Minchan Kim
    Acked-by: Minchan Kim
    Cc: Nitin Gupta
    Cc: Seth Jennings
    Cc: Dan Streetman
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Ganesh Mahendran
     
  • Currently the backends of zpool, zsmalloc and zbud, do not know who
    created them. There is no way for zsmalloc/zbud to find out which
    caller they belong to.

    Now we want to add statistics collection to zsmalloc, and we need to
    name the debugfs dir for each pool created. The way suggested by
    Minchan Kim is to use a name passed by the caller (such as zram) when
    creating the zsmalloc pool:

    /sys/kernel/debug/zsmalloc/zram0

    This patch adds an argument `name' to zs_create_pool() and other
    related functions (a usage sketch follows this entry).

    Signed-off-by: Ganesh Mahendran
    Acked-by: Minchan Kim
    Cc: Seth Jennings
    Cc: Nitin Gupta
    Cc: Dan Streetman
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Ganesh Mahendran
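
    A usage sketch, assuming the new prototype takes the name first; the
    GFP flags are illustrative of what a zram-like caller might pass:

        /* per-device pool whose stats land under
         * /sys/kernel/debug/zsmalloc/zram0 */
        struct zs_pool *pool = zs_create_pool("zram0",
                                              GFP_NOIO | __GFP_HIGHMEM);
        if (!pool)
                return -ENOMEM;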
     

19 Dec, 2014

1 commit

  • Currently the functions in zsmalloc.c are not arranged in a readable
    and reasonable sequence. As more and more functions are added, we run
    into inconveniences like the one below.

    Current functions:

        void zs_init()
        {
        }

        static void get_maxobj_per_zspage()
        {
        }

    Then I want to add a func_1() which is called from zs_init(), and this
    newly added function func_1() will use get_maxobj_per_zspage(), which
    is defined below zs_init():

        void func_1()
        {
                get_maxobj_per_zspage();
        }

        void zs_init()
        {
                func_1();
        }

        static void get_maxobj_per_zspage()
        {
        }

    This will cause a compile error. So we must add a declaration:

        static void get_maxobj_per_zspage();

    before func_1() if we do not put get_maxobj_per_zspage() before
    func_1().

    In addition, putting the module_[init|exit] functions at the bottom of
    the file conforms to our usual habit.

    So, this patch adjusts the function sequence as:

    /* helper functions */
    ...
    obj_location_to_handle()
    ...

    /* Some exported functions */
    ...

    zs_map_object()
    zs_unmap_object()

    zs_malloc()
    zs_free()

    zs_init()
    zs_exit()

    Signed-off-by: Ganesh Mahendran
    Cc: Nitin Gupta
    Acked-by: Minchan Kim
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Ganesh Mahendran
     

14 Dec, 2014

6 commits

  • In zs_create_pool(), we allocate more memory than sizeof(struct
    zs_pool):

        ovhd_size = roundup(sizeof(*pool), PAGE_SIZE);

    This patch allocates exactly the needed size (see the sketch after this
    entry).

    Signed-off-by: Ganesh Mahendran
    Acked-by: Minchan Kim
    Cc: Nitin Gupta
    Cc: Dan Streetman
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Ganesh Mahendran
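
    A before/after sketch of the allocation (illustrative; error handling
    omitted):

        /* before: rounds the allocation up to a whole page */
        pool = kzalloc(roundup(sizeof(*pool), PAGE_SIZE), GFP_KERNEL);

        /* after: allocate exactly what struct zs_pool needs */
        pool = kzalloc(sizeof(*pool), GFP_KERNEL);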
     
  • In zs_create_pool(), prev_class is assigned (ZS_SIZE_CLASSES - 1)
    times, and prev_class only ever references the previous size_class, so
    most of those assignments are unnecessary.

    This patch assigns *prev_class* only when a new size_class structure is
    allocated, and uses prev_class to check whether the first class has
    been allocated.

    [akpm@linux-foundation.org: remove now-unused ZS_SIZE_CLASSES]
    Signed-off-by: Ganesh Mahendran
    Cc: Minchan Kim
    Cc: Nitin Gupta
    Reviewed-by: Dan Streetman
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Ganesh Mahendran
     
  • I sent a patch [1] for an unnecessary check in zsmalloc. And Minchan
    Kim found that zsmalloc cannot even allocate an object of size
    ZS_MAX_ALLOC_SIZE in some situations.

    For example, in a system with 64KB PAGE_SIZE and 32-bit physical
    addresses:

        ZS_MIN_ALLOC_SIZE is 32 bytes, calculated as
            MAX(32, (ZS_MAX_PAGES_PER_ZSPAGE << PAGE_SHIFT >> OBJ_INDEX_BITS))
        ZS_MAX_ALLOC_SIZE is 64KB (in the current code, PAGE_SIZE)
        ZS_SIZE_CLASS_DELTA is 256 bytes

    So:
        ZS_SIZE_CLASSES = (ZS_MAX_ALLOC_SIZE - ZS_MIN_ALLOC_SIZE) /
                          ZS_SIZE_CLASS_DELTA + 1
                        = 256

    In zs_create_pool(), the largest object size that can be allocated is:
        ZS_MIN_ALLOC_SIZE + i * ZS_SIZE_CLASS_DELTA = 32 + 255 * 256 = 65312

    We can see that 65312 < 65536 (ZS_MAX_ALLOC_SIZE). So we can NOT
    allocate objects of size ZS_MAX_ALLOC_SIZE (65536), which we promise
    upper users we can do.

    [1] http://lkml.iu.edu/hypermail/linux/kernel/1411.2/03835.html
    [2] http://lkml.iu.edu/hypermail/linux/kernel/1411.2/04534.html

    This patch fixes this issue by calculating zs_size_classes dynamically
    when the module is loaded, so that a buffer of size ZS_MAX_ALLOC_SIZE
    can be allocated and the largest object (of size ZS_MAX_ALLOC_SIZE) can
    be stored in it. A small demonstration of the arithmetic follows this
    entry.

    [akpm@linux-foundation.org: restore ZS_SIZE_CLASSES to fix bisectability]
    Signed-off-by: Mahendran Ganesh
    Suggested-by: Minchan Kim
    Cc: Nitin Gupta
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Mahendran Ganesh
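
    A small user-space demonstration of the arithmetic above for the
    64KB-page example (the macro values are taken from this entry):

        #include <stdio.h>

        #define PAGE_SIZE               65536UL /* 64KB pages */
        #define ZS_MIN_ALLOC_SIZE       32UL
        #define ZS_MAX_ALLOC_SIZE       PAGE_SIZE
        #define ZS_SIZE_CLASS_DELTA     256UL

        int main(void)
        {
                unsigned long classes = (ZS_MAX_ALLOC_SIZE - ZS_MIN_ALLOC_SIZE) /
                                        ZS_SIZE_CLASS_DELTA + 1;
                unsigned long max_reachable = ZS_MIN_ALLOC_SIZE +
                                        (classes - 1) * ZS_SIZE_CLASS_DELTA;

                /* prints: classes 256, max reachable 65312, wanted 65536 */
                printf("classes %lu, max reachable %lu, wanted %lu\n",
                       classes, max_reachable, ZS_MAX_ALLOC_SIZE);
                return 0;
        }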
     
  • kunmap_atomic() should be passed the virtual address obtained from
    kmap_atomic(). However, some pieces of code in zsmalloc pass a modified
    address, not the one returned by kmap_atomic(), to kunmap_atomic().

    It happens to work because zsmalloc only modifies the address within
    the PAGE_SIZE boundary, so it is compatible with the current
    kmap_atomic() implementation. But it is still fragile against potential
    changes to kmap_atomic(), so let's correct it (a sketch of the correct
    pairing follows this entry).

    I hit a subtle bug when I implemented a new zsmalloc feature
    (compaction) due to mishandling of a link (the link crossed a page
    boundary). Although it was entirely my mistake, it took a while to find
    the cause because an unpredictable kmapped address was unmapped,
    causing an almost random crash.

    Signed-off-by: Minchan Kim
    Cc: Nitin Gupta
    Cc: Sergey Senozhatsky
    Cc: Dan Streetman
    Cc: Seth Jennings
    Cc: Jerome Marchand
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Minchan Kim
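
    A sketch of the correct pairing described in this entry: keep the
    address returned by kmap_atomic() and pass exactly that address to
    kunmap_atomic(), even when the accesses use an offset from it (the
    offset use is illustrative):

        void *vaddr = kmap_atomic(page);
        struct link_free *link = (struct link_free *)(vaddr + off);

        /* ... read or update the freelist link ... */

        kunmap_atomic(vaddr);   /* not kunmap_atomic((void *)link) */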
     
  • Mahendran Ganesh reported that zpool-enabled zsmalloc should not call
    zpool_unregister_driver() from zs_init() if cpu notifier registration has
    failed, because error handling is performed before we register the driver
    via zpool_register_driver() call.

    Factor out cpu notifier registration and unregistration code and fix
    zs_init() error handling.

    link: http://lkml.iu.edu//hypermail/linux/kernel/1411.1/04156.html
    [akpm@linux-foundation.org: squash bogus gcc warning]
    [akpm@linux-foundation.org: use __init and __exit]
    Signed-off-by: Sergey Senozhatsky
    Reported-by: Mahendran Ganesh
    Cc: Minchan Kim
    Cc: Nitin Gupta
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Sergey Senozhatsky
     
  • zsmalloc has many size_classes to reduce fragmentation, and they come
    in 16-byte steps, for example 16, 32, 48, etc., if PAGE_SIZE is 4096.
    And zsmalloc has the constraint that each zspage has at most 4 pages.

    In this situation, we can see an interesting aspect. Consider the
    size_classes for 1488, 1472, ..., 1376. To prevent external
    fragmentation, they all use 4 pages per zspage, and so all of them can
    contain at most 11 objects:

    16384 (4096 * 4) = 1488 * 11 + remainder
    16384 (4096 * 4) = 1472 * 11 + remainder
    16384 (4096 * 4) = ...
    16384 (4096 * 4) = 1376 * 11 + remainder

    This means they have the same characteristics, and classifying them
    separately isn't needed. Both the 1488- and 1472-byte classes can only
    fit 11 objects into 4 pages, and an object of 1472 bytes fits into a
    1488-byte slot, so merging these classes to always use 1488-byte
    objects reduces the total number of size classes. And reducing the
    total number of size classes reduces overall fragmentation, because a
    wider range of compressed pages can fit into a single size class,
    leaving fewer unused objects in each size class.

    For this purpose, this patch implements size_class merging. If a
    size_class has the same pages_per_zspage and the same number of objects
    per zspage as the previous size_class, we don't create a new
    size_class; instead, we reuse the previous size_class with the same
    characteristics. This way, the example sizes above (1488, 1472, ...,
    1376) use just one size_class, so we get much better memory utilization
    (a small demonstration follows this entry).

    Below is the result of my simple test.

    TEST ENV: ext4 on zram, mounted with the discard option.
    WORKLOAD: untar the kernel source code, then remove directories in
    descending order of size (drivers arch fs sound include net
    Documentation firmware kernel tools).

    Each line shows orig_data_size, compr_data_size, mem_used_total,
    fragmentation overhead (mem_used - compr_data_size) and overhead ratio
    (overhead to compr_data_size), respectively, after the untar and after
    each remove operation.

    * untar-nomerge.out

    orig_size compr_size used_size overhead overhead_ratio
    525.88MB 199.16MB 210.23MB 11.08MB 5.56%
    288.32MB 97.43MB 105.63MB 8.20MB 8.41%
    177.32MB 61.12MB 69.40MB 8.28MB 13.55%
    146.47MB 47.32MB 56.10MB 8.78MB 18.55%
    124.16MB 38.85MB 48.41MB 9.55MB 24.58%
    103.93MB 31.68MB 40.93MB 9.25MB 29.21%
    84.34MB 22.86MB 32.72MB 9.86MB 43.13%
    66.87MB 14.83MB 23.83MB 9.00MB 60.70%
    60.67MB 11.11MB 18.60MB 7.49MB 67.48%
    55.86MB 8.83MB 16.61MB 7.77MB 88.03%
    53.32MB 8.01MB 15.32MB 7.31MB 91.24%

    * untar-merge.out

    orig_size compr_size used_size overhead overhead_ratio
    526.23MB 199.18MB 209.81MB 10.64MB 5.34%
    288.68MB 97.45MB 104.08MB 6.63MB 6.80%
    177.68MB 61.14MB 66.93MB 5.79MB 9.47%
    146.83MB 47.34MB 52.79MB 5.45MB 11.51%
    124.52MB 38.87MB 44.30MB 5.43MB 13.96%
    104.29MB 31.70MB 36.83MB 5.13MB 16.19%
    84.70MB 22.88MB 27.92MB 5.04MB 22.04%
    67.11MB 14.83MB 19.26MB 4.43MB 29.86%
    60.82MB 11.10MB 14.90MB 3.79MB 34.17%
    55.90MB 8.82MB 12.61MB 3.79MB 42.97%
    53.32MB 8.01MB 11.73MB 3.73MB 46.53%

    As you can see from the results above, the merged version has better
    utilization (overhead ratio, 5th column) and uses less memory
    (mem_used_total, 3rd column).

    Signed-off-by: Joonsoo Kim
    Cc: Minchan Kim
    Cc: Nitin Gupta
    Cc: Jerome Marchand
    Cc: Sergey Senozhatsky
    Reviewed-by: Dan Streetman
    Cc: Luigi Semenzato
    Cc:
    Cc: "seungho1.park"
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Joonsoo Kim
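
    A small user-space demonstration of why the classes from 1376 to 1488
    are mergeable: with 4-page zspages they all hold exactly 11 objects
    (the sizes and the 4-page limit follow this entry):

        #include <stdio.h>

        #define PAGE_SIZE        4096UL
        #define PAGES_PER_ZSPAGE 4UL

        int main(void)
        {
                unsigned long size;

                /* every class in this range fits 11 objects into 16384
                 * bytes, so one merged class (1488) can serve all of them */
                for (size = 1376; size <= 1488; size += 16)
                        printf("size %4lu -> %lu objs per zspage\n",
                               size, (PAGES_PER_ZSPAGE * PAGE_SIZE) / size);
                return 0;
        }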
     

10 Oct, 2014

4 commits

  • Change zsmalloc init_zspage() logic to iterate through each object on each
    of its pages, checking the offset to verify the object is on the current
    page before linking it into the zspage.

    The current zsmalloc init_zspage free object linking code has logic that
    relies on there only being one page per zspage when PAGE_SIZE is a
    multiple of class->size. It calculates the number of objects for the
    current page, and iterates through all of them plus one, to account for
    the assumed partial object at the end of the page. While this currently
    works, the logic can be simplified to just link the object at each
    successive offset until the offset is larger than PAGE_SIZE, which does
    not rely on PAGE_SIZE being a multiple of class->size.

    Signed-off-by: Dan Streetman
    Acked-by: Minchan Kim
    Cc: Sergey Senozhatsky
    Cc: Nitin Gupta
    Cc: Seth Jennings
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Dan Streetman
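
    A small user-space sketch of the iteration described in the entry
    above: object heads are placed at successive class-size offsets, and
    the offset alone tells us which page each head lands on (this models
    the logic only, not the kernel data structures; the class size and page
    count are illustrative):

        #include <stdio.h>

        #define PAGE_SIZE 4096UL

        int main(void)
        {
                unsigned long size = 720;  /* illustrative class size */
                unsigned long pages = 3;   /* illustrative pages_per_zspage */
                unsigned long off = 0, page = 0, obj = 0;

                while (page < pages) {
                        if (off >= PAGE_SIZE) {
                                /* head falls past this page: carry the
                                 * leftover offset to the next page */
                                off -= PAGE_SIZE;
                                page++;
                                continue;
                        }
                        printf("obj %2lu: head on page %lu at offset %lu\n",
                               obj++, page, off);
                        off += size;
                }
                return 0;
        }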
     
  • The letter 'f' in "n
    Acked-by: Minchan Kim
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Wang Sheng-Hui
     
  • zs_get_total_size_bytes() returns the amount of memory zsmalloc has
    consumed in *bytes*, but zsmalloc operates in *pages* rather than
    bytes, so let's change the API. The benefit is that we avoid
    unnecessary overhead (ie, converting the page count into bytes) in
    zsmalloc.

    Since the return type is pages, "zs_get_total_pages" is a better name
    than "zs_get_total_size_bytes" (a usage sketch follows this entry).

    Signed-off-by: Minchan Kim
    Reviewed-by: Dan Streetman
    Cc: Sergey Senozhatsky
    Cc: Jerome Marchand
    Cc:
    Cc:
    Cc: Luigi Semenzato
    Cc: Nitin Gupta
    Cc: Seth Jennings
    Cc: David Horner
    Cc: Joonsoo Kim
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Minchan Kim
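
    A usage sketch of the renamed API; the caller converts to bytes only
    when it actually needs a byte count (illustrative):

        unsigned long pages = zs_get_total_pages(pool);
        u64 bytes = (u64)pages << PAGE_SHIFT;   /* only if bytes are needed */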
     
  • Currently, zram has no feature to limit memory, so theoretically zram
    can deplete system memory. Users have asked for a limit several times,
    as even without exhaustion zram makes it hard to control memory usage
    on the platform. This patchset adds the feature.

    Patch 1 makes zs_get_total_size_bytes faster because it will be used
    frequently in later patches for the new feature.

    Patch 2 changes zs_get_total_size_bytes's return unit from bytes to
    pages so that zsmalloc doesn't need an unnecessary operation (ie,
    << PAGE_SHIFT).

    Patch 3 adds the new feature. I added it in the zram layer, not
    zsmalloc, because the limitation is zram's requirement, not zsmalloc's,
    so any other user of zsmalloc (ie, zpool) shouldn't be affected by an
    unnecessary branch in zsmalloc. In the future, if every user of
    zsmalloc wants the feature, we could move it from the client side into
    zsmalloc easily, but the other way around would be painful.

    Patch 4 adds a new facility to report the maximum memory usage of zram,
    which avoids users polling /sys/block/zram0/mem_used_total frequently
    and ensures transient maxima are not missed.

    This patch (of 4):

    pages_allocated has been counted in the size_class structure, and when
    a zsmalloc user wants to see total_size_bytes, it has to gather the
    count from every size_class and report the sum.

    That's not bad if the user doesn't read the value often, but if the
    user starts reading it frequently, it is not a good deal from a
    performance point of view.

    This patch moves the count from size_class to zs_pool, which also
    reduces the memory footprint (from [255 * 8 bytes] to
    [sizeof(atomic_long_t)]). A simplified sketch follows this entry.

    Signed-off-by: Minchan Kim
    Reviewed-by: Dan Streetman
    Cc: Sergey Senozhatsky
    Cc: Jerome Marchand
    Cc:
    Cc:
    Cc: Luigi Semenzato
    Cc: Nitin Gupta
    Cc: Seth Jennings
    Reviewed-by: David Horner
    Cc: Joonsoo Kim
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Minchan Kim
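
    A simplified sketch of the pool-wide counter described in this entry
    (the field placement and call sites are illustrative):

        /* in struct zs_pool: */
        atomic_long_t pages_allocated;

        /* when a zspage is created or destroyed for a class: */
        atomic_long_add(class->pages_per_zspage, &pool->pages_allocated);
        atomic_long_sub(class->pages_per_zspage, &pool->pages_allocated);

        /* reporting now reads one counter instead of summing the
         * per-class counts */
        return atomic_long_read(&pool->pages_allocated);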
     

30 Aug, 2014

1 commit

  • To avoid potential format string expansion via module parameters, do not
    use the zpool type directly in request_module() without a format string.
    Additionally, to avoid arbitrary modules being loaded via zpool API
    (e.g. via the zswap_zpool_type module parameter) add a "zpool-" prefix
    to the requested module, as well as module aliases for the existing
    zpool types (zbud and zsmalloc).

    Signed-off-by: Kees Cook
    Cc: Seth Jennings
    Cc: Minchan Kim
    Cc: Nitin Gupta
    Acked-by: Dan Streetman
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Kees Cook
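
    A sketch of the pattern described in this entry; the call sites are
    simplified, but request_module() with a format string and
    MODULE_ALIAS() are the standard kernel facilities involved:

        /* in zpool, when looking up a backend by type string: */
        request_module("zpool-%s", type);   /* prefixed, not passed raw */

        /* in each backend, so only "zpool-"-prefixed requests load it: */
        MODULE_ALIAS("zpool-zsmalloc");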
     

07 Aug, 2014

3 commits

  • Update zbud and zsmalloc to implement the zpool api.

    [fengguang.wu@intel.com: make functions static]
    Signed-off-by: Dan Streetman
    Tested-by: Seth Jennings
    Cc: Minchan Kim
    Cc: Nitin Gupta
    Cc: Weijie Yang
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Dan Streetman
     
  • Add zpool api.

    zpool provides an interface for memory storage, typically of compressed
    memory. Users can select what backend to use; currently the only
    implementations are zbud, a low density implementation with up to two
    compressed pages per storage page, and zsmalloc, a higher density
    implementation with multiple compressed pages per storage page.

    Signed-off-by: Dan Streetman
    Tested-by: Seth Jennings
    Cc: Minchan Kim
    Cc: Nitin Gupta
    Cc: Weijie Yang
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Dan Streetman
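
    A rough usage sketch of the API as introduced around this time; the
    exact prototypes have changed in later kernels, so treat the signatures
    below as assumptions:

        struct zpool *pool;
        unsigned long handle;
        char *buf;

        pool = zpool_create_pool("zsmalloc", GFP_KERNEL, NULL); /* or "zbud" */
        if (!pool)
                return -ENOMEM;

        if (zpool_malloc(pool, len, GFP_KERNEL, &handle) == 0) {
                buf = zpool_map_handle(pool, handle, ZPOOL_MM_WO);
                memcpy(buf, src, len);
                zpool_unmap_handle(pool, handle);
                zpool_free(pool, handle);
        }

        zpool_destroy_pool(pool);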
     
  • Currently map_vm_area() takes (struct page *** pages) as its third
    argument, and after mapping it moves (*pages) to point to (*pages +
    nr_mapped_pages).

    This kind of increment is useless to its callers these days. The
    callers don't care about the increment; in fact they try to avoid it by
    passing another copy to map_vm_area().

    The caller can always guarantee that all the pages can be mapped into
    the vm_area specified by the first argument, and the caller only cares
    about whether map_vm_area() fails or not.

    This patch cleans up the pointer movement in map_vm_area() and updates
    its callers accordingly (a caller sketch follows this entry).

    Signed-off-by: WANG Chao
    Cc: Zhang Yanfei
    Acked-by: Greg Kroah-Hartman
    Cc: Minchan Kim
    Cc: Nitin Gupta
    Cc: Rusty Russell
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    WANG Chao
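
    A sketch of a caller after the cleanup, assuming the simplified
    prototype map_vm_area(area, prot, pages); the pages array passed in is
    no longer advanced by the callee:

        struct page *pages[2] = { page0, page1 };

        /* 'area' was reserved earlier, e.g. via get_vm_area() */
        if (map_vm_area(area, PAGE_KERNEL, pages))
                return -EINVAL;
        /* 'pages' still points at the start of the array here */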
     

05 Jun, 2014

2 commits


20 Mar, 2014

1 commit

  • Subsystems that want to register CPU hotplug callbacks, as well as perform
    initialization for the CPUs that are already online, often do it as shown
    below:

    get_online_cpus();

    for_each_online_cpu(cpu)
            init_cpu(cpu);

    register_cpu_notifier(&foobar_cpu_notifier);

    put_online_cpus();

    This is wrong, since it is prone to ABBA deadlocks involving the
    cpu_add_remove_lock and the cpu_hotplug.lock (when running concurrently
    with CPU hotplug operations).

    Instead, the correct and race-free way of performing the callback
    registration is:

    cpu_notifier_register_begin();

    for_each_online_cpu(cpu)
            init_cpu(cpu);

    /* Note the use of the double underscored version of the API */
    __register_cpu_notifier(&foobar_cpu_notifier);

    cpu_notifier_register_done();

    Fix the zsmalloc code by using this latter form of callback registration.

    Cc: Nitin Gupta
    Cc: Ingo Molnar
    Signed-off-by: Srivatsa S. Bhat
    Acked-by: Minchan Kim
    Signed-off-by: Rafael J. Wysocki

    Srivatsa S. Bhat
     

31 Jan, 2014

2 commits

  • Add my copyright to the zsmalloc source code which I maintain.

    Signed-off-by: Minchan Kim
    Cc: Nitin Gupta
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Minchan Kim
     
  • This patch moves zsmalloc under the mm directory.

    Before that, the description below explains why we have needed a
    custom allocator.

    zsmalloc is a new slab-based memory allocator for storing compressed
    pages. It is designed for low fragmentation and a high allocation
    success rate for large objects.
    Acked-by: Nitin Gupta
    Reviewed-by: Konrad Rzeszutek Wilk
    Cc: Bob Liu
    Cc: Greg Kroah-Hartman
    Cc: Hugh Dickins
    Cc: Jens Axboe
    Cc: Luigi Semenzato
    Cc: Mel Gorman
    Cc: Pekka Enberg
    Cc: Rik van Riel
    Cc: Seth Jennings
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Minchan Kim