29 Jul, 2016

8 commits

  • iput() itself checks whether its argument is NULL and returns immediately
    if so, so the NULL test around the call site is redundant.

    This issue was detected by using the Coccinelle software.
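
    A minimal sketch of the pattern this cleanup removes (hypothetical caller,
    not the exact zram hunk):

        /* before: redundant guard around the call */
        if (inode)
                iput(inode);

        /* after: iput() already returns early for a NULL inode */
        iput(inode);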

    Link: http://lkml.kernel.org/r/559cf499-4a01-25f9-c87f-24d906626a57@users.sourceforge.net
    Signed-off-by: Markus Elfring
    Reviewed-by: Sergey Senozhatsky
    Acked-by: Minchan Kim
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Markus Elfring
     
  • Use ClearPagePrivate/ClearPagePrivate2 helpers to clear
    PG_private/PG_private_2 in page->flags
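
    Roughly the substitution, sketched on a generic page (not the exact
    zsmalloc hunk):

        /* before: open-coded bit clearing on page->flags */
        clear_bit(PG_private, &page->flags);
        clear_bit(PG_private_2, &page->flags);

        /* after: use the page-flags helpers */
        ClearPagePrivate(page);
        ClearPagePrivate2(page);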

    Link: http://lkml.kernel.org/r/1467882338-4300-7-git-send-email-opensource.ganesh@gmail.com
    Signed-off-by: Ganesh Mahendran
    Acked-by: Minchan Kim
    Reviewed-by: Sergey Senozhatsky
    Cc: Vlastimil Babka
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Ganesh Mahendran
     
  • Add the __init/__exit attributes to functions that are only called during
    module init/exit, to save memory.
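
    A generic sketch of the annotation (hypothetical module, not the zsmalloc
    patch itself): functions used only during init/exit get the section
    attribute so their text can be discarded.

        #include <linux/module.h>

        static int __init example_init(void)
        {
                /* runs once at load time; this code is freed after init */
                return 0;
        }

        static void __exit example_exit(void)
        {
                /* runs only at unload; dropped entirely for built-in code */
        }

        module_init(example_init);
        module_exit(example_exit);
        MODULE_LICENSE("GPL");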

    Link: http://lkml.kernel.org/r/1467882338-4300-6-git-send-email-opensource.ganesh@gmail.com
    Signed-off-by: Ganesh Mahendran
    Cc: Sergey Senozhatsky
    Cc: Minchan Kim
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Ganesh Mahendran
     
  • Some minor comment changes:

    1). update the zs_malloc()/zs_create_pool() function headers
    2). update "Usage of struct page fields"

    Link: http://lkml.kernel.org/r/1467882338-4300-5-git-send-email-opensource.ganesh@gmail.com
    Signed-off-by: Ganesh Mahendran
    Reviewed-by: Sergey Senozhatsky
    Acked-by: Minchan Kim
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Ganesh Mahendran
     
  • Currently, if a class cannot be merged, the max objects of a zspage in
    that class may be calculated twice.

    This patch calculates the max objects of a zspage at the beginning and
    passes the value to can_merge() to decide whether the class can be merged.

    It also removes get_maxobj_per_zspage(), as there is no other caller of
    that function.
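
    In spirit, the reshaped call site looks roughly like this (a sketch; names
    are approximate, the real code is in mm/zsmalloc.c):

        /* compute the value once per class ... */
        objs_per_zspage = (pages_per_zspage * PAGE_SIZE) / size;

        /* ... and hand it to can_merge() instead of recomputing it there */
        if (prev_class && can_merge(prev_class, pages_per_zspage, objs_per_zspage)) {
                pool->size_class[i] = prev_class;
                continue;
        }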

    Link: http://lkml.kernel.org/r/1467882338-4300-4-git-send-email-opensource.ganesh@gmail.com
    Signed-off-by: Ganesh Mahendran
    Reviewed-by: Sergey Senozhatsky
    Acked-by: Minchan Kim
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Ganesh Mahendran
     
  • The number of max objects per zspage is now stored in each size_class, so
    there is no need to recalculate it.

    Link: http://lkml.kernel.org/r/1467882338-4300-3-git-send-email-opensource.ganesh@gmail.com
    Signed-off-by: Ganesh Mahendran
    Acked-by: Minchan Kim
    Reviewed-by: Sergey Senozhatsky
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Ganesh Mahendran
     
  • The object index should be updated after returning from
    find_alloced_obj() to avoid burning CPU time on unnecessary object
    scanning.
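
    The shape of the fix, sketched (parameter names approximate):
    find_alloced_obj() hands back the index where scanning stopped, so the
    caller resumes from there instead of rescanning the page from object 0.

        handle = find_alloced_obj(class, s_page, &obj_idx);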

    Link: http://lkml.kernel.org/r/1467882338-4300-2-git-send-email-opensource.ganesh@gmail.com
    Signed-off-by: Ganesh Mahendran
    Reviewed-by: Sergey Senozhatsky
    Acked-by: Minchan Kim
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Ganesh Mahendran
     
  • This is a cleanup patch. Change "index" to "obj_index" to keep it
    consistent with other uses in zsmalloc.

    Link: http://lkml.kernel.org/r/1467882338-4300-1-git-send-email-opensource.ganesh@gmail.com
    Signed-off-by: Ganesh Mahendran
    Reviewed-by: Sergey Senozhatsky
    Acked-by: Minchan Kim
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Ganesh Mahendran
     

27 Jul, 2016

11 commits

  • Randy reported the build error below.

    > In file included from ../include/linux/balloon_compaction.h:48:0,
    > from ../mm/balloon_compaction.c:11:
    > ../include/linux/compaction.h:237:51: warning: 'struct node' declared inside parameter list [enabled by default]
    > static inline int compaction_register_node(struct node *node)
    > ../include/linux/compaction.h:237:51: warning: its scope is only this definition or declaration, which is probably not what you want [enabled by default]
    > ../include/linux/compaction.h:242:54: warning: 'struct node' declared inside parameter list [enabled by default]
    > static inline void compaction_unregister_node(struct node *node)
    >

    It was caused by non-LRU page migration, which needs compaction.h, but
    compaction.h doesn't include the headers it needs to be standalone.

    I think the proper header for non-LRU page migration is migrate.h rather
    than compaction.h, because migrate.h already pulls in the pieces non-LRU
    page migration needs indirectly, such as isolate_mode_t, migrate_mode and
    MIGRATEPAGE_SUCCESS.

    [akpm@linux-foundation.org: revert mm-balloon-use-general-non-lru-movable-page-feature-fix.patch temp fix]
    Link: http://lkml.kernel.org/r/20160610003304.GE29779@bbox
    Signed-off-by: Minchan Kim
    Reported-by: Randy Dunlap
    Cc: Konstantin Khlebnikov
    Cc: Vlastimil Babka
    Cc: Gioh Kim
    Cc: Rafael Aquini
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Minchan Kim
     
  • zram is very popular in parts of the embedded world (e.g., TVs and mobile
    phones). On those systems, the memory zsmalloc consumes is never trivial
    (one example from a real product: total memory 800M, zsmalloc consumed
    150M), so we have carried this out-of-tree patch to monitor system memory
    behavior via /proc/vmstat.

    With zsmalloc counters in vmstat, it is easier to track down system
    behavior caused by memory usage.

    [minchan@kernel.org: zsmalloc: follow up zsmalloc vmstat]
    Link: http://lkml.kernel.org/r/20160607091737.GC23435@bbox
    [akpm@linux-foundation.org: fix build with CONFIG_ZSMALLOC=m]
    Link: http://lkml.kernel.org/r/1464919731-13255-1-git-send-email-minchan@kernel.org
    Signed-off-by: Minchan Kim
    Cc: Sangseok Lee
    Cc: Chanho Min
    Cc: Chan Gyun Jeong
    Cc: Sergey Senozhatsky
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Minchan Kim
     
  • A static checker warns about using the tag as a bit shifter. It doesn't
    break the current behavior but hurts readability. Let's use OBJ_TAG_BIT as
    the bit shifter instead of OBJ_ALLOCATED_TAG.

    Link: http://lkml.kernel.org/r/20160607045146.GF26230@bbox
    Signed-off-by: Minchan Kim
    Reported-by: Dan Carpenter
    Cc: Sergey Senozhatsky
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Minchan Kim
     
  • This patch introduces a run-time migration feature for zspages.

    For migration, the VM uses the page.lru field, so it is better not to use
    the page.next field, which shares storage with page.lru, for our own
    purposes. To that end, we first compute the first object offset of a page
    at runtime instead of keeping it in page.index, which frees page.index to
    serve as the page-chaining link in place of page.next.

    In the case of a huge object, the handle is stored in page.index instead
    of a next link, because a huge object doesn't need a next link for page
    chaining. So get_next_page() needs to identify huge objects in order to
    return NULL; for that, this patch uses the PG_owner_priv_1 page flag.

    For migration, it supports three functions:

    * zs_page_isolate

    It isolates from its class a zspage containing a subpage the VM wants to
    migrate, so that nobody can allocate new objects from that zspage.

    A zspage may be isolated once per subpage, so a subsequent isolation
    attempt on another subpage of the same zspage shouldn't fail. For that,
    we introduce a zspage.isolated count. With it, zs_page_isolate can tell
    whether the zspage is already isolated for migration, and if so, the
    subsequent isolation attempt can succeed without doing further isolation
    work.

    * zs_page_migrate

    First of all, it takes the write side of zspage->lock to prevent other
    subpages in the zspage from being migrated. Then it locks all objects in
    the page the VM wants to migrate. The reason we must lock all objects in
    the page is the race between zs_map_object and zs_page_migrate:

    zs_map_object                               zs_page_migrate

    pin_tag(handle)
    obj = handle_to_obj(handle)
    obj_to_location(obj, &page, &obj_idx);

                                                write_lock(&zspage->lock)
                                                if (!trypin_tag(handle))
                                                        goto unpin_object

    zspage = get_zspage(page);
    read_lock(&zspage->lock);

    If zs_page_migrate didn't do the trypin_tag, the page zs_map_object holds
    could become stale through migration and it would crash.

    If it locks all of the objects successfully, it copies the contents from
    the old page to the new one and finally creates a new zspage chain with
    the new page. And if it was the last isolated subpage in the zspage, it
    puts the zspage back into its class.

    * zs_page_putback

    It returns an isolated zspage to the right fullness_group list if it
    fails to migrate a page. If it finds the zspage is ZS_EMPTY, it queues
    the zspage for freeing on a workqueue. See below about async zspage
    freeing.

    This patch introduces asynchronous zspage freeing. We need it because we
    need the page lock to clear PG_movable but, unfortunately, the zs_free
    path must be atomic, so the approach is to try to grab the page lock. If
    it gets the page lock of all of the pages successfully, it can free the
    zspage immediately. Otherwise, it queues a free request and frees the
    zspage via a workqueue in process context.

    If zs_free finds the zspage isolated when it tries to free it, it delays
    the freeing until zs_page_putback notices, which will finally free the
    zspage.

    In this patch, we expand the fullness lists from ZS_EMPTY to ZS_FULL.
    First of all, the ZS_EMPTY list is used for delayed freeing. And with the
    added ZS_FULL list, whether a zspage is isolated can be identified via the
    list_empty(&zspage->list) test.
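
    Roughly how the three callbacks are wired into this era's non-LRU
    page-migration API (a sketch; zsmalloc registers them through
    address_space_operations):

        static const struct address_space_operations zsmalloc_aops = {
                .isolate_page = zs_page_isolate,
                .migratepage  = zs_page_migrate,
                .putback_page = zs_page_putback,
        };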

    [minchan@kernel.org: zsmalloc: keep first object offset in struct page]
    Link: http://lkml.kernel.org/r/1465788015-23195-1-git-send-email-minchan@kernel.org
    [minchan@kernel.org: zsmalloc: zspage sanity check]
    Link: http://lkml.kernel.org/r/20160603010129.GC3304@bbox
    Link: http://lkml.kernel.org/r/1464736881-24886-12-git-send-email-minchan@kernel.org
    Signed-off-by: Minchan Kim
    Cc: Sergey Senozhatsky
    Cc: Vlastimil Babka
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Minchan Kim
     
  • Zsmalloc stores the position of the first free object in each zspage's
    freeobj field. If we store an index relative to first_page instead of a
    position, page migration becomes simpler, because we don't have to fix up
    the other linked-list entries when a page is migrated out.

    Link: http://lkml.kernel.org/r/1464736881-24886-11-git-send-email-minchan@kernel.org
    Signed-off-by: Minchan Kim
    Cc: Sergey Senozhatsky
    Cc: Vlastimil Babka
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Minchan Kim
     
  • Currently, putback_zspage frees the zspage under class->lock when its
    fullness becomes ZS_EMPTY, but that gets in the way of the locking scheme
    for the new zspage migration. So this patch separates free_zspage from
    putback_zspage and frees the zspage outside class->lock, in preparation
    for zspage migration.

    Link: http://lkml.kernel.org/r/1464736881-24886-10-git-send-email-minchan@kernel.org
    Signed-off-by: Minchan Kim
    Reviewed-by: Sergey Senozhatsky
    Cc: Vlastimil Babka
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Minchan Kim
     
  • We have squeezed the metadata of a zspage into the first page's
    descriptor. So, to get metadata from a subpage, we must first find the
    first page. But that makes it hard to implement the page migration
    feature of zsmalloc, because any place that gets the first page from a
    subpage can race with migration of that first page. IOW, the first page
    it got could be stale. To prevent that, I tried several approaches, but
    they made the code complicated, so I finally concluded to separate the
    metadata from the first page. Of course, it consumes more memory: 16
    bytes per zspage on 32-bit at the moment. It means we lose 1% in the
    *worst case* (40B/4096B), which I think is not bad for the gain in
    maintainability.

    Link: http://lkml.kernel.org/r/1464736881-24886-9-git-send-email-minchan@kernel.org
    Signed-off-by: Minchan Kim
    Cc: Sergey Senozhatsky
    Cc: Vlastimil Babka
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Minchan Kim
     
  • For page migration, we need to create the page chain of a zspage
    dynamically, so this patch factors that out of alloc_zspage.

    Link: http://lkml.kernel.org/r/1464736881-24886-8-git-send-email-minchan@kernel.org
    Signed-off-by: Minchan Kim
    Reviewed-by: Sergey Senozhatsky
    Cc: Vlastimil Babka
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Minchan Kim
     
  • An upcoming patch will change how zspage metadata is encoded, so for
    easier review this patch wraps the metadata-accessing code in accessor
    functions.

    Link: http://lkml.kernel.org/r/1464736881-24886-7-git-send-email-minchan@kernel.org
    Signed-off-by: Minchan Kim
    Reviewed-by: Sergey Senozhatsky
    Cc: Vlastimil Babka
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Minchan Kim
     
  • Use the kernel's standard bit spin-lock instead of a custom mess. The
    custom one even has a bug: it doesn't disable preemption. The reason we
    haven't hit any problem is that it has only been used inside sections
    where preemption is disabled by the class->lock spinlock, so there is no
    need to go to stable.
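
    The replacement in spirit (a sketch; HANDLE_PIN_BIT is zsmalloc's existing
    tag bit):

        #include <linux/bit_spinlock.h>

        static void pin_tag(unsigned long handle)
        {
                bit_spin_lock(HANDLE_PIN_BIT, (unsigned long *)handle);
        }

        static void unpin_tag(unsigned long handle)
        {
                bit_spin_unlock(HANDLE_PIN_BIT, (unsigned long *)handle);
        }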

    Link: http://lkml.kernel.org/r/1464736881-24886-6-git-send-email-minchan@kernel.org
    Signed-off-by: Minchan Kim
    Reviewed-by: Sergey Senozhatsky
    Cc: Vlastimil Babka
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Minchan Kim
     
  • Every zspage in a size_class has the same max number of objects, so we
    can move that value into the size_class.

    Link: http://lkml.kernel.org/r/1464736881-24886-5-git-send-email-minchan@kernel.org
    Signed-off-by: Minchan Kim
    Reviewed-by: Sergey Senozhatsky
    Cc: Vlastimil Babka
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Minchan Kim
     

27 May, 2016

1 commit

  • Some updates to commit d34f615720d1 ("mm/zsmalloc: don't fail if can't
    create debugfs info"):

    - add pr_warn to all stat failure cases
    - do not prevent module loading on stat failure

    Link: http://lkml.kernel.org/r/1463671123-5479-1-git-send-email-ddstreet@ieee.org
    Signed-off-by: Dan Streetman
    Reviewed-by: Ganesh Mahendran
    Acked-by: Minchan Kim
    Cc: Sergey Senozhatsky
    Cc: Dan Streetman
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Dan Streetman
     

21 May, 2016

6 commits

  • Change the return type of zs_pool_stat_create() to void, and remove the
    logic to abort pool creation if the stat debugfs dir/file could not be
    created.

    The debugfs stat file is for debugging/information only and doesn't
    affect the operation of zsmalloc; there is no reason to abort pool
    creation if the stat file can't be created. This was seen with zswap,
    which used the same name for all pool creations, causing zsmalloc to fail
    to create a second pool for zswap when CONFIG_ZSMALLOC_STAT was enabled.

    Signed-off-by: Dan Streetman
    Reviewed-by: Sergey Senozhatsky
    Cc: Dan Streetman
    Cc: Minchan Kim
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Dan Streetman
     
  • Pass GFP flags to zs_malloc() instead of using a fixed mask supplied to
    zs_create_pool(), so we can be more flexible; but, more importantly, we
    need this to switch zram to per-cpu compression streams -- zram will try
    to allocate a handle with preemption disabled in a fast path and switch
    to a slow path (using a different gfp mask) if the fast one fails.

    Apart from that, this also aligns the zs_malloc() interface with
    zspool/zbud.
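
    The resulting call shape, sketched (the masks here are illustrative, not
    the ones zram actually ends up using; pool and size as in the caller):

        unsigned long handle;

        handle = zs_malloc(pool, size, GFP_NOIO | __GFP_NOWARN);
        if (!handle)
                handle = zs_malloc(pool, size, GFP_NOIO); /* slow path, different mask */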

    [sergey.senozhatsky@gmail.com: pass GFP flags to zs_malloc() instead of using a fixed mask]
    Link: http://lkml.kernel.org/r/20160429150942.GA637@swordfish
    Signed-off-by: Sergey Senozhatsky
    Acked-by: Minchan Kim
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Sergey Senozhatsky
     
  • Let's remove the unused pool parameter in obj_free().

    Signed-off-by: Minchan Kim
    Reviewed-by: Sergey Senozhatsky
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Minchan Kim
     
  • Clean up function parameter ordering so that the higher-level data
    structure comes first.

    Signed-off-by: Minchan Kim
    Reviewed-by: Sergey Senozhatsky
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Minchan Kim
     
  • There are many BUG_ONs in zsmalloc.c, which is not recommended, so
    replace them with alternatives (see the sketch after this list).

    The normal rules are as follows:

    1. Avoid BUG_ON if possible. Instead, use VM_BUG_ON or VM_BUG_ON_PAGE.

    2. Use VM_BUG_ON_PAGE if we need to see a struct page's fields.

    3. Put those assertions in primitive functions so higher-level functions
    can rely on the assertion in the primitive function.

    4. Don't use an assertion if the following instruction can trigger an
    Oops anyway.
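
    A sketch of rules 2 and 3 (hypothetical wrapper name): the assertion lives
    in a primitive helper, dumps the struct page on failure, and compiles away
    unless CONFIG_DEBUG_VM is set.

        static inline void assert_first_page(struct page *page)
        {
                /* dump the offending struct page before BUG-ing */
                VM_BUG_ON_PAGE(!is_first_page(page), page);
        }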

    Signed-off-by: Minchan Kim
    Reviewed-by: Sergey Senozhatsky
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Minchan Kim
     
  • Clean up function parameter "struct page". Many functions of zsmalloc
    expect that page paramter is "first_page" so use "first_page" rather
    than "page" for code readability.

    Signed-off-by: Minchan Kim
    Reviewed-by: Sergey Senozhatsky
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Minchan Kim
     

10 May, 2016

1 commit

  • zs_can_compact() has two race conditions in its core calculation:

    unsigned long obj_wasted = zs_stat_get(class, OBJ_ALLOCATED) -
                               zs_stat_get(class, OBJ_USED);

    1) classes are not locked, so the numbers of allocated and used
    objects can be changed by concurrent ops happening on other CPUs
    2) the shrinker invokes it from preemptible context

    Thus, depending on the circumstances, OBJ_ALLOCATED can become less than
    OBJ_USED, which can result in either a very high or a negative
    `total_scan' value calculated later in do_shrink_slab().

    do_shrink_slab() has some logic to prevent those cases:

    vmscan: shrink_slab: zs_shrinker_scan+0x0/0x28 [zsmalloc] negative objects to delete nr=-62
    vmscan: shrink_slab: zs_shrinker_scan+0x0/0x28 [zsmalloc] negative objects to delete nr=-62
    vmscan: shrink_slab: zs_shrinker_scan+0x0/0x28 [zsmalloc] negative objects to delete nr=-64
    vmscan: shrink_slab: zs_shrinker_scan+0x0/0x28 [zsmalloc] negative objects to delete nr=-62
    vmscan: shrink_slab: zs_shrinker_scan+0x0/0x28 [zsmalloc] negative objects to delete nr=-62
    vmscan: shrink_slab: zs_shrinker_scan+0x0/0x28 [zsmalloc] negative objects to delete nr=-62

    However, due to the way `total_scan' is calculated, not every
    shrinker->count_objects() overflow can be spotted and handled.
    To demonstrate the latter, I added some debugging code to do_shrink_slab()
    (x86_64) and the results were:

    vmscan: OVERFLOW: shrinker->count_objects() == -1 [18446744073709551615]
    vmscan: but total_scan > 0: 92679974445502
    vmscan: resulting total_scan: 92679974445502
    [..]
    vmscan: OVERFLOW: shrinker->count_objects() == -1 [18446744073709551615]
    vmscan: but total_scan > 0: 22634041808232578
    vmscan: resulting total_scan: 22634041808232578

    Even though shrinker->count_objects() has returned an overflowed value,
    the resulting `total_scan' is positive, and, what is more worrisome, it
    is insanely huge. This value is getting used later on in
    shrinker->scan_objects() loop:

    while (total_scan >= batch_size ||
           total_scan >= freeable) {
            unsigned long ret;
            unsigned long nr_to_scan = min(batch_size, total_scan);

            shrinkctl->nr_to_scan = nr_to_scan;
            ret = shrinker->scan_objects(shrinker, shrinkctl);
            if (ret == SHRINK_STOP)
                    break;
            freed += ret;

            count_vm_events(SLABS_SCANNED, nr_to_scan);
            total_scan -= nr_to_scan;

            cond_resched();
    }

    `total_scan >= batch_size' is true for a very, very long time, and
    `total_scan >= freeable' is also true for quite some time, because
    `freeable < 0' and `total_scan' is large enough (for example,
    22634041808232578). The only break condition, in the given scheme of
    things, is the shrinker->scan_objects() == SHRINK_STOP test, which is a
    bit too weak to rely on, especially in heavy zsmalloc-usage scenarios.

    To fix the issue, take a pool stat snapshot and use it instead of
    racy zs_stat_get() calls.
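
    Roughly the shape of the snapshot approach described above (a sketch; the
    details may differ from the actual commit):

        static unsigned long zs_can_compact(struct size_class *class)
        {
                unsigned long obj_wasted;
                unsigned long obj_allocated = zs_stat_get(class, OBJ_ALLOCATED);
                unsigned long obj_used = zs_stat_get(class, OBJ_USED);

                /* a racy snapshot may still look inconsistent; treat that as
                 * "nothing to compact" instead of underflowing */
                if (obj_allocated <= obj_used)
                        return 0;

                obj_wasted = obj_allocated - obj_used;
                obj_wasted /= get_maxobj_per_zspage(class->size,
                                class->pages_per_zspage);

                return obj_wasted * class->pages_per_zspage;
        }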

    Link: http://lkml.kernel.org/r/20160509140052.3389-1-sergey.senozhatsky@gmail.com
    Signed-off-by: Sergey Senozhatsky
    Cc: Minchan Kim
    Cc: [4.3+]
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Sergey Senozhatsky
     

18 Mar, 2016

2 commits

  • Add a new column to pool stats, which will tell how many pages ideally
    can be freed by class compaction, so it will be easier to analyze
    zsmalloc fragmentation.

    At the moment, we have only numbers of FULL and ALMOST_EMPTY classes,
    but they don't tell us how badly the class is fragmented internally.

    The new /sys/kernel/debug/zsmalloc/zramX/classes output looks as follows:

    class  size almost_full almost_empty obj_allocated   obj_used pages_used pages_per_zspage freeable
    [..]
       12   224           0            2           146          5          8                4        4
       13   240           0            0             0          0          0                1        0
       14   256           1           13          1840       1672        115                1       10
       15   272           0            0             0          0          0                1        0
    [..]
       49   816           0            3           745        735        149                1        2
       51   848           3            4           361        306         76                4        8
       52   864          12           14           378        268         81                3       21
       54   896           1           12           117         57         26                2       12
       57   944           0            0             0          0          0                3        0
    [..]
    Total                 26          131         12709      10994       1071                     134

    For example, from this particular output we can easily conclude that
    class-896 is heavily fragmented -- it occupies 26 pages, 12 of which can
    be freed by compaction.

    Signed-off-by: Sergey Senozhatsky
    Acked-by: Minchan Kim
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Sergey Senozhatsky
     
  • When unmapping a huge class page in zs_unmap_object, the page is
    unmapped by kunmap_atomic. The "!area->huge" branch in __zs_unmap_object
    is always true, and no code sets "area->huge" now, so we can drop it.

    Signed-off-by: YiPing Xu
    Reviewed-by: Sergey Senozhatsky
    Acked-by: Minchan Kim
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    YiPing Xu
     

21 Jan, 2016

1 commit

  • record_obj() in migrate_zspage() does not preserve the handle's
    HANDLE_PIN_BIT, set by find_alloced_obj()->trypin_tag(), and implicitly
    (accidentally) unpins the handle, while migrate_zspage() still performs
    an explicit unpin_tag() on that handle. This additional explicit
    unpin_tag() introduces a race with zs_free(), which can pin the handle
    by this time, so the handle ends up unpinned.

    Schematically, it goes like this:

    CPU0                                        CPU1

    migrate_zspage
      find_alloced_obj
        trypin_tag
          set HANDLE_PIN_BIT                    zs_free()
                                                  pin_tag()
      obj_malloc      -- new object, no tag
      record_obj      -- remove HANDLE_PIN_BIT    set HANDLE_PIN_BIT
      unpin_tag       -- remove zs_free's HANDLE_PIN_BIT

    The race condition may result in a NULL pointer dereference:

    Unable to handle kernel NULL pointer dereference at virtual address 00000000
    CPU: 0 PID: 19001 Comm: CookieMonsterCl Tainted:
    PC is at get_zspage_mapping+0x0/0x24
    LR is at obj_free.isra.22+0x64/0x128
    Call trace:
    get_zspage_mapping+0x0/0x24
    zs_free+0x88/0x114
    zram_free_page+0x64/0xcc
    zram_slot_free_notify+0x90/0x108
    swap_entry_free+0x278/0x294
    free_swap_and_cache+0x38/0x11c
    unmap_single_vma+0x480/0x5c8
    unmap_vmas+0x44/0x60
    exit_mmap+0x50/0x110
    mmput+0x58/0xe0
    do_exit+0x320/0x8dc
    do_group_exit+0x44/0xa8
    get_signal+0x538/0x580
    do_signal+0x98/0x4b8
    do_notify_resume+0x14/0x5c

    This patch keeps the lock bit set in the migration path and updates the
    handle's value atomically.
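
    A sketch of the idea (not necessarily the literal diff): the migration
    path keeps HANDLE_PIN_BIT set in the value it writes, and record_obj()
    stores the handle word in one go to avoid tearing.

        /* migrate_zspage(): preserve the pin taken via find_alloced_obj() */
        free_obj |= BIT(HANDLE_PIN_BIT);
        record_obj(handle, free_obj);
        unpin_tag(handle);

        /* record_obj(): a single, non-torn store of the handle word */
        static void record_obj(unsigned long handle, unsigned long obj)
        {
                WRITE_ONCE(*(unsigned long *)handle, obj);
        }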

    Signed-off-by: Junil Lee
    Signed-off-by: Minchan Kim
    Acked-by: Vlastimil Babka
    Cc: Sergey Senozhatsky
    Cc: [4.1+]
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Junil Lee
     

16 Jan, 2016

1 commit


07 Nov, 2015

8 commits

  • We are going to rework how compound_head() works. It will not use
    page->first_page as it does now.

    The only other user of page->first_page beyond compound pages is
    zsmalloc.

    Let's use page->private instead of page->first_page here. It occupies
    the same storage space.
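
    The flavor of the change, sketched (not the exact hunks): the back-pointer
    to the first page now travels through page->private.

        /* store the back-pointer on a tail page */
        set_page_private(page, (unsigned long)first_page);

        /* ... and read it back later */
        first_page = (struct page *)page_private(page);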

    Signed-off-by: Kirill A. Shutemov
    Acked-by: Vlastimil Babka
    Reviewed-by: Sergey Senozhatsky
    Reviewed-by: Andrea Arcangeli
    Cc: "Paul E. McKenney"
    Cc: Andi Kleen
    Cc: Aneesh Kumar K.V
    Cc: Christoph Lameter
    Cc: David Rientjes
    Cc: Hugh Dickins
    Cc: Joonsoo Kim
    Cc: Michal Hocko
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Kirill A. Shutemov
     
  • Each `struct size_class' contains `struct zs_size_stat': an array of
    NR_ZS_STAT_TYPE `unsigned long'. For zsmalloc built without
    CONFIG_ZSMALLOC_STAT this results in a waste of `2 * sizeof(unsigned
    long)' per class.

    The patch removes unneeded `struct zs_size_stat' members by redefining
    NR_ZS_STAT_TYPE (max stat idx in array).

    Since both NR_ZS_STAT_TYPE and zs_stat_type are compile time constants,
    GCC can eliminate zs_stat_inc()/zs_stat_dec() calls that use zs_stat_type
    larger than NR_ZS_STAT_TYPE: CLASS_ALMOST_EMPTY and CLASS_ALMOST_FULL at
    the moment.
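
    One way to express this, sketched (the exact enum ordering is in
    mm/zsmalloc.c): keep the always-needed counters first and let
    NR_ZS_STAT_TYPE shrink when the class stats are compiled out.

        enum zs_stat_type {
                OBJ_ALLOCATED,
                OBJ_USED,
                CLASS_ALMOST_FULL,
                CLASS_ALMOST_EMPTY,
        };

        #ifdef CONFIG_ZSMALLOC_STAT
        #define NR_ZS_STAT_TYPE (CLASS_ALMOST_EMPTY + 1)
        #else
        #define NR_ZS_STAT_TYPE (OBJ_USED + 1)
        #endif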

    ./scripts/bloat-o-meter mm/zsmalloc.o.old mm/zsmalloc.o.new
    add/remove: 0/0 grow/shrink: 0/3 up/down: 0/-39 (-39)
    function                 old     new   delta
    fix_fullness_group        97      94      -3
    insert_zspage            100      86     -14
    remove_zspage            141     119     -22

    To summarize:
    a) each class now uses less memory
    b) we avoid a number of dec/inc stats (a minor optimization,
    but still).

    The gain will increase once we introduce additional stats.

    A simple IO test.

    iozone -t 4 -R -r 32K -s 60M -I +Z
                                patched           base
    " Initial write "        4145599.06     4127509.75
    " Rewrite "              4146225.94     4223618.50
    " Read "                17157606.00    17211329.50
    " Re-read "             17380428.00    17267650.50
    " Reverse Read "        16742768.00    16162732.75
    " Stride read "         16586245.75    16073934.25
    " Random read "         16349587.50    15799401.75
    " Mixed workload "      10344230.62     9775551.50
    " Random write "         4277700.62     4260019.69
    " Pwrite "               4302049.12     4313703.88
    " Pread "                6164463.16     6126536.72
    " Fwrite "               7131195.00     6952586.00
    " Fread "               12682602.25    12619207.50

    Signed-off-by: Sergey Senozhatsky
    Cc: Minchan Kim
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Sergey Senozhatsky
     
  • Signed-off-by: Hui Zhu
    Reviewed-by: Sergey Senozhatsky
    Cc: Minchan Kim
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Hui Zhu
     
  • We don't let the user disable the shrinker in zsmalloc (once it has been
    enabled), so there is no need to check ->shrinker_enabled in
    zs_shrinker_count(), at the moment at least.

    Signed-off-by: Sergey Senozhatsky
    Acked-by: Minchan Kim
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Sergey Senozhatsky
     
  • A cosmetic change.

    Commit c60369f01125 ("staging: zsmalloc: prevent mappping in interrupt
    context") added an in_interrupt() check to zs_map_object() and the
    'hardirq.h' include; but the in_interrupt() macro is defined in
    'preempt.h', not in 'hardirq.h', so include that instead.

    Signed-off-by: Sergey Senozhatsky
    Acked-by: Minchan Kim
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Sergey Senozhatsky
     
  • In obj_malloc():

    if (!class->huge)
            /* record handle in the header of allocated chunk */
            link->handle = handle;
    else
            /* record handle in first_page->private */
            set_page_private(first_page, handle);

    In the huge-page case we save the handle into private directly, as a
    value.

    But in obj_to_head():

    if (class->huge) {
            VM_BUG_ON(!is_first_page(page));
            return *(unsigned long *)page_private(page);
    } else
            return *(unsigned long *)obj;

    Here the stored value is dereferenced as a pointer.

    The reason there has been no problem until now is that a huge-class page
    is born ZS_FULL, so it can't be migrated. However, we need this patch for
    future work: "VM-aware zsmalloced page migration", to reduce external
    fragmentation.
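
    In other words, the huge-class branch of obj_to_head() should return the
    stored value rather than dereference it; roughly (a sketch):

        if (class->huge) {
                VM_BUG_ON(!is_first_page(page));
                return page_private(page);      /* the handle itself, not a pointer to it */
        } else
                return *(unsigned long *)obj;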

    Signed-off-by: Hui Zhu
    Acked-by: Minchan Kim
    Cc: Sergey Senozhatsky
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Hui Zhu
     
  • [akpm@linux-foundation.org: fix grammar]
    Signed-off-by: Hui Zhu
    Reviewed-by: Sergey Senozhatsky
    Cc: Dan Streetman
    Cc: Minchan Kim
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Hui Zhu
     
  • Constify `struct zs_pool' ->name.

    [akpm@linux-foundation.org: constify zpool_create_pool()'s `type' arg also]
    Signed-off-by: Sergey Senozhatsky
    Acked-by: Dan Streetman
    Cc: Minchan Kim
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Sergey SENOZHATSKY
     

09 Sep, 2015

1 commit