14 Dec, 2017

1 commit

  • [ Upstream commit 1aedcafbf32b3f232c159b14cd0d423fcfe2b861 ]

    Use BUG_ON(in_interrupt()) in zs_map_object(). This is not a new
    BUG_ON(); it has always been there, but was recently changed to
    VM_BUG_ON(). There are several problems with that. First, we use
    per-CPU mappings both in zsmalloc and in zram, and an interrupt may
    easily corrupt those buffers. Second, and more importantly, we believe
    it is possible to start leaking sensitive information. Consider the
    following case:

    -> process P
        swap out
         zram
          per-cpu mapping CPU1
           compress page A
           -> IRQ

              swap out
               zram
                per-cpu mapping CPU1
                 compress page B
                  write page from per-cpu mapping CPU1 to zsmalloc pool
              iret

    -> process P
        write page from per-cpu mapping CPU1 to zsmalloc pool  [*]
        return

    [*] so we store overwritten data that actually belongs to another
    page (task) and potentially contains sensitive data. And when
    process P later page faults, it is going to read (swap in) that
    other task's data.
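
    A minimal sketch of the check this restores, at the top of
    zs_map_object() (placement and comment wording are illustrative):

        /*
         * Per-CPU mapping areas are shared among pools/users; mapping an
         * object from interrupt context could corrupt another user's
         * in-flight mapping, so this must never run in interrupt context.
         */
        BUG_ON(in_interrupt());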

    Link: http://lkml.kernel.org/r/20170929045140.4055-1-sergey.senozhatsky@gmail.com
    Signed-off-by: Sergey Senozhatsky
    Acked-by: Minchan Kim
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds
    Signed-off-by: Sasha Levin
    Signed-off-by: Greg Kroah-Hartman

    Sergey Senozhatsky
     

09 Sep, 2017

2 commits

  • zs_stat_inc/dec/get() use enum zs_stat_type for the stat type; however,
    some callers pass an enum fullness_group value. Change the type to int
    to reflect the actual use of the functions and get rid of
    'enum-conversion' warnings.
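
    A sketch of the widened prototypes (bodies unchanged; exact parameter
    names are illustrative):

        static inline void zs_stat_inc(struct size_class *class,
                                       int type, unsigned long cnt);
        static inline void zs_stat_dec(struct size_class *class,
                                       int type, unsigned long cnt);
        static inline unsigned long zs_stat_get(struct size_class *class,
                                                int type);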

    Link: http://lkml.kernel.org/r/20170731175000.56538-1-mka@chromium.org
    Signed-off-by: Matthias Kaehlcke
    Reviewed-by: Sergey Senozhatsky
    Acked-by: Minchan Kim
    Cc: Doug Anderson
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Matthias Kaehlcke
     
  • Introduce a new migration mode that allows offloading the copy to a
    device DMA engine. This changes the workflow of migration, and not all
    address_space migratepage callbacks can support it.

    This is intended to be used by migrate_vma(), which itself is used for
    things like HMM (see include/linux/hmm.h).

    No additional per-filesystem migratepage testing is needed. I disabled
    MIGRATE_SYNC_NO_COPY in all problematic migratepage() callbacks and
    added comments there to explain why (part of this patch). To be clear:
    any callback that wishes to support this new mode needs to be aware of
    how its migration flow differs from the other modes.

    Some of these callbacks do extra locking while copying (aio, zsmalloc,
    balloon, ...), and for DMA to be effective you want to copy multiple
    pages in one DMA operation. But in the problematic cases you cannot
    easily hold the extra lock across multiple calls to this callback.

    Usual flow is:

    For each page {
        1 - lock page
        2 - call migratepage() callback
        3 - (extra locking in some migratepage() callback)
        4 - migrate page state (freeze refcount, update page cache,
            buffer head, ...)
        5 - copy page
        6 - (unlock any extra lock of migratepage() callback)
        7 - return from migratepage() callback
        8 - unlock page
    }

    The new mode MIGRATE_SYNC_NO_COPY:

    1 - lock multiple pages
    For each page {
        2 - call migratepage() callback
        3 - abort in all problematic migratepage() callback
        4 - migrate page state (freeze refcount, update page cache,
            buffer head, ...)
    } // finished all calls to migratepage() callback
    5 - DMA copy multiple pages
    6 - unlock all the pages

    To support MIGRATE_SYNC_NO_COPY in the problematic cases we would need
    a new callback, migratepages() for instance, that deals with multiple
    pages in one transaction.

    Because the problematic cases are not important for current usage, I
    did not want to complicate this patchset even more for no good reason.
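
    A minimal sketch of the abort pattern added near the top of the
    problematic migratepage() callbacks (comment wording varies per
    callback):

        if (mode == MIGRATE_SYNC_NO_COPY) {
                /*
                 * This callback copies under its own extra lock, so it
                 * cannot defer the copy as MIGRATE_SYNC_NO_COPY requires;
                 * abort so migration falls back to a copying mode.
                 */
                return -EINVAL;
        }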

    Link: http://lkml.kernel.org/r/20170817000548.32038-14-jglisse@redhat.com
    Signed-off-by: Jérôme Glisse
    Cc: Aneesh Kumar
    Cc: Balbir Singh
    Cc: Benjamin Herrenschmidt
    Cc: Dan Williams
    Cc: David Nellans
    Cc: Evgeny Baskakov
    Cc: Johannes Weiner
    Cc: John Hubbard
    Cc: Kirill A. Shutemov
    Cc: Mark Hairgrove
    Cc: Michal Hocko
    Cc: Paul E. McKenney
    Cc: Ross Zwisler
    Cc: Sherry Cheung
    Cc: Subhash Gutti
    Cc: Vladimir Davydov
    Cc: Bob Liu
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Jérôme Glisse
     

07 Sep, 2017

1 commit

  • Getting -EBUSY from zs_page_migrate() makes migration slow (retry) or
    fail (zs_page_putback() will schedule free_work, but it cannot
    guarantee success).

    I noticed this issue because my kernel is patched with
    https://lkml.org/lkml/2014/5/28/113, which removes the retry in
    __alloc_contig_migrate_range.

    That retry handles the -EBUSY by re-isolating the page and calling
    migrate_pages again; without it, cma_alloc fails at once with -EBUSY.

    Following the review from Minchan Kim in
    https://lkml.org/lkml/2014/5/28/113, I updated the patch to skip the
    unnecessary loops rather than return -EBUSY if the zspage is not in
    use.
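
    A minimal sketch of the idea in zs_page_migrate() (assuming the
    get_zspage_inuse() helper used elsewhere in zsmalloc; not the literal
    patch):

        if (!get_zspage_inuse(zspage)) {
                /*
                 * Nothing is allocated in this zspage. Instead of failing
                 * with -EBUSY, push "offset" past the end of the page so
                 * the object-scanning loops below do no work and the
                 * migration itself can still succeed.
                 */
                offset = PAGE_SIZE;
        }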

    The following is what I got with highalloc-performance in a vbox guest
    with 2 CPUs, 1G memory and 512 zram as swap, with swappiness set to
    100.

    orig new
    Minor Faults 50805113 50830235
    Major Faults 43918 56530
    Swap Ins 42087 55680
    Swap Outs 89718 104700
    Allocation stalls 0 0
    DMA allocs 57787 52364
    DMA32 allocs 47964599 48043563
    Normal allocs 0 0
    Movable allocs 0 0
    Direct pages scanned 45493 23167
    Kswapd pages scanned 1565222 1725078
    Kswapd pages reclaimed 1342222 1503037
    Direct pages reclaimed 45615 25186
    Kswapd efficiency 85% 87%
    Kswapd velocity 1897.101 1949.042
    Direct efficiency 100% 108%
    Direct velocity 55.139 26.175
    Percentage direct scans 2% 1%
    Zone normal velocity 1952.240 1975.217
    Zone dma32 velocity 0.000 0.000
    Zone dma velocity 0.000 0.000
    Page writes by reclaim 89764.000 105233.000
    Page writes file 46 533
    Page writes anon 89718 104700
    Page reclaim immediate 21457 3699
    Sector Reads 3259688 3441368
    Sector Writes 3667252 3754836
    Page rescued immediate 0 0
    Slabs scanned 1042872 1160855
    Direct inode steals 8042 10089
    Kswapd inode steals 54295 29170
    Kswapd skipped wait 0 0
    THP fault alloc 175 154
    THP collapse alloc 226 289
    THP splits 0 0
    THP fault fallback 11 14
    THP collapse fail 3 2
    Compaction stalls 536 646
    Compaction success 322 358
    Compaction failures 214 288
    Page migrate success 119608 111063
    Page migrate failure 2723 2593
    Compaction pages isolated 250179 232652
    Compaction migrate scanned 9131832 9942306
    Compaction free scanned 2093272 2613998
    Compaction cost 192 189
    NUMA alloc hit 47124555 47193990
    NUMA alloc miss 0 0
    NUMA interleave hit 0 0
    NUMA alloc local 47124555 47193990
    NUMA base PTE updates 0 0
    NUMA huge PMD updates 0 0
    NUMA page range updates 0 0
    NUMA hint faults 0 0
    NUMA hint local faults 0 0
    NUMA hint local percent 100 100
    NUMA pages migrated 0 0
    AutoNUMA cost 0% 0%

    [akpm@linux-foundation.org: remove newline, per Minchan]
    Link: http://lkml.kernel.org/r/1500889535-19648-1-git-send-email-zhuhui@xiaomi.com
    Signed-off-by: Hui Zhu
    Acked-by: Minchan Kim
    Reviewed-by: Sergey Senozhatsky
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Hui Zhu
     

03 Aug, 2017

1 commit

  • Mike reported that the kernel oopses with the ltp:zram03 testcase.

    zram: Added device: zram0
    zram0: detected capacity change from 0 to 107374182400
    BUG: unable to handle kernel paging request at 0000306d61727a77
    IP: zs_map_object+0xb9/0x260
    PGD 0
    P4D 0
    Oops: 0000 [#1] SMP
    Dumping ftrace buffer:
    (ftrace buffer empty)
    Modules linked in: zram(E) xfs(E) libcrc32c(E) btrfs(E) xor(E) raid6_pq(E) loop(E) ebtable_filter(E) ebtables(E) ip6table_filter(E) ip6_tables(E) iptable_filter(E) ip_tables(E) x_tables(E) af_packet(E) br_netfilter(E) bridge(E) stp(E) llc(E) iscsi_ibft(E) iscsi_boot_sysfs(E) nls_iso8859_1(E) nls_cp437(E) vfat(E) fat(E) intel_powerclamp(E) coretemp(E) cdc_ether(E) kvm_intel(E) usbnet(E) mii(E) kvm(E) irqbypass(E) crct10dif_pclmul(E) crc32_pclmul(E) crc32c_intel(E) iTCO_wdt(E) ghash_clmulni_intel(E) bnx2(E) iTCO_vendor_support(E) pcbc(E) ioatdma(E) ipmi_ssif(E) aesni_intel(E) i5500_temp(E) i2c_i801(E) aes_x86_64(E) lpc_ich(E) shpchp(E) mfd_core(E) crypto_simd(E) i7core_edac(E) dca(E) glue_helper(E) cryptd(E) ipmi_si(E) button(E) acpi_cpufreq(E) ipmi_devintf(E) pcspkr(E) ipmi_msghandler(E)
    nfsd(E) auth_rpcgss(E) nfs_acl(E) lockd(E) grace(E) sunrpc(E) ext4(E) crc16(E) mbcache(E) jbd2(E) sd_mod(E) ata_generic(E) i2c_algo_bit(E) ata_piix(E) drm_kms_helper(E) ahci(E) syscopyarea(E) sysfillrect(E) libahci(E) sysimgblt(E) fb_sys_fops(E) uhci_hcd(E) ehci_pci(E) ttm(E) ehci_hcd(E) libata(E) drm(E) megaraid_sas(E) usbcore(E) sg(E) dm_multipath(E) dm_mod(E) scsi_dh_rdac(E) scsi_dh_emc(E) scsi_dh_alua(E) scsi_mod(E) efivarfs(E) autofs4(E) [last unloaded: zram]
    CPU: 6 PID: 12356 Comm: swapon Tainted: G E 4.13.0.g87b2c3f-default #194
    Hardware name: IBM System x3550 M3 -[7944K3G]-/69Y5698 , BIOS -[D6E150AUS-1.10]- 12/15/2010
    task: ffff880158d2c4c0 task.stack: ffffc90001680000
    RIP: 0010:zs_map_object+0xb9/0x260
    Call Trace:
    zram_bvec_rw.isra.26+0xe8/0x780 [zram]
    zram_rw_page+0x6e/0xa0 [zram]
    bdev_read_page+0x81/0xb0
    do_mpage_readpage+0x51a/0x710
    mpage_readpages+0x122/0x1a0
    blkdev_readpages+0x1d/0x20
    __do_page_cache_readahead+0x1b2/0x270
    ondemand_readahead+0x180/0x2c0
    page_cache_sync_readahead+0x31/0x50
    generic_file_read_iter+0x7e7/0xaf0
    blkdev_read_iter+0x37/0x40
    __vfs_read+0xce/0x140
    vfs_read+0x9e/0x150
    SyS_read+0x46/0xa0
    entry_SYSCALL_64_fastpath+0x1a/0xa5
    Code: 81 e6 00 c0 3f 00 81 fe 00 00 16 00 0f 85 9f 01 00 00 0f b7 13 65 ff 05 5e 07 dc 7e 66 c1 ea 02 81 e2 ff 01 00 00 49 8b 54 d4 08 4a 48 41 0f af ce 81 e1 ff 0f 00 00 41 89 c9 48 c7 c3 a0 70
    RIP: zs_map_object+0xb9/0x260 RSP: ffffc90001683988
    CR2: 0000306d61727a77

    He bisected the problem to commit cf8e0fedf078 ("mm/zsmalloc: simplify
    zs_max_alloc_size handling").

    After commit cf8e0fedf078 ("mm/zsmalloc: simplify zs_max_alloc_size
    handling"), zs_create_pool() no longer allocates pool->size_class as a
    separate array (a double pointer), so its counterpart zs_destroy_pool()
    doesn't need to free it either.

    Otherwise, it kfrees a wrong address and the kernel oopses.
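
    A sketch of the resulting zs_destroy_pool() cleanup (other teardown
    steps elided; the merged-class check mirrors the existing loop):

        void zs_destroy_pool(struct zs_pool *pool)
        {
                int i;

                for (i = 0; i < ZS_SIZE_CLASSES; i++) {
                        struct size_class *class = pool->size_class[i];

                        if (!class)
                                continue;
                        if (class->index != i)
                                continue;       /* merged class, freed once via its owner */
                        kfree(class);
                }

                /*
                 * No kfree(pool->size_class) here: size_class is now an
                 * array embedded in struct zs_pool, not a separately
                 * allocated pointer array, so freeing it would pass a
                 * bogus address to kfree().
                 */
                kfree(pool->name);
                kfree(pool);
        }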

    Link: http://lkml.kernel.org/r/20170725062650.GA12134@bbox
    Fixes: cf8e0fedf078 ("mm/zsmalloc: simplify zs_max_alloc_size handling")
    Signed-off-by: Minchan Kim
    Reported-by: Mike Galbraith
    Tested-by: Mike Galbraith
    Reviewed-by: Sergey Senozhatsky
    Cc: Jerome Marchand
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Minchan Kim
     

11 Jul, 2017

2 commits

  • Commit 40f9fb8cffc6 ("mm/zsmalloc: support allocating obj with size of
    ZS_MAX_ALLOC_SIZE") fixes a size calculation error that prevented
    zsmalloc from allocating an object of the maximal size
    (ZS_MAX_ALLOC_SIZE). However, I think the fix is needlessly
    complicated.

    This patch replaces the dynamic calculation of zs_size_classes at init
    time with a compile-time calculation that uses the DIV_ROUND_UP()
    macro already used in get_size_class_index().
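
    A sketch of the compile-time form (macro names as used in zsmalloc.c;
    treat the exact expression as illustrative):

        #define ZS_SIZE_CLASSES (DIV_ROUND_UP(ZS_MAX_ALLOC_SIZE - ZS_MIN_ALLOC_SIZE, \
                                              ZS_SIZE_CLASS_DELTA) + 1)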

    [akpm@linux-foundation.org: use min_t]
    Link: http://lkml.kernel.org/r/20170630114859.1979-1-jmarchan@redhat.com
    Signed-off-by: Jerome Marchand
    Acked-by: Minchan Kim
    Cc: Sergey Senozhatsky
    Cc: Mahendran Ganesh
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Jerome Marchand
     
  • is_first_page() is only called from the macro VM_BUG_ON_PAGE(), which
    is only compiled in as a runtime check when CONFIG_DEBUG_VM is set;
    otherwise the argument is only checked at compile time and the function
    is not actually emitted.

    This fixes the following warning, found with Clang:

    mm/zsmalloc.c:472:12: warning: function 'is_first_page' is not needed and will not be emitted [-Wunneeded-internal-declaration]
    static int is_first_page(struct page *page)
    ^

    Link: http://lkml.kernel.org/r/20170524053859.29059-1-nick.desaulniers@gmail.com
    Signed-off-by: Nick Desaulniers
    Reviewed-by: Sergey Senozhatsky
    Acked-by: Minchan Kim
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Nick Desaulniers
     

14 Apr, 2017

1 commit

  • On a 64K page system, zsmalloc has 257 size classes, so 8 class bits
    are not enough. As a result, the system gets corrupted when zsmalloc
    stores 65536-byte data (i.e., class index 256), so this patch increases
    the class bits as a simple fix suitable for stable backport. We should
    clean up this mess soon.

    index   size
    0       32
    1       288
    ..
    ..
    204     52256
    256     65536
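
    For reference, the arithmetic behind the 257 classes (assuming the
    usual ZS_MIN_ALLOC_SIZE of 32 and a class delta of PAGE_SIZE >> 8 =
    256 bytes on a 64K-page system): DIV_ROUND_UP(65536 - 32, 256) + 1 =
    256 + 1 = 257, so valid class indices run from 0 to 256, and index 256
    no longer fits in an 8-bit field whose maximum value is 255.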

    Fixes: 3783689a1 ("zsmalloc: introduce zspage structure")
    Link: http://lkml.kernel.org/r/1492042622-12074-3-git-send-email-minchan@kernel.org
    Signed-off-by: Minchan Kim
    Cc: Sergey Senozhatsky
    Cc:
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Minchan Kim
     

25 Feb, 2017

2 commits

  • The class index and fullness group are no longer encoded in
    (first)page->mapping after commit 3783689a1aa8 ("zsmalloc: introduce
    zspage structure"). Instead, they are stored in struct zspage.

    Just delete this unneeded comment.

    Link: http://lkml.kernel.org/r/1486620822-36826-1-git-send-email-xieyisheng1@huawei.com
    Signed-off-by: Yisheng Xie
    Suggested-by: Sergey Senozhatsky
    Reviewed-by: Sergey Senozhatsky
    Acked-by: Minchan Kim
    Cc: Nitin Gupta
    Cc: Hanjun Guo
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Yisheng Xie
     
  • We had used page->lru to link the component pages (except the first
    page) of a zspage, and used INIT_LIST_HEAD(&page->lru) to initialize
    it. Therefore, to get the last page's next page, which is NULL, we had
    to use the page flag PG_Private_2 to identify it.

    But now we use page->freelist to link all of the pages in a zspage and
    initialize page->freelist to NULL for the last page, so there is no
    need for PG_Private_2 anymore.

    This removes the redundant SetPagePrivate2 in create_page_chain() and
    ClearPagePrivate2 in reset_page(), saving a few cycles for migration
    of zsmalloc pages :)

    Link: http://lkml.kernel.org/r/1487076509-49270-1-git-send-email-xieyisheng1@huawei.com
    Signed-off-by: Yisheng Xie
    Reviewed-by: Sergey Senozhatsky
    Acked-by: Minchan Kim
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Yisheng Xie
     

23 Feb, 2017

1 commit

  • Delete an extra semicolon and fix some typos.

    Link: http://lkml.kernel.org/r/586F1823.4050107@huawei.com
    Signed-off-by: Xishi Qiu
    Reviewed-by: Sergey Senozhatsky
    Acked-by: Minchan Kim
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Xishi Qiu
     

02 Dec, 2016

1 commit

  • Install the callbacks via the state machine and let the core invoke
    the callbacks on the already online CPUs.

    Signed-off-by: Sebastian Andrzej Siewior
    Cc: Sergey Senozhatsky
    Cc: linux-mm@kvack.org
    Cc: Minchan Kim
    Cc: rt@linutronix.de
    Cc: Nitin Gupta
    Link: http://lkml.kernel.org/r/20161126231350.10321-11-bigeasy@linutronix.de
    Signed-off-by: Thomas Gleixner

    Sebastian Andrzej Siewior
     

29 Jul, 2016

8 commits

  • iput() tests whether its argument is NULL and then returns immediately.
    Thus the test around the call is not needed.

    This issue was detected by using the Coccinelle software.
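
    A sketch of the pattern being simplified (variable name illustrative):

        /* before: redundant NULL check around the call */
        if (inode)
                iput(inode);

        /* after: iput() already ignores a NULL argument */
        iput(inode);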

    Link: http://lkml.kernel.org/r/559cf499-4a01-25f9-c87f-24d906626a57@users.sourceforge.net
    Signed-off-by: Markus Elfring
    Reviewed-by: Sergey Senozhatsky
    Acked-by: Minchan Kim
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Markus Elfring
     
  • Use the ClearPagePrivate/ClearPagePrivate2 helpers to clear
    PG_private/PG_private_2 in page->flags.

    Link: http://lkml.kernel.org/r/1467882338-4300-7-git-send-email-opensource.ganesh@gmail.com
    Signed-off-by: Ganesh Mahendran
    Acked-by: Minchan Kim
    Reviewed-by: Sergey Senozhatsky
    Cc: Vlastimil Babka
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Ganesh Mahendran
     
  • Add __init/__exit attributes to functions that are only called at
    module init/exit, to save memory.

    Link: http://lkml.kernel.org/r/1467882338-4300-6-git-send-email-opensource.ganesh@gmail.com
    Signed-off-by: Ganesh Mahendran
    Cc: Sergey Senozhatsky
    Cc: Minchan Kim
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Ganesh Mahendran
     
  • Some minor comment changes:

    1) update the zs_malloc() and zs_create_pool() function headers
    2) update "Usage of struct page fields"

    Link: http://lkml.kernel.org/r/1467882338-4300-5-git-send-email-opensource.ganesh@gmail.com
    Signed-off-by: Ganesh Mahendran
    Reviewed-by: Sergey Senozhatsky
    Acked-by: Minchan Kim
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Ganesh Mahendran
     
  • Currently, if a class cannot be merged, the maximum number of objects
    per zspage in that class may be calculated twice.

    This patch calculates the maximum number of objects per zspage once at
    the beginning and passes the value to can_merge() to decide whether
    the class can be merged.

    It also removes the function get_maxobj_per_zspage(), as there is no
    other caller of it.

    Link: http://lkml.kernel.org/r/1467882338-4300-4-git-send-email-opensource.ganesh@gmail.com
    Signed-off-by: Ganesh Mahendran
    Reviewed-by: Sergey Senozhatsky
    Acked-by: Minchan Kim
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Ganesh Mahendran
     
  • The maximum number of objects per zspage is now stored in each
    size_class, so there is no need to recalculate it.

    Link: http://lkml.kernel.org/r/1467882338-4300-3-git-send-email-opensource.ganesh@gmail.com
    Signed-off-by: Ganesh Mahendran
    Acked-by: Minchan Kim
    Reviewed-by: Sergey Senozhatsky
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Ganesh Mahendran
     
  • The object index value should be updated after returning from
    find_alloced_obj() to avoid burning CPU on unnecessary object
    scanning.

    Link: http://lkml.kernel.org/r/1467882338-4300-2-git-send-email-opensource.ganesh@gmail.com
    Signed-off-by: Ganesh Mahendran
    Reviewed-by: Sergey Senozhatsky
    Acked-by: Minchan Kim
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Ganesh Mahendran
     
  • This is a cleanup patch. Change "index" to "obj_index" to keep it
    consistent with the other names in zsmalloc.

    Link: http://lkml.kernel.org/r/1467882338-4300-1-git-send-email-opensource.ganesh@gmail.com
    Signed-off-by: Ganesh Mahendran
    Reviewed-by: Sergey Senozhatsky
    Acked-by: Minchan Kim
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Ganesh Mahendran
     

27 Jul, 2016

11 commits

  • Randy reported the build error below.

    > In file included from ../include/linux/balloon_compaction.h:48:0,
    > from ../mm/balloon_compaction.c:11:
    > ../include/linux/compaction.h:237:51: warning: 'struct node' declared inside parameter list [enabled by default]
    > static inline int compaction_register_node(struct node *node)
    > ../include/linux/compaction.h:237:51: warning: its scope is only this definition or declaration, which is probably not what you want [enabled by default]
    > ../include/linux/compaction.h:242:54: warning: 'struct node' declared inside parameter list [enabled by default]
    > static inline void compaction_unregister_node(struct node *node)
    >

    It was caused by non-LRU page migration, which needs compaction.h, but
    compaction.h doesn't include the headers it needs to be standalone.

    I think the proper header for non-LRU page migration is migrate.h
    rather than compaction.h, because migrate.h already includes the
    headers that non-LRU page migration needs indirectly, like
    isolate_mode_t, migrate_mode and MIGRATEPAGE_SUCCESS.

    [akpm@linux-foundation.org: revert mm-balloon-use-general-non-lru-movable-page-feature-fix.patch temp fix]
    Link: http://lkml.kernel.org/r/20160610003304.GE29779@bbox
    Signed-off-by: Minchan Kim
    Reported-by: Randy Dunlap
    Cc: Konstantin Khlebnikov
    Cc: Vlastimil Babka
    Cc: Gioh Kim
    Cc: Rafael Aquini
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Minchan Kim
     
  • zram is very popular in parts of the embedded world (e.g., TVs, mobile
    phones). On those systems, the amount of memory zsmalloc consumes is
    never trivial (one example from a real product system: total memory
    800M, zsmalloc consumed 150M), so we have been using this out-of-tree
    patch to monitor system memory behavior via /proc/vmstat.

    Having zsmalloc counters in vmstat helps in tracking down system
    behavior related to memory usage.

    [minchan@kernel.org: zsmalloc: follow up zsmalloc vmstat]
    Link: http://lkml.kernel.org/r/20160607091737.GC23435@bbox
    [akpm@linux-foundation.org: fix build with CONFIG_ZSMALLOC=m]
    Link: http://lkml.kernel.org/r/1464919731-13255-1-git-send-email-minchan@kernel.org
    Signed-off-by: Minchan Kim
    Cc: Sangseok Lee
    Cc: Chanho Min
    Cc: Chan Gyun Jeong
    Cc: Sergey Senozhatsky
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Minchan Kim
     
  • A static checker warns about using a tag as a bit shifter. It doesn't
    break anything currently, but it is not good for readability. Let's
    use OBJ_TAG_BIT as the bit shifter instead of OBJ_ALLOCATED_TAG.

    Link: http://lkml.kernel.org/r/20160607045146.GF26230@bbox
    Signed-off-by: Minchan Kim
    Reported-by: Dan Carpenter
    Cc: Sergey Senozhatsky
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Minchan Kim
     
  • This patch introduces a run-time migration feature for zspages.

    For migration, the VM uses the page.lru field, so it is better not to
    use the page.next field, which is unified with page.lru, for our own
    purposes. To that end, we can compute the first object offset of a
    page at runtime instead of storing it in page.index, so that
    page.index can be used as the link for page chaining instead of
    page.next.

    In the case of a huge object, page.index stores the handle instead of
    the next link of the page chain, because a huge object doesn't need a
    chaining link. So get_next_page() needs to identify huge objects in
    order to return NULL; for that, this patch uses the PG_owner_priv_1
    page flag.

    For migration, it supports three functions:

    * zs_page_isolate

    It isolates the zspage containing the subpage the VM wants to migrate
    from its class, so that nobody can allocate new objects from the
    zspage.

    A zspage can be isolated through any of its subpages, so a subsequent
    isolation attempt on another subpage of the same zspage shouldn't
    fail. For that, we introduce a zspage.isolated count. With it,
    zs_page_isolate() can tell whether the zspage is already isolated for
    migration; if it is, the subsequent isolation attempt can succeed
    without doing any further isolation work.

    * zs_page_migrate

    First of all, it holds the write-side zspage->lock to prevent
    migration of other subpages in the zspage. Then it locks all objects
    in the page the VM wants to migrate. The reason we should lock all
    objects in the page is the race between zs_map_object() and
    zs_page_migrate():

    zs_map_object                     zs_page_migrate

    pin_tag(handle)
    obj = handle_to_obj(handle)
    obj_to_location(obj, &page, &obj_idx);

                                      write_lock(&zspage->lock)
                                      if (!trypin_tag(handle))
                                              goto unpin_object

    zspage = get_zspage(page);
    read_lock(&zspage->lock);

    If zs_page_migrate() didn't do the trypin_tag(), zs_map_object()'s
    page could become stale due to migration, and it would crash.

    If it locks all of the objects successfully, it copies the content
    from the old page to the new one and, finally, creates a new zspage
    chain with the new page. And if this was the last isolated subpage in
    the zspage, it puts the zspage back into its class.

    * zs_page_putback

    It returns an isolated zspage to the right fullness_group list if
    migration of a page fails. If it finds that a zspage is ZS_EMPTY, it
    queues the zspage for freeing on a workqueue. See below about
    asynchronous zspage freeing.

    This patch introduces asynchronous zspage freeing. The reason we need
    it is that the page lock is required to clear PG_movable but,
    unfortunately, the zs_free path must be atomic, so the approach is to
    try to grab the page lock. If it gets the page lock of all of the
    pages successfully, it can free the zspage immediately. Otherwise, it
    queues a free request and frees the zspage via the workqueue in
    process context.

    If zs_free finds the zspage isolated when it tries to free it, it
    delays the freeing until zs_page_putback finds it, which will finally
    free the zspage.

    In this patch, we expand the fullness_list from ZS_EMPTY to ZS_FULL.
    The ZS_EMPTY list is used for delayed freeing, and adding the ZS_FULL
    list makes it possible to identify whether a zspage is isolated via a
    list_empty(&zspage->list) test.
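
    A minimal illustration of that test (helper name hypothetical; the
    actual code may express it differently):

        /* an isolated zspage has been taken off its fullness list */
        static inline bool zspage_is_isolated(struct zspage *zspage)
        {
                return list_empty(&zspage->list);
        }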

    [minchan@kernel.org: zsmalloc: keep first object offset in struct page]
    Link: http://lkml.kernel.org/r/1465788015-23195-1-git-send-email-minchan@kernel.org
    [minchan@kernel.org: zsmalloc: zspage sanity check]
    Link: http://lkml.kernel.org/r/20160603010129.GC3304@bbox
    Link: http://lkml.kernel.org/r/1464736881-24886-12-git-send-email-minchan@kernel.org
    Signed-off-by: Minchan Kim
    Cc: Sergey Senozhatsky
    Cc: Vlastimil Babka
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Minchan Kim
     
  • Zsmalloc stores the first free object's position in freeobj in each
    zspage. If we change that to an index counted from first_page instead
    of a position, page migration becomes simpler, because we don't need
    to fix up other entries of the linked list when a page is migrated
    out.

    Link: http://lkml.kernel.org/r/1464736881-24886-11-git-send-email-minchan@kernel.org
    Signed-off-by: Minchan Kim
    Cc: Sergey Senozhatsky
    Cc: Vlastimil Babka
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Minchan Kim
     
  • Currently, putback_zspage() frees the zspage under class->lock if the
    fullness becomes ZS_EMPTY, but that makes it hard to implement the
    locking scheme for the new zspage migration. So this patch separates
    free_zspage() from putback_zspage() and frees the zspage outside
    class->lock, as preparation for zspage migration.

    Link: http://lkml.kernel.org/r/1464736881-24886-10-git-send-email-minchan@kernel.org
    Signed-off-by: Minchan Kim
    Reviewed-by: Sergey Senozhatsky
    Cc: Vlastimil Babka
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Minchan Kim
     
  • We have squeezed the metadata of a zspage into the first page's
    descriptor, so to get metadata from a subpage we must get the first
    page first of all. But that makes it hard to implement the page
    migration feature of zsmalloc, because any place that gets the first
    page from a subpage can race with migration of that first page. In
    other words, the first page it got could be stale. To prevent that, I
    tried several approaches, but they made the code complicated, so I
    finally concluded to separate the metadata from the first page. Of
    course, it consumes more memory: 16 bytes per zspage on 32-bit at the
    moment. That means we lose 1% in the *worst case* (40B/4096B), which I
    think is not bad as the cost of maintainability.

    Link: http://lkml.kernel.org/r/1464736881-24886-9-git-send-email-minchan@kernel.org
    Signed-off-by: Minchan Kim
    Cc: Sergey Senozhatsky
    Cc: Vlastimil Babka
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Minchan Kim
     
  • For page migration, we need to create the page chain of a zspage
    dynamically, so this patch factors that out of alloc_zspage().

    Link: http://lkml.kernel.org/r/1464736881-24886-8-git-send-email-minchan@kernel.org
    Signed-off-by: Minchan Kim
    Reviewed-by: Sergey Senozhatsky
    Cc: Vlastimil Babka
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Minchan Kim
     
  • An upcoming patch will change how zspage metadata is encoded, so for
    easier review this patch wraps the metadata-accessing code in accessor
    functions.

    Link: http://lkml.kernel.org/r/1464736881-24886-7-git-send-email-minchan@kernel.org
    Signed-off-by: Minchan Kim
    Reviewed-by: Sergey Senozhatsky
    Cc: Vlastimil Babka
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Minchan Kim
     
  • Use the kernel's standard bit spinlock instead of the custom mess. The
    custom lock even has a bug: it doesn't disable preemption. The reason
    we haven't hit any problem is that it has only been used inside a
    preemption-disabled section, under the class->lock spinlock, so there
    is no need to send this to stable.
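
    A sketch of the replacement, assuming the pin flag lives in bit
    HANDLE_PIN_BIT of the word the handle points at:

        #include <linux/bit_spinlock.h>

        static void pin_tag(unsigned long handle)
        {
                bit_spin_lock(HANDLE_PIN_BIT, (unsigned long *)handle);
        }

        static int trypin_tag(unsigned long handle)
        {
                return bit_spin_trylock(HANDLE_PIN_BIT, (unsigned long *)handle);
        }

        static void unpin_tag(unsigned long handle)
        {
                bit_spin_unlock(HANDLE_PIN_BIT, (unsigned long *)handle);
        }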

    Link: http://lkml.kernel.org/r/1464736881-24886-6-git-send-email-minchan@kernel.org
    Signed-off-by: Minchan Kim
    Reviewed-by: Sergey Senozhatsky
    Cc: Vlastimil Babka
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Minchan Kim
     
  • Every zspage in a size_class has the same maximum number of objects,
    so we can move that value into the size_class.

    Link: http://lkml.kernel.org/r/1464736881-24886-5-git-send-email-minchan@kernel.org
    Signed-off-by: Minchan Kim
    Reviewed-by: Sergey Senozhatsky
    Cc: Vlastimil Babka
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Minchan Kim
     

27 May, 2016

1 commit

  • Some updates to commit d34f615720d1 ("mm/zsmalloc: don't fail if can't
    create debugfs info"):

    - add pr_warn to all stat failure cases
    - do not prevent module loading on stat failure

    Link: http://lkml.kernel.org/r/1463671123-5479-1-git-send-email-ddstreet@ieee.org
    Signed-off-by: Dan Streetman
    Reviewed-by: Ganesh Mahendran
    Acked-by: Minchan Kim
    Cc: Sergey Senozhatsky
    Cc: Dan Streetman
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Dan Streetman
     

21 May, 2016

6 commits

  • Change the return type of zs_pool_stat_create() to void, and remove the
    logic to abort pool creation if the stat debugfs dir/file could not be
    created.

    The debugfs stat file is for debugging/information only, and doesn't
    affect operation of zsmalloc; there is no reason to abort creating the
    pool if the stat file can't be created. This was seen with zswap, which
    used the same name for all pool creations, which caused zsmalloc to fail
    to create a second pool for zswap if CONFIG_ZSMALLOC_STAT was enabled.

    Signed-off-by: Dan Streetman
    Reviewed-by: Sergey Senozhatsky
    Cc: Dan Streetman
    Cc: Minchan Kim
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Dan Streetman
     
  • Pass GFP flags to zs_malloc() instead of using a fixed mask supplied
    to zs_create_pool(), so we can be more flexible. More importantly, we
    need this to switch zram to per-cpu compression streams: zram will try
    to allocate a handle with preemption disabled in a fast path and
    switch to a slow path (using a different gfp mask) if the fast one
    fails.

    Apart from that, this also aligns the zs_malloc() interface with
    zpool/zbud.
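
    A sketch of the new interface and of the fast/slow-path usage
    described above (the GFP masks are illustrative, not taken from the
    patch):

        unsigned long zs_malloc(struct zs_pool *pool, size_t size, gfp_t gfp);

        /* fast path, preemption disabled: don't sleep, don't warn */
        handle = zs_malloc(pool, clen, __GFP_KSWAPD_RECLAIM | __GFP_NOWARN);
        if (!handle) {
                /* slow path, preemption enabled again: allow full reclaim */
                handle = zs_malloc(pool, clen, GFP_NOIO);
        }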

    [sergey.senozhatsky@gmail.com: pass GFP flags to zs_malloc() instead of using a fixed mask]
    Link: http://lkml.kernel.org/r/20160429150942.GA637@swordfish
    Link: http://lkml.kernel.org/r/20160429150942.GA637@swordfish
    Signed-off-by: Sergey Senozhatsky
    Acked-by: Minchan Kim
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Sergey Senozhatsky
     
  • Let's remove the unused pool parameter from obj_free().

    Signed-off-by: Minchan Kim
    Reviewed-by: Sergey Senozhatsky
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Minchan Kim
     
  • Clean up function parameter ordering so that the higher-level data
    structure comes first.

    Signed-off-by: Minchan Kim
    Reviewed-by: Sergey Senozhatsky
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Minchan Kim
     
  • There are many BUG_ONs in zsmalloc.c, which is not recommended, so
    change them to alternatives.

    The normal rules are as follows:

    1. Avoid BUG_ON if possible. Instead, use VM_BUG_ON or VM_BUG_ON_PAGE.

    2. Use VM_BUG_ON_PAGE if we need to see struct page's fields.

    3. Put those assertions in primitive functions so that higher-level
    functions can rely on the assertion in the primitive.

    4. Don't use an assertion if the following instruction would trigger
    an Oops anyway.

    Signed-off-by: Minchan Kim
    Reviewed-by: Sergey Senozhatsky
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Minchan Kim
     
  • Clean up function parameter "struct page". Many functions of zsmalloc
    expect that page paramter is "first_page" so use "first_page" rather
    than "page" for code readability.

    Signed-off-by: Minchan Kim
    Reviewed-by: Sergey Senozhatsky
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Minchan Kim
     

10 May, 2016

1 commit

  • zs_can_compact() has two race conditions in its core calculation:

        unsigned long obj_wasted = zs_stat_get(class, OBJ_ALLOCATED) -
                                   zs_stat_get(class, OBJ_USED);

    1) classes are not locked, so the numbers of allocated and used
    objects can be changed by concurrent ops happening on other CPUs
    2) the shrinker invokes it from preemptible context

    Thus, depending on the circumstances, OBJ_ALLOCATED can become less
    than OBJ_USED, which can result in either a very high or a negative
    `total_scan' value calculated later in do_shrink_slab().

    do_shrink_slab() has some logic to prevent those cases:

    vmscan: shrink_slab: zs_shrinker_scan+0x0/0x28 [zsmalloc] negative objects to delete nr=-62
    vmscan: shrink_slab: zs_shrinker_scan+0x0/0x28 [zsmalloc] negative objects to delete nr=-62
    vmscan: shrink_slab: zs_shrinker_scan+0x0/0x28 [zsmalloc] negative objects to delete nr=-64
    vmscan: shrink_slab: zs_shrinker_scan+0x0/0x28 [zsmalloc] negative objects to delete nr=-62
    vmscan: shrink_slab: zs_shrinker_scan+0x0/0x28 [zsmalloc] negative objects to delete nr=-62
    vmscan: shrink_slab: zs_shrinker_scan+0x0/0x28 [zsmalloc] negative objects to delete nr=-62

    However, due to the way `total_scan' is calculated, not every
    shrinker->count_objects() overflow can be spotted and handled.
    To demonstrate the latter, I added some debugging code to do_shrink_slab()
    (x86_64) and the results were:

    vmscan: OVERFLOW: shrinker->count_objects() == -1 [18446744073709551615]
    vmscan: but total_scan > 0: 92679974445502
    vmscan: resulting total_scan: 92679974445502
    [..]
    vmscan: OVERFLOW: shrinker->count_objects() == -1 [18446744073709551615]
    vmscan: but total_scan > 0: 22634041808232578
    vmscan: resulting total_scan: 22634041808232578

    Even though shrinker->count_objects() has returned an overflowed value,
    the resulting `total_scan' is positive, and, what is more worrisome, it
    is insanely huge. This value is getting used later on in
    shrinker->scan_objects() loop:

    while (total_scan >= batch_size ||
           total_scan >= freeable) {
            unsigned long ret;
            unsigned long nr_to_scan = min(batch_size, total_scan);

            shrinkctl->nr_to_scan = nr_to_scan;
            ret = shrinker->scan_objects(shrinker, shrinkctl);
            if (ret == SHRINK_STOP)
                    break;
            freed += ret;

            count_vm_events(SLABS_SCANNED, nr_to_scan);
            total_scan -= nr_to_scan;

            cond_resched();
    }

    `total_scan >= batch_size' is true for a very, very long time, and
    `total_scan >= freeable' is also true for quite some time, because
    `freeable < 0' and `total_scan' is large enough, for example,
    22634041808232578. The only break condition, in the given scheme of
    things, is the shrinker->scan_objects() == SHRINK_STOP test, which is
    a bit too weak to rely on, especially in heavy zsmalloc-usage
    scenarios.

    To fix the issue, take a snapshot of the pool stats and use it instead
    of the racy zs_stat_get() calls.
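
    A minimal sketch of the snapshot approach (the divisor that converts
    wasted objects into reclaimable pages is illustrative):

        static unsigned long zs_can_compact(struct size_class *class)
        {
                unsigned long obj_wasted;
                /* read both counters once, into local snapshots */
                unsigned long obj_allocated = zs_stat_get(class, OBJ_ALLOCATED);
                unsigned long obj_used = zs_stat_get(class, OBJ_USED);

                /* concurrent updates can make the snapshot momentarily inconsistent */
                if (obj_allocated <= obj_used)
                        return 0;

                obj_wasted = obj_allocated - obj_used;
                obj_wasted /= class->objs_per_zspage;

                return obj_wasted * class->pages_per_zspage;
        }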

    Link: http://lkml.kernel.org/r/20160509140052.3389-1-sergey.senozhatsky@gmail.com
    Signed-off-by: Sergey Senozhatsky
    Cc: Minchan Kim
    Cc: [4.3+]
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Sergey Senozhatsky