08 Nov, 2014

1 commit

  • Pull xfs fixes from Dave Chinner:
    "This update fixes a warning in the new pagecache_isize_extended() and
    updates some related comments, another fix for zero-range
    misbehaviour, and an unfortunately large set of fixes for regressions
    in the bulkstat code.

    The bulkstat fixes are large but necessary. I wouldn't normally push
    such a rework for a -rcX update, but right now xfsdump can silently
    create incomplete dumps on 3.17 and it's possible that even xfsrestore
    won't notice that the dumps were incomplete. Hence we need to get
    this update into 3.17-stable kernels ASAP.

    In more detail, the refactoring work I committed in 3.17 has exposed a
    major hole in our QA coverage. With both xfsdump (the major user of
    bulkstat) and xfsrestore silently ignoring missing files in the
    dump/restore process, incomplete dumps were going unnoticed if they
    were being triggered. Many of the dump/restore filesets were so small
    that they didn't even have a chance of triggering the loop iteration
    bugs we introduced in 3.17, so we didn't exercise the code
    sufficiently, either.

    We have already taken steps to improve QA coverage in xfstests to
    avoid this happening again, and I've done a lot of manual verification
    of dump/restore on very large data sets (tens of millions of inodes)
    over the past week to verify that this patch set results in bulkstat
    behaving
    the same way as it does on 3.16.

    Unfortunately, the fixes are not exactly simple - in tracking down the
    problem, historic API warts were discovered (e.g. xfsdump has been
    working around a 20-year-old bug in the bulkstat API for the past 10
    years), and that complicated the process of diagnosing and fixing the
    problems. That is, we had to fix bugs in the code as well as discover
    and re-introduce the userspace-visible API bugs that we unwittingly
    "fixed" in 3.17 and that xfsdump relies on to work correctly.

    Summary:

    - fix incorrect warnings about i_mutex locking in
    pagecache_isize_extended() and update the related comments to match
    the expected locking
    - another zero-range bug fix for stray file size updates
    - a bunch of fixes for regressions in the bulkstat code introduced in
    3.17"

    * tag 'xfs-for-linus-3.18-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/dgc/linux-xfs:
    xfs: track bulkstat progress by agino
    xfs: bulkstat error handling is broken
    xfs: bulkstat main loop logic is a mess
    xfs: bulkstat chunk-formatter has issues
    xfs: bulkstat chunk formatting cursor is broken
    xfs: bulkstat btree walk doesn't terminate
    mm: Fix comment before truncate_setsize()
    xfs: rework zero range to prevent invalid i_size updates
    mm: Remove false WARN_ON from pagecache_isize_extended()
    xfs: Check error during inode btree iteration in xfs_bulkstat()
    xfs: bulkstat doesn't release AGI buffer on error

    Linus Torvalds
     

07 Nov, 2014

1 commit

  • XFS doesn't always hold i_mutex when calling truncate_setsize() and it
    uses a different lock to serialize truncates and writes. So fix the
    comment before truncate_setsize().

    Reported-by: Jan Beulich
    Signed-off-by: Jan Kara
    Signed-off-by: Dave Chinner

    Jan Kara
     

04 Nov, 2014

1 commit

  • Pull CMA and DMA-mapping fixes from Marek Szyprowski:
    "This contains important fixes for recently introduced highmem support
    for default contiguous memory region used for dma-mapping subsystem"

    * 'fixes-for-v3.18' of git://git.linaro.org/people/mszyprowski/linux-dma-mapping:
    mm, cma: make parameters order consistent in func declaration and definition
    mm: cma: Use %pa to print physical addresses
    mm: cma: Ensure that reservations never cross the low/high mem boundary
    mm: cma: Always consider a 0 base address reservation as dynamic
    mm: cma: Don't crash on allocation if CMA area can't be activated

    Linus Torvalds
     

30 Oct, 2014

11 commits

  • The WARN_ON checking whether i_mutex is held in
    pagecache_isize_extended() was wrong because some filesystems (e.g.
    XFS) use different locks for serialization of truncates / writes. So
    just remove the check.
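
    A minimal sketch of the dropped assertion, assuming the WARN tested
    i_mutex directly (the exact form in mm/truncate.c may differ):

    void pagecache_isize_extended(struct inode *inode, loff_t from, loff_t to)
    {
            /*
             * Removed: XFS serializes truncates and writes with its own
             * lock rather than i_mutex, so i_mutex may legitimately be
             * unlocked here.
             */
            /* WARN_ON(!mutex_is_locked(&inode->i_mutex)); */

            /* rest of the function unchanged */
    }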

    Signed-off-by: Jan Kara
    Reviewed-by: Dave Chinner
    Signed-off-by: Dave Chinner

    Jan Kara
     
  • If CONFIG_BALLOON_COMPACTION=n, balloon_page_insert() does not link
    pages with the balloon and doesn't set the PagePrivate flag; as a
    result, balloon_page_dequeue() cannot get any pages because it thinks
    all of them are isolated. Without balloon compaction nobody can
    isolate ballooned pages, so it's safe to remove this check.
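
    A hedged sketch of the affected loop in balloon_page_dequeue(); the
    surrounding code is simplified:

    list_for_each_entry_safe(page, tmp, &b_dev_info->pages, lru) {
            if (!trylock_page(page))
                    continue;
    #ifdef CONFIG_BALLOON_COMPACTION
            /*
             * Only compaction can isolate pages. Without it, this test
             * wrongly treated every ballooned page as isolated, so the
             * loop never dequeued anything.
             */
            if (!PagePrivate(page)) {
                    unlock_page(page);
                    continue;
            }
    #endif
            balloon_page_delete(page);
            unlock_page(page);
            dequeued_page = true;
            break;
    }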

    Fixes: d6d86c0a7f8d ("mm/balloon_compaction: redesign ballooned pages management").
    Signed-off-by: Konstantin Khlebnikov
    Reported-by: Matt Mullins
    Cc: [3.17]
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Konstantin Khlebnikov
     
  • The SLUB allocator merges caches with the same size and alignment, and
    there was a long-standing bug with this behavior (sketched in code
    after the list):

    - create the cache named "foo"
    - create the cache named "bar" (which is merged with "foo")
    - delete the cache named "foo" (but it stays allocated because "bar"
    uses it)
    - create the cache named "foo" again - it fails because the name "foo"
    is already used
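
    The sequence maps to kernel code along these lines (the size and flags
    here are illustrative):

    struct kmem_cache *foo, *bar;

    foo = kmem_cache_create("foo", 128, 0, 0, NULL);
    bar = kmem_cache_create("bar", 128, 0, 0, NULL); /* merged with "foo" */
    kmem_cache_destroy(foo);        /* stays allocated: "bar" aliases it */
    foo = kmem_cache_create("foo", 128, 0, 0, NULL); /* failed before the fix */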

    That bug was fixed in commit 694617474e33 ("slab_common: fix the check
    for duplicate slab names") by not warning on duplicate cache names when
    the SLUB subsystem is used.

    Recently, cache merging was implemented for the SLAB subsystem too, in
    commit 12220dea07f1 ("mm/slab: support slab merge"). Therefore we need
    to stop checking for duplicate names in the SLAB subsystem as well.

    This patch fixes the bug by removing the check.

    Signed-off-by: Mikulas Patocka
    Acked-by: Christoph Lameter
    Cc: Pekka Enberg
    Cc: David Rientjes
    Cc: Joonsoo Kim
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Mikulas Patocka
     
  • page_remove_rmap() has too many branches on PageAnon() and is hard to
    follow. Move the file part into a separate function.
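
    A hedged sketch of the resulting shape (the real function carries much
    more accounting):

    void page_remove_rmap(struct page *page)
    {
            if (!PageAnon(page)) {
                    page_remove_file_rmap(page);    /* file side, factored out */
                    return;
            }

            /* anon-only accounting continues here */
    }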

    Signed-off-by: Johannes Weiner
    Reviewed-by: Michal Hocko
    Cc: Vladimir Davydov
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Johannes Weiner
     
  • Commit 0a31bc97c80c ("mm: memcontrol: rewrite uncharge API") changed
    page migration to uncharge the old page right away. The page is locked,
    unmapped, truncated, and off the LRU, but it could race with writeback
    ending, which then doesn't unaccount the page properly:

    test_clear_page_writeback()            migration
                                             wait_on_page_writeback()
      TestClearPageWriteback()
                                             mem_cgroup_migrate()
                                               clear PCG_USED
      mem_cgroup_update_page_stat()
        if (PageCgroupUsed(pc))
          decrease memcg pages under writeback

      release pc->mem_cgroup->move_lock

    The per-page statistics interface is heavily optimized to avoid a
    function call and a lookup_page_cgroup() in the file unmap fast path,
    which means it doesn't verify whether a page is still charged before
    clearing PageWriteback() and it has to do it in the stat update later.

    Rework it so that it looks up the page's memcg once at the beginning of
    the transaction and then uses it throughout. The charge will be
    verified before clearing PageWriteback() and migration can't uncharge
    the page as long as that is still set. The RCU lock will protect the
    memcg past uncharge.

    As far as losing the optimization goes, the following test results are
    from a microbenchmark that maps, faults, and unmaps a 4GB sparse file
    three times in a nested fashion, so that there are two negative passes
    that don't account but still go through the new transaction overhead.
    There is no actual difference:

    old: 33.195102545 seconds time elapsed ( +- 0.01% )
    new: 33.199231369 seconds time elapsed ( +- 0.03% )

    The time spent in page_remove_rmap()'s callees still adds up to the
    same, but the time spent in the function itself seems reduced:

    #       Children   Self   Command        Shared Object      Symbol
    old:       0.12%  0.11%   filemapstress  [kernel.kallsyms]  [k] page_remove_rmap
    new:       0.12%  0.08%   filemapstress  [kernel.kallsyms]  [k] page_remove_rmap
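
    The rework follows a begin/end transaction pattern in the
    writeback-clearing path; this is a hedged sketch, and the exact
    parameters of the begin/end helpers may differ from the patch:

    struct mem_cgroup *memcg;
    bool locked;
    unsigned long flags;

    /* look up and pin the page's memcg once for the whole transaction */
    memcg = mem_cgroup_begin_page_stat(page, &locked, &flags);
    if (TestClearPageWriteback(page))
            mem_cgroup_dec_page_stat(memcg, MEM_CGROUP_STAT_WRITEBACK);
    mem_cgroup_end_page_stat(memcg, locked, flags);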

    Signed-off-by: Johannes Weiner
    Acked-by: Michal Hocko
    Cc: Vladimir Davydov
    Cc: [3.17.x]
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Johannes Weiner
     
  • A follow-up patch would have changed the call signature. To save the
    trouble, just fold it instead.

    Signed-off-by: Johannes Weiner
    Acked-by: Michal Hocko
    Cc: Vladimir Davydov
    Cc: [3.17.x]
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Johannes Weiner
     
  • When hot adding the same memory after hot removal, the following
    messages are shown:

    WARNING: CPU: 20 PID: 6 at mm/page_alloc.c:4968 free_area_init_node+0x3fe/0x426()
    ...
    Call Trace:
    dump_stack+0x46/0x58
    warn_slowpath_common+0x81/0xa0
    warn_slowpath_null+0x1a/0x20
    free_area_init_node+0x3fe/0x426
    hotadd_new_pgdat+0x90/0x110
    add_memory+0xd4/0x200
    acpi_memory_device_add+0x1aa/0x289
    acpi_bus_attach+0xfd/0x204
    acpi_bus_attach+0x178/0x204
    acpi_bus_scan+0x6a/0x90
    acpi_device_hotplug+0xe8/0x418
    acpi_hotplug_work_fn+0x1f/0x2b
    process_one_work+0x14e/0x3f0
    worker_thread+0x11b/0x510
    kthread+0xe1/0x100
    ret_from_fork+0x7c/0xb0

    The detailed explanation is as follows:

    When hot removing memory, the pgdat is set to 0 in try_offline_node().
    But if the pgdat was allocated by the bootmem allocator, the clearing
    step is skipped.

    And when hot adding the same memory, the uninitialized pgdat is
    reused. But free_area_init_node() checks whether the pgdat has been
    zeroed, and as a result it hits the WARN_ON().

    This patch clears a pgdat that was allocated by the bootmem allocator
    in try_offline_node().
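
    A heavily hedged sketch of the idea in try_offline_node();
    allocated_by_bootmem() is an illustrative stand-in for however the
    patch actually detects a bootmem-allocated pgdat:

    /*
     * Reset a bootmem-allocated pgdat so a later hot-add of the same
     * memory starts from zeroed state and free_area_init_node() does not
     * trip its WARN_ON().
     */
    if (allocated_by_bootmem(pgdat))        /* illustrative, not real */
            memset(pgdat, 0, sizeof(*pgdat));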

    Signed-off-by: Yasuaki Ishimatsu
    Cc: Zhang Zhen
    Cc: Wang Nan
    Cc: Tang Chen
    Reviewed-by: Toshi Kani
    Cc: Dave Hansen
    Cc: David Rientjes
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Yasuaki Ishimatsu
     
  • If an anonymous mapping is not allowed to fault thp memory and then
    madvise(MADV_HUGEPAGE) is used after fault, khugepaged will never
    collapse this memory into thp memory.

    This occurs because the madvise(2) handler for thp, hugepage_madvise(),
    clears VM_NOHUGEPAGE on the stack and it isn't stored in vma->vm_flags
    until the final action of madvise_behavior(). This causes
    khugepaged_enter_vma_merge() to be a no-op in hugepage_madvise() when
    the vma previously had VM_NOHUGEPAGE set.

    Fix this by passing the correct vma flags to the khugepaged mm slot
    handler. There's no chance khugepaged can run on this vma until after
    madvise_behavior() returns since we hold mm->mmap_sem.

    It would be possible to clear VM_NOHUGEPAGE directly from vma->vm_flags
    in hugepage_madvise(), but I didn't want to introduce special-case
    behavior into madvise_behavior(). I think it's best to just let it
    always set vma->vm_flags itself.
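
    A userspace sketch of the scenario, assuming an 8MB anonymous mapping:

    #include <string.h>
    #include <sys/mman.h>

    size_t len = 8 << 20;
    void *p = mmap(NULL, len, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);

    madvise(p, len, MADV_NOHUGEPAGE);       /* VMA starts with THP forbidden */
    memset(p, 1, len);                      /* fault in small pages */
    madvise(p, len, MADV_HUGEPAGE);         /* before the fix, khugepaged
                                               still never collapsed this VMA */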

    Signed-off-by: David Rientjes
    Reported-by: Suleiman Souhlal
    Cc: "Kirill A. Shutemov"
    Cc: Andrea Arcangeli
    Cc:
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    David Rientjes
     
  • A compound page should be freed by put_page() or free_pages() with the
    correct order; otherwise its tail pages are leaked.

    The compound order can be obtained with compound_order(), or
    HPAGE_PMD_ORDER in our case. Some people would argue the latter is
    faster, but I prefer the former, which is more general.

    This bug was observed not just on our servers (the worst case we saw is
    11G leaked on a 48G machine) but also on our workstations running Ubuntu
    based distro.

    $ cat /proc/vmstat | grep thp_zero_page_alloc
    thp_zero_page_alloc 55
    thp_zero_page_alloc_failed 0

    This means (thp_zero_page_alloc - 1) * (2M - 4K) of memory is leaked;
    with the 55 allocations above, that is 54 * ~2MB, roughly 108MB.
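
    A hedged sketch of the corrected free path for the huge zero page:

    struct page *zero_page = xchg(&huge_zero_page, NULL);
    BUG_ON(!zero_page);
    /*
     * Was __free_page(zero_page), which returned only the head page and
     * leaked the 511 tail pages of the 2M compound page.
     */
    __free_pages(zero_page, compound_order(zero_page));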

    Fixes: 97ae17497e99 ("thp: implement refcounting for huge zero page")
    Signed-off-by: Yu Zhao
    Acked-by: Kirill A. Shutemov
    Cc: Andrea Arcangeli
    Cc: Mel Gorman
    Cc: David Rientjes
    Cc: Bob Liu
    Cc: [3.8+]
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Yu Zhao
     
  • Commit edc2ca612496 ("mm, compaction: move pageblock checks up from
    isolate_migratepages_range()") commonizes the isolate_migratepages
    variants and makes them use isolate_migratepages_block().

    isolate_migratepages_block() can stop early once enough pages have
    been isolated, but isolate_migratepages_range() has no code to handle
    this case. As a result, even when isolate_migratepages_block() returns
    prematurely without checking all pages in the range, it is called
    again on the following pageblock and some pages in the previous range
    are never checked. CMA then fails frequently because of this.

    To fix this problem, this patch lets isolate_migratepages_range() know
    when enough pages have been isolated and stops the isolation in that
    case.

    Note that isolate_migratepages() has no such problem, because it
    always stops the isolation after just one call of
    isolate_migratepages_block().
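
    A hedged, simplified sketch of the range loop after the fix:

    for (; pfn < end_pfn; pfn = block_end_pfn,
                          block_end_pfn += pageblock_nr_pages) {
            pfn = isolate_migratepages_block(cc, pfn, block_end_pfn,
                                             ISOLATE_UNEVICTABLE);
            if (!pfn)
                    break;          /* error or abort */
            if (cc->nr_migratepages == COMPACT_CLUSTER_MAX)
                    break;          /* enough pages isolated: stop here
                                       instead of silently skipping the
                                       rest of the range */
    }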

    Signed-off-by: Joonsoo Kim
    Acked-by: Vlastimil Babka
    Cc: David Rientjes
    Cc: Minchan Kim
    Cc: Michal Nazarewicz
    Cc: Naoya Horiguchi
    Cc: Christoph Lameter
    Cc: Rik van Riel
    Cc: Mel Gorman
    Cc: Zhang Yanfei
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Joonsoo Kim
     
  • Commit ff7ee93f4715 ("cgroup/kmemleak: Annotate alloc_page() for cgroup
    allocations") introduces kmemleak_alloc() for alloc_page_cgroup(), but
    the corresponding kmemleak_free() is missing, which causes kmemleak to
    be wrongly disabled after memory offlining. A log is pasted at the end
    of this commit message.

    This patch adds kmemleak_free() to free_page_cgroup(). During page
    offlining, it removes the corresponding entries from the kmemleak
    rbtree, so the freed memory can be allocated again by other subsystems
    without killing kmemleak.
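
    A hedged sketch of the fix in mm/page_cgroup.c, with the surrounding
    function simplified:

    static void free_page_cgroup(void *addr)
    {
            if (is_vmalloc_addr(addr)) {
                    vfree(addr);
            } else {
                    size_t table_size =
                            sizeof(struct page_cgroup) * PAGES_PER_SECTION;

                    kmemleak_free(addr);    /* pairs with kmemleak_alloc()
                                               in alloc_page_cgroup() */
                    free_pages_exact(addr, table_size);
            }
    }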

    bash # for x in 1 2 3 4; do echo offline > /sys/devices/system/memory/memory$x/state ; sleep 1; done ; dmesg | grep leak

    Offlined Pages 32768
    kmemleak: Cannot insert 0xffff880016969000 into the object search tree (overlaps existing)
    CPU: 0 PID: 412 Comm: sleep Not tainted 3.17.0-rc5+ #86
    Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011
    Call Trace:
    dump_stack+0x46/0x58
    create_object+0x266/0x2c0
    kmemleak_alloc+0x26/0x50
    kmem_cache_alloc+0xd3/0x160
    __sigqueue_alloc+0x49/0xd0
    __send_signal+0xcb/0x410
    send_signal+0x45/0x90
    __group_send_sig_info+0x13/0x20
    do_notify_parent+0x1bb/0x260
    do_exit+0x767/0xa40
    do_group_exit+0x44/0xa0
    SyS_exit_group+0x17/0x20
    system_call_fastpath+0x16/0x1b

    kmemleak: Kernel memory leak detector disabled
    kmemleak: Object 0xffff880016900000 (size 524288):
    kmemleak: comm "swapper/0", pid 0, jiffies 4294667296
    kmemleak: min_count = 0
    kmemleak: count = 0
    kmemleak: flags = 0x1
    kmemleak: checksum = 0
    kmemleak: backtrace:
    log_early+0x63/0x77
    kmemleak_alloc+0x4b/0x50
    init_section_page_cgroup+0x7f/0xf5
    page_cgroup_init+0xc5/0xd0
    start_kernel+0x333/0x408
    x86_64_start_reservations+0x2a/0x2c
    x86_64_start_kernel+0xf5/0xfc

    Fixes: ff7ee93f4715 (cgroup/kmemleak: Annotate alloc_page() for cgroup allocations)
    Signed-off-by: Wang Nan
    Acked-by: Johannes Weiner
    Acked-by: Michal Hocko
    Cc: Steven Rostedt
    Cc: [3.2+]
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Wang Nan
     

29 Oct, 2014

1 commit

  • When unmapping a range of pages in zap_pte_range, the page being
    unmapped is added to an mmu_gather_batch structure for asynchronous
    freeing. If we run out of space in the batch structure before the range
    has been completely unmapped, then we break out of the loop, force a
    TLB flush and free the pages that we have batched so far. If there are
    further pages to unmap, then we resume the loop where we left off.

    Unfortunately, we forget to update addr when we break out of the loop,
    which causes us to truncate the range being invalidated as the end
    address is exclusive. When we re-enter the loop at the same address, the
    page has already been freed and the pte_present test will fail, meaning
    that we do not reconsider the address for invalidation.

    This patch fixes the problem by incrementing addr by PAGE_SIZE before
    breaking out of the loop on batch failure.
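
    A hedged sketch of the loop in zap_pte_range() with the one-line fix;
    the pte handling is elided:

    do {
            /* ... present-pte handling ... */
            if (unlikely(!__tlb_remove_page(tlb, page))) {
                    force_flush = 1;
                    addr += PAGE_SIZE;      /* the fix: this pte was
                                               handled, so the flush range
                                               and the resume point must
                                               move past it */
                    break;
            }
    } while (pte++, addr += PAGE_SIZE, addr != end);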

    Signed-off-by: Will Deacon
    Cc: stable@vger.kernel.org
    Signed-off-by: Linus Torvalds

    Will Deacon
     

27 Oct, 2014

5 commits

  • Casting physical addresses to unsigned long and using %lu truncates the
    values on systems where physical addresses are larger than 32 bits. Use
    %pa and get rid of the cast instead.
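
    A small illustration of the difference:

    phys_addr_t base;       /* may not fit in a 32-bit unsigned long */

    pr_info("base %lu\n", (unsigned long)base);     /* truncates on LPAE etc. */
    pr_info("base %pa\n", &base);                   /* prints the full value */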

    Signed-off-by: Laurent Pinchart
    Acked-by: Michal Nazarewicz
    Acked-by: Geert Uytterhoeven
    Signed-off-by: Marek Szyprowski

    Laurent Pinchart
     
  • Commit 95b0e655f914 ("ARM: mm: don't limit default CMA region only to
    low memory") extended CMA memory reservation to allow usage of high
    memory. It relied on commit f7426b983a6a ("mm: cma: adjust address limit
    to avoid hitting low/high memory boundary") to ensure that the reserved
    block never crossed the low/high memory boundary. While the
    implementation correctly lowered the limit, it failed to consider the
    case where the base..limit range crossed the low/high memory boundary
    with enough space on each side to reserve the requested size on either
    low or high memory.

    Rework the base and limit adjustment to fix the problem. The function
    now starts by rejecting the reservation altogether for fixed
    reservations that cross the boundary, tries to reserve from high memory
    first and then falls back to low memory.
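
    A hedged sketch of the adjusted flow, assuming memblock_alloc_range()
    as the underlying allocator:

    if (fixed && base < highmem_start && base + size > highmem_start)
            return -EINVAL; /* fixed reservation straddles the boundary */

    /* try high memory first... */
    addr = memblock_alloc_range(size, alignment,
                                max(base, highmem_start), limit);
    /* ...then fall back to low memory */
    if (!addr && base < highmem_start)
            addr = memblock_alloc_range(size, alignment, base,
                                        min(limit, highmem_start));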

    Signed-off-by: Laurent Pinchart
    Signed-off-by: Marek Szyprowski

    Laurent Pinchart
     
  • The fixed parameter to cma_declare_contiguous() tells the function
    whether the given base address must be honoured or should be considered
    as a hint only. The API considers a zero base address as meaning any
    base address, which must never be considered as a fixed value.

    Part of the implementation correctly checks both fixed and base != 0,
    but two locations check the fixed value only. Set fixed to false when
    base is 0 to fix that and simplify the code.
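
    A minimal sketch of the simplification:

    /* a zero base means "anywhere", which can never be a fixed placement */
    if (!base)
            fixed = false;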

    Signed-off-by: Laurent Pinchart
    Acked-by: Michal Nazarewicz
    Signed-off-by: Marek Szyprowski

    Laurent Pinchart
     
  • If activation of the CMA area fails its mutex won't be initialized,
    leading to an oops at allocation time when trying to lock the mutex. Fix
    this by setting the cma area count field to 0 when activation fails,
    leading to allocation returning NULL immediately.
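
    A hedged sketch of the error path in cma_activate_area():

    err:
            cma->count = 0; /* cma_alloc() now sees an empty area and
                               returns NULL instead of taking an
                               uninitialized mutex */
            return -EINVAL;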

    Cc: # v3.17
    Signed-off-by: Laurent Pinchart
    Acked-by: Michal Nazarewicz
    Signed-off-by: Marek Szyprowski

    Laurent Pinchart
     
  • Pull vfs updates from Al Viro:
    "overlayfs merge + leak fix for d_splice_alias() failure exits"

    * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
    overlayfs: embed middle into overlay_readdir_data
    overlayfs: embed root into overlay_readdir_data
    overlayfs: make ovl_cache_entry->name an array instead of pointer
    overlayfs: don't hold ->i_mutex over opening the real directory
    fix inode leaks on d_splice_alias() failure exits
    fs: limit filesystem stacking depth
    overlay: overlay filesystem documentation
    overlayfs: implement show_options
    overlayfs: add statfs support
    overlay filesystem
    shmem: support RENAME_WHITEOUT
    ext4: support RENAME_WHITEOUT
    vfs: add RENAME_WHITEOUT
    vfs: add whiteout support
    vfs: export check_sticky()
    vfs: introduce clone_private_mount()
    vfs: export __inode_permission() to modules
    vfs: export do_splice_direct() to modules
    vfs: add i_op->dentry_open()

    Linus Torvalds
     

25 Oct, 2014

1 commit

  • Pull ACPI and power management updates from Rafael Wysocki:
    "This is material that didn't make it to my 3.18-rc1 pull request for
    various reasons, mostly related to timing and travel (LinuxCon EU /
    LPC) plus a couple of fixes for recent bugs.

    The only really new thing here is the PM QoS class for memory
    bandwidth, but it is simple enough and users of it will be added in
    the next cycle. One major change in behavior is that platform devices
    enumerated by ACPI will use 32-bit DMA mask by default. Also included
    is an ACPICA update to a new upstream release, but that's mostly
    cleanups, changes in tools and similar. The rest is fixes and
    cleanups mostly.

    Specifics:

    - Fix for a recent PCI power management change that overlooked the
    fact that some IRQ chips might not be able to configure PCIe PME
    for system wakeup from Lucas Stach.

    - Fix for a bug introduced in 3.17 where acpi_device_wakeup() is
    called with a wrong ordering of arguments from Zhang Rui.

    - A bunch of intel_pstate driver fixes (all -stable candidates) from
    Dirk Brandewie, Gabriele Mazzotta and Pali Rohár.

    - Fixes for a rather long-standing problem with the OOM killer and
    the freezer: frozen processes killed by the OOM killer do not
    actually release any memory until they are thawed, so OOM-killing
    them is rather pointless, with a couple of cleanups on top (Michal
    Hocko, Cong Wang, Rafael J Wysocki).

    - ACPICA update to upstream release 20140926, including mostly
    cleanups reducing differences between the upstream ACPICA and the
    kernel code, tools changes (acpidump, acpiexec) and support for the
    _DDN object (Bob Moore, Lv Zheng).

    - New PM QoS class for memory bandwidth from Tomeu Vizoso.

    - Default 32-bit DMA mask for platform devices enumerated by ACPI
    (this change is mostly needed for some drivers development in
    progress targeted at 3.19) from Heikki Krogerus.

    - ACPI EC driver cleanups, mostly related to debugging, from Lv
    Zheng.

    - cpufreq-dt driver updates from Thomas Petazzoni.

    - powernv cpuidle driver update from Preeti U Murthy"

    * tag 'pm+acpi-3.18-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: (34 commits)
    intel_pstate: Correct BYT VID values.
    intel_pstate: Fix BYT frequency reporting
    intel_pstate: Don't lose sysfs settings during cpu offline
    cpufreq: intel_pstate: Reflect current no_turbo state correctly
    cpufreq: expose scaling_cur_freq sysfs file for set_policy() drivers
    cpufreq: intel_pstate: Fix setting max_perf_pct in performance policy
    PCI / PM: handle failure to enable wakeup on PCIe PME
    ACPI: invoke acpi_device_wakeup() with correct parameters
    PM / freezer: Clean up code after recent fixes
    PM: convert do_each_thread to for_each_process_thread
    OOM, PM: OOM killed task shouldn't escape PM suspend
    freezer: remove obsolete comments in __thaw_task()
    freezer: Do not freeze tasks killed by OOM killer
    ACPI / platform: provide default DMA mask
    cpuidle: powernv: Populate cpuidle state details by querying the device-tree
    cpufreq: cpufreq-dt: adjust message related to regulators
    cpufreq: cpufreq-dt: extend with platform_data
    cpufreq: allow driver-specific data
    ACPI / EC: Cleanup coding style.
    ACPI / EC: Refine event/query debugging messages.
    ...

    Linus Torvalds
     

24 Oct, 2014

1 commit

  • Allocate a dentry, initialize it with a whiteout and hash it in the place
    of the old dentry. Later the old dentry will be moved away and the
    whiteout will remain.

    i_mutex protects against concurrent readdir.
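
    A hedged sketch of the approach, simplified from the shmem side of
    this series:

    static int shmem_whiteout(struct inode *old_dir, struct dentry *old_dentry)
    {
            struct dentry *whiteout;
            int error;

            whiteout = d_alloc(old_dentry->d_parent, &old_dentry->d_name);
            if (!whiteout)
                    return -ENOMEM;

            /* a whiteout is a 0:0 char device with a special mode */
            error = shmem_mknod(old_dir, whiteout,
                                S_IFCHR | WHITEOUT_MODE, WHITEOUT_DEV);
            dput(whiteout);
            if (error)
                    return error;

            /*
             * Hash it while the old dentry is still in place; the old
             * dentry will be moved away and the whiteout will remain.
             */
            d_rehash(whiteout);
            return 0;
    }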

    Signed-off-by: Miklos Szeredi
    Cc: Hugh Dickins

    Miklos Szeredi
     

22 Oct, 2014

1 commit

  • The PM freezer relies on having all tasks frozen by the time devices
    start getting frozen, so that no task will touch them at that point.
    But the OOM killer is allowed to kill an already frozen task in order
    to handle an OOM situation. To protect against late wake-ups, the OOM
    killer is disabled after all tasks are frozen. This, however, still
    leaves a window open when a killed task didn't manage to die by the
    time freeze_processes() finishes.

    Reduce the race window by checking all tasks after the OOM killer has
    been disabled. This is still not completely race-free, unfortunately,
    because oom_killer_disable cannot stop an already ongoing OOM kill, so
    a task might still wake up from the fridge and get killed without
    freeze_processes() noticing. Full synchronization of OOM and the
    freezer is, however, too heavyweight for this highly unlikely case.

    Introduce and check an oom_kills counter which gets incremented early
    when the allocator enters the __alloc_pages_may_oom path, and only
    re-check all the tasks if the counter changes during the freezing
    attempt. The counter is updated that early to reduce the race window,
    since the allocator has already checked oom_killer_disabled, which is
    set by the PM-freezing code. A false positive will push the PM freezer
    into a slow path, but that is not a big deal.
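
    A hedged sketch of the two sides of the counter; the helper names are
    illustrative:

    static atomic_t oom_kills = ATOMIC_INIT(0);

    /* allocator side, early in the __alloc_pages_may_oom path: */
    atomic_inc(&oom_kills);

    /* freezer side, with the OOM killer already disabled: */
    int saved = atomic_read(&oom_kills);
    /* ... freezing attempt ... */
    if (atomic_read(&oom_kills) != saved)
            check_frozen_processes();       /* an OOM kill raced with
                                               freezing: re-check tasks */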

    Changes since v1
    - push the re-check loop out of freeze_processes into
    check_frozen_processes and invert the condition to make the code more
    readable as per Rafael

    Fixes: f660daac474c6f (oom: thaw threads if oom killed thread is frozen before deferring)
    Cc: 3.2+ # 3.2+
    Signed-off-by: Michal Hocko
    Signed-off-by: Rafael J. Wysocki

    Michal Hocko
     

21 Oct, 2014

1 commit

  • Pull ext4 updates from Ted Ts'o:
    "A large number of cleanups and bug fixes, with some (minor) journal
    optimizations"

    [ This got sent to me before -rc1, but was stuck in my spam folder. - Linus ]

    * tag 'ext4_for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4: (67 commits)
    ext4: check s_chksum_driver when looking for bg csum presence
    ext4: move error report out of atomic context in ext4_init_block_bitmap()
    ext4: Replace open coded mdata csum feature to helper function
    ext4: delete useless comments about ext4_move_extents
    ext4: fix reservation overflow in ext4_da_write_begin
    ext4: add ext4_iget_normal() which is to be used for dir tree lookups
    ext4: don't orphan or truncate the boot loader inode
    ext4: grab missed write_count for EXT4_IOC_SWAP_BOOT
    ext4: optimize block allocation on grow indepth
    ext4: get rid of code duplication
    ext4: fix over-defensive complaint after journal abort
    ext4: fix return value of ext4_do_update_inode
    ext4: fix mmap data corruption when blocksize < pagesize
    vfs: fix data corruption when blocksize < pagesize for mmaped data
    ext4: fold ext4_nojournal_sops into ext4_sops
    ext4: support freezing ext2 (nojournal) file systems
    ext4: fold ext4_sync_fs_nojournal() into ext4_sync_fs()
    ext4: don't check quota format when there are no quota files
    jbd2: simplify calling convention around __jbd2_journal_clean_checkpoint_list
    jbd2: avoid pointless scanning of checkpoint lists
    ...

    Linus Torvalds
     

19 Oct, 2014

1 commit

  • Pull core block layer changes from Jens Axboe:
    "This is the core block IO pull request for 3.18. Apart from the new
    and improved flush machinery for blk-mq, this is all mostly bug fixes
    and cleanups.

    - blk-mq timeout updates and fixes from Christoph.

    - Removal of REQ_END, also from Christoph. We pass it through the
    ->queue_rq() hook for blk-mq instead, freeing up one of the request
    bits. The space was overly tight on 32-bit, so Martin also killed
    REQ_KERNEL since it's no longer used.

    - blk integrity updates and fixes from Martin and Gu Zheng.

    - Update to the flush machinery for blk-mq from Ming Lei. Now we
    have a per hardware context flush request, which both cleans up the
    code and should scale better for flush-intensive workloads on blk-mq.

    - Improve the error printing, from Rob Elliott.

    - Backing device improvements and cleanups from Tejun.

    - Fixup of a misplaced rq_complete() tracepoint from Hannes.

    - Make blk_get_request() return error pointers, fixing up issues
    where we NULL deref when a device goes bad or missing. From Joe
    Lawrence.

    - Prep work for drastically reducing the memory consumption of dm
    devices from Junichi Nomura. This allows creating clone bio sets
    without preallocating a lot of memory.

    - Fix a blk-mq hang on certain combinations of queue depths and
    hardware queues from me.

    - Limit memory consumption for blk-mq devices for crash dump
    scenarios and drivers that use crazy high depths (certain SCSI
    shared tag setups). We now just use a single queue and limited
    depth for that"

    * 'for-3.18/core' of git://git.kernel.dk/linux-block: (58 commits)
    block: Remove REQ_KERNEL
    blk-mq: allocate cpumask on the home node
    bio-integrity: remove the needless fail handle of bip_slab creating
    block: include func name in __get_request prints
    block: make blk_update_request print prefix match ratelimited prefix
    blk-merge: don't compute bi_phys_segments from bi_vcnt for cloned bio
    block: fix alignment_offset math that assumes io_min is a power-of-2
    blk-mq: Make bt_clear_tag() easier to read
    blk-mq: fix potential hang if rolling wakeup depth is too high
    block: add bioset_create_nobvec()
    block: use bio_clone_fast() in blk_rq_prep_clone()
    block: misplaced rq_complete tracepoint
    sd: Honor block layer integrity handling flags
    block: Replace strnicmp with strncasecmp
    block: Add T10 Protection Information functions
    block: Don't merge requests if integrity flags differ
    block: Integrity checksum flag
    block: Relocate bio integrity flags
    block: Add a disk flag to block integrity profile
    block: Add prefix to block integrity profile flags
    ...

    Linus Torvalds
     

14 Oct, 2014

6 commits

  • Merge second patch-bomb from Andrew Morton:
    - a few hotfixes
    - drivers/dma updates
    - MAINTAINERS updates
    - Quite a lot of lib/ updates
    - checkpatch updates
    - binfmt updates
    - autofs4
    - drivers/rtc/
    - various small tweaks to less used filesystems
    - ipc/ updates
    - kernel/watchdog.c changes

    * emailed patches from Andrew Morton : (135 commits)
    mm: softdirty: enable write notifications on VMAs after VM_SOFTDIRTY cleared
    kernel/param: consolidate __{start,stop}___param[] in
    ia64: remove duplicate declarations of __per_cpu_start[] and __per_cpu_end[]
    frv: remove unused declarations of __start___ex_table and __stop___ex_table
    kvm: ensure hard lockup detection is disabled by default
    kernel/watchdog.c: control hard lockup detection default
    staging: rtl8192u: use %*pEn to escape buffer
    staging: rtl8192e: use %*pEn to escape buffer
    staging: wlan-ng: use %*pEhp to print SN
    lib80211: remove unused print_ssid()
    wireless: hostap: proc: print properly escaped SSID
    wireless: ipw2x00: print SSID via %*pE
    wireless: libertas: print esaped string via %*pE
    lib/vsprintf: add %*pE[achnops] format specifier
    lib / string_helpers: introduce string_escape_mem()
    lib / string_helpers: refactoring the test suite
    lib / string_helpers: move documentation to c-file
    include/linux: remove strict_strto* definitions
    arch/x86/mm/numa.c: fix boot failure when all nodes are hotpluggable
    fs: check bh blocknr earlier when searching lru
    ...

    Linus Torvalds
     
  • Pull x86 mm updates from Ingo Molnar:
    "This tree includes the following changes:

    - fix memory hotplug
    - fix hibernation bootup memory layout assumptions
    - fix hyperv numa guest kernel messages
    - remove dead code
    - update documentation"

    * 'x86-mm-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
    x86/mm: Update memory map description to list hypervisor-reserved area
    x86/mm, hibernate: Do not assume the first e820 area to be RAM
    x86/mm/numa: Drop dead code and rename setup_node_data() to setup_alloc_data()
    x86/mm/hotplug: Modify PGD entry when removing memory
    x86/mm/hotplug: Pass sync_global_pgds() a correct argument in remove_pagetable()
    x86: Remove set_pmd_pfn

    Linus Torvalds
     
  • For VMAs that don't want write notifications, PTEs created for read faults
    have their write bit set. If the read fault happens after VM_SOFTDIRTY is
    cleared, then the PTE's softdirty bit will remain clear after subsequent
    writes.

    Here's a simple code snippet to demonstrate the bug (soft_dirty() is a
    helper that reads the page's soft-dirty bit from /proc/self/pagemap):

    char *m = mmap(NULL, getpagesize(), PROT_READ | PROT_WRITE,
                   MAP_ANONYMOUS | MAP_SHARED, -1, 0);
    system("echo 4 > /proc/$PPID/clear_refs"); /* clear VM_SOFTDIRTY */
    assert(*m == '\0');     /* new PTE allows write access */
    assert(!soft_dirty(m));
    *m = 'x';               /* should dirty the page */
    assert(soft_dirty(m));  /* fails */

    With this patch, write notifications are enabled when VM_SOFTDIRTY is
    cleared. Furthermore, to avoid unnecessary faults, write notifications
    are disabled when VM_SOFTDIRTY is set.

    As a side effect of enabling and disabling write notifications with
    care, this patch fixes a bug in mprotect where vm_page_prot bits set by
    drivers were zapped on mprotect. An analogous bug was fixed in mmap by
    commit c9d0bf241451 ("mm: uncached vma support with writenotify").
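
    A hedged sketch of the clear_refs side, assuming a helper that
    recomputes vm_page_prot from the updated flags:

    vma->vm_flags &= ~VM_SOFTDIRTY;
    vma_set_page_prot(vma); /* with VM_SOFTDIRTY clear, read faults now
                               install read-only PTEs, so the first write
                               traps and marks the page soft-dirty */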

    Signed-off-by: Peter Feiner
    Reported-by: Peter Feiner
    Suggested-by: Kirill A. Shutemov
    Cc: Cyrill Gorcunov
    Cc: Pavel Emelyanov
    Cc: Jamie Liu
    Cc: Hugh Dickins
    Cc: Naoya Horiguchi
    Cc: Bjorn Helgaas
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Peter Feiner
     
  • Add a function to create CMA region from previously reserved memory and
    add support for handling 'shared-dma-pool' reserved-memory device tree
    nodes.

    Based on previous code provided by Josh Cartwright

    Signed-off-by: Marek Szyprowski
    Cc: Arnd Bergmann
    Cc: Michal Nazarewicz
    Cc: Grant Likely
    Cc: Laura Abbott
    Cc: Josh Cartwright
    Cc: Joonsoo Kim
    Cc: Kyungmin Park
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Marek Szyprowski
     
  • The current cma bitmap aligned mask computation is incorrect. It could
    cause an unexpected alignment when using cma_alloc() if the wanted align
    order is larger than cma->order_per_bit.

    Take kvm for example (PAGE_SHIFT = 12): kvm_cma->order_per_bit is set
    to 6. When kvm_alloc_rma() tries to alloc kvm_rma_pages, it uses 15 as
    the expected align value. With the current implementation, however, we
    get 0 as the cma bitmap aligned mask rather than 511.

    This patch fixes the cma bitmap aligned mask calculation.
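
    A sketch of the corrected helper; with align_order = 15 and
    cma->order_per_bit = 6, it now yields (1 << 9) - 1 = 511 instead of 0:

    static unsigned long cma_bitmap_aligned_mask(struct cma *cma,
                                                 int align_order)
    {
            if (align_order <= cma->order_per_bit)
                    return 0;
            return (1UL << (align_order - cma->order_per_bit)) - 1;
    }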

    [akpm@linux-foundation.org: coding-style fixes]
    Signed-off-by: Weijie Yang
    Acked-by: Michal Nazarewicz
    Cc: Joonsoo Kim
    Cc: "Aneesh Kumar K.V"
    Cc: [3.17]
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Weijie Yang
     
  • Commit bf0dea23a9c0 ("mm/slab: use percpu allocator for cpu cache")
    changed the allocation method for the cpu cache array from the slab
    allocator to the percpu allocator. In the percpu allocator case an
    explicit alignment must be provided, but that commit mistakenly set it
    to 0, so the percpu allocator returns unaligned memory addresses. That
    doesn't cause any problem on x86, which permits unaligned access, but
    it does on sparc64, which requires a strong alignment guarantee.

    The following bug report is from David Miller.

    I'm getting tons of the following on sparc64:

    [603965.383447] Kernel unaligned access at TPC[546b58] free_block+0x98/0x1a0
    [603965.396987] Kernel unaligned access at TPC[546b60] free_block+0xa0/0x1a0
    ...
    [603970.554394] log_unaligned: 333 callbacks suppressed
    ...

    This patch provides a proper alignment parameter when allocating cpu
    cache to fix this unaligned memory access problem on sparc64.
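
    A hedged sketch of the one-argument change in the slab cpu-cache
    setup:

    /*
     * Was __alloc_percpu(size, 0): an alignment of 0 let the percpu
     * allocator hand back addresses that sparc64 faults on.
     */
    cachep->cpu_cache = __alloc_percpu(size, sizeof(void *));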

    Reported-by: David Miller
    Tested-by: David Miller
    Tested-by: Meelis Roos
    Signed-off-by: Joonsoo Kim
    Cc: Christoph Lameter
    Cc: Pekka Enberg
    Cc: David Rientjes
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Joonsoo Kim
     

13 Oct, 2014

2 commits

  • Pull RCU updates from Ingo Molnar:
    "The main changes in this cycle were:

    - changes related to No-CBs CPUs and NO_HZ_FULL

    - RCU-tasks implementation

    - torture-test updates

    - miscellaneous fixes

    - locktorture updates

    - RCU documentation updates"

    * 'core-rcu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (81 commits)
    workqueue: Use cond_resched_rcu_qs macro
    workqueue: Add quiescent state between work items
    locktorture: Cleanup header usage
    locktorture: Cannot hold read and write lock
    locktorture: Fix __acquire annotation for spinlock irq
    locktorture: Support rwlocks
    rcu: Eliminate deadlock between CPU hotplug and expedited grace periods
    locktorture: Document boot/module parameters
    rcutorture: Rename rcutorture_runnable parameter
    locktorture: Add test scenario for rwsem_lock
    locktorture: Add test scenario for mutex_lock
    locktorture: Make torture scripting account for new _runnable name
    locktorture: Introduce torture context
    locktorture: Support rwsems
    locktorture: Add infrastructure for torturing read locks
    torture: Address race in module cleanup
    locktorture: Make statistics generic
    locktorture: Teach about lock debugging
    locktorture: Support mutexes
    locktorture: Add documentation
    ...

    Linus Torvalds
     
  • Pull vfs updates from Al Viro:
    "The big thing in this pile is Eric's unmount-on-rmdir series; we
    finally have everything we need for that. The final piece of prereqs
    is delayed mntput() - now filesystem shutdown always happens on
    shallow stack.

    Other than that, we have several new primitives for iov_iter (Matt
    Wilcox, culled from his XIP-related series) pushing the conversion to
    ->read_iter()/ ->write_iter() a bit more, a bunch of fs/dcache.c
    cleanups and fixes (including the external name refcounting, which
    gives consistent behaviour of d_move() wrt procfs symlinks for long
    and short names alike) and assorted cleanups and fixes all over the
    place.

    This is just the first pile; there's a lot of stuff from various
    people that ought to go in this window. Starting with
    unionmount/overlayfs mess... ;-/"

    * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (60 commits)
    fs/file_table.c: Update alloc_file() comment
    vfs: Deduplicate code shared by xattr system calls operating on paths
    reiserfs: remove pointless forward declaration of struct nameidata
    don't need that forward declaration of struct nameidata in dcache.h anymore
    take dname_external() into fs/dcache.c
    let path_init() failures treated the same way as subsequent link_path_walk()
    fix misuses of f_count() in ppp and netlink
    ncpfs: use list_for_each_entry() for d_subdirs walk
    vfs: move getname() from callers to do_mount()
    gfs2_atomic_open(): skip lookups on hashed dentry
    [infiniband] remove pointless assignments
    gadgetfs: saner API for gadgetfs_create_file()
    f_fs: saner API for ffs_sb_create_file()
    jfs: don't hash direct inode
    [s390] remove pointless assignment of ->f_op in vmlogrdr ->open()
    ecryptfs: ->f_op is never NULL
    android: ->f_op is never NULL
    nouveau: __iomem misannotations
    missing annotation in fs/file.c
    fs: namespace: suppress 'may be used uninitialized' warnings
    ...

    Linus Torvalds
     

11 Oct, 2014

1 commit

  • Commit d3ac21cacc24790eb45d735769f35753f5b56ceb ("mm: Support compiling
    out madvise and fadvise") incorrectly made fadvise conditional on
    CONFIG_MMU. (The merged branch unintentionally incorporated v1 of the
    patch rather than the fixed v2.) Apply the delta from v1 to v2, to
    allow fadvise without CONFIG_MMU.

    Reported-by: Johannes Weiner
    Signed-off-by: Josh Triplett

    Josh Triplett
     

10 Oct, 2014

5 commits

  • Pull percpu updates from Tejun Heo:
    "A lot of activities on percpu front. Notable changes are...

    - the percpu allocator can now take @gfp. If @gfp doesn't contain
    GFP_KERNEL, it tries to allocate from what's already available to
    the allocator, and a work item tries to keep the reserve around a
    certain level so that these atomic allocations usually succeed.

    This will replace the ad-hoc percpu memory pool used by
    blk-throttle and also be used by the planned blkcg support for
    writeback IOs.

    Please note that I noticed a bug in how @gfp is interpreted while
    preparing this pull request and applied the fix 6ae833c7fe0c
    ("percpu: fix how @gfp is interpreted by the percpu allocator")
    just now.

    - percpu_ref now uses longs for percpu and global counters instead of
    ints. It leads to more sparse packing of the percpu counters on
    64bit machines but the overhead should be negligible and this
    allows using percpu_ref for refcnting pages and in-memory objects
    directly.

    - The switching between percpu and single counter modes of a
    percpu_ref is made independent of putting the base ref and a
    percpu_ref can now optionally be initialized in single or killed
    mode. This allows avoiding percpu shutdown latency for cases where
    the refcounted objects may be synchronously created and destroyed
    in rapid succession with only a fraction of them reaching fully
    operational status (SCSI probing does this when combined with
    blk-mq support). It's also planned to be used to implement forced
    single mode to detect underflow more timely for debugging.

    There's a separate branch percpu/for-3.18-consistent-ops which cleans
    up the duplicate percpu accessors. That branch causes a number of
    conflicts with s390 and other trees. I'll send a separate pull
    request w/ resolutions once other branches are merged"

    * 'for-3.18' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/percpu: (33 commits)
    percpu: fix how @gfp is interpreted by the percpu allocator
    blk-mq, percpu_ref: start q->mq_usage_counter in atomic mode
    percpu_ref: make INIT_ATOMIC and switch_to_atomic() sticky
    percpu_ref: add PERCPU_REF_INIT_* flags
    percpu_ref: decouple switching to percpu mode and reinit
    percpu_ref: decouple switching to atomic mode and killing
    percpu_ref: add PCPU_REF_DEAD
    percpu_ref: rename things to prepare for decoupling percpu/atomic mode switch
    percpu_ref: replace pcpu_ prefix with percpu_
    percpu_ref: minor code and comment updates
    percpu_ref: relocate percpu_ref_reinit()
    Revert "blk-mq, percpu_ref: implement a kludge for SCSI blk-mq stall during probe"
    Revert "percpu: free percpu allocation info for uniprocessor system"
    percpu-refcount: make percpu_ref based on longs instead of ints
    percpu-refcount: improve WARN messages
    percpu: fix locking regression in the failure path of pcpu_alloc()
    percpu-refcount: add @gfp to percpu_ref_init()
    proportions: add @gfp to init functions
    percpu_counter: add @gfp to percpu_counter_init()
    percpu_counter: make percpu_counters_lock irq-safe
    ...

    Linus Torvalds
     
  • Pull cgroup updates from Tejun Heo:
    "Nothing too interesting. Just a handful of cleanup patches"

    * 'for-3.18' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup:
    Revert "cgroup: remove redundant variable in cgroup_mount()"
    cgroup: remove redundant variable in cgroup_mount()
    cgroup: fix missing unlock in cgroup_release_agent()
    cgroup: remove CGRP_RELEASABLE flag
    perf/cgroup: Remove perf_put_cgroup()
    cgroup: remove redundant check in cgroup_ino()
    cpuset: simplify proc_cpuset_show()
    cgroup: simplify proc_cgroup_show()
    cgroup: use a per-cgroup work for release agent
    cgroup: remove bogus comments
    cgroup: remove redundant code in cgroup_rmdir()
    cgroup: remove some useless forward declarations
    cgroup: fix a typo in comment.

    Linus Torvalds
     
  • Currently there are NCHUNKS (64) freelists in zbud_pool, and the last
    one, unbuddied[63], would link all zbud pages that have 63 free
    chunks. But going by the logic of num_free_chunks(), the maximum
    number of free chunks in an unbuddied zbud page is 62, so no zbud page
    is ever added to or removed from the last freelist; yet we still
    search that unused freelist whenever we fail to find an unbuddied zbud
    page elsewhere, which is unneeded.

    This patch redefines NCHUNKS as 63, the number of free chunks in one
    zbud page, so we can shrink the pool structure and avoid touching the
    last, unused freelist when zbud_alloc() fails to allocate from the
    freelists.
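
    A hedged sketch of the redefinition in mm/zbud.c:

    /*
     * With a 4K page, 64-byte chunks and one chunk-aligned header:
     * (4096 - 64) >> 6 = 63 usable chunks per zbud page.
     */
    #define NCHUNKS ((PAGE_SIZE - ZHDR_SIZE_ALIGNED) >> CHUNK_SHIFT)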

    Signed-off-by: Chao Yu
    Cc: Seth Jennings
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Chao Yu
     
  • Change zsmalloc init_zspage() logic to iterate through each object on each
    of its pages, checking the offset to verify the object is on the current
    page before linking it into the zspage.

    The current zsmalloc init_zspage free object linking code has logic that
    relies on there only being one page per zspage when PAGE_SIZE is a
    multiple of class->size. It calculates the number of objects for the
    current page, and iterates through all of them plus one, to account for
    the assumed partial object at the end of the page. While this currently
    works, the logic can be simplified to just link the object at each
    successive offset until the offset is larger than PAGE_SIZE, which does
    not rely on PAGE_SIZE being a multiple of class->size.
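
    A hedged sketch of the simplified linking loop; struct link_free and
    obj_location_to_handle() are zsmalloc internals, shown here in reduced
    form:

    struct link_free *link = page_address(page) + off;

    while ((off += class->size) < PAGE_SIZE) {
            /* the next object still starts on this page: link to it */
            link->next = obj_location_to_handle(page, off / class->size);
            link += class->size / sizeof(*link);
    }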

    Signed-off-by: Dan Streetman
    Acked-by: Minchan Kim
    Cc: Sergey Senozhatsky
    Cc: Nitin Gupta
    Cc: Seth Jennings
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Dan Streetman
     
  • The letter 'f' in "n
    Acked-by: Minchan Kim
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Wang Sheng-Hui