Doug / smarc-fsl-linux-kernel | Embedian Git Server

28 May, 2013

1 commit

f884ab15a doc: fix misspellings with 'codespell' tool

Signed-off-by: Anatol Pomozov
Signed-off-by: Jiri Kosina

Anatol Pomozov
2013-05-28 18:02:12 +0800

30 Apr, 2013

1 commit

c9b1d0981 mm: limit growth of 3% hardcoded other user reserve

Add user_reserve_kbytes knob.

Limit the growth of the memory reserved for other user processes to
min(3% of current process size, user_reserve_kbytes). Only about 8MB is
necessary to enable recovery in the default mode, and only a few hundred
MB are required even when overcommit is disabled.

user_reserve_kbytes defaults to min(3% of free pages, 128MB).

I arrived at 128MB by taking the max VSZ of sshd, login, bash, and top ...
then adding the RSS of each.

This only affects OVERCOMMIT_NEVER mode.
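
The arithmetic above can be sketched in a few lines of C. This is only
an illustration of the two min() expressions, not the kernel's code:
the function names and the choice of kB as the unit are assumptions
made for this example.

```c
#include <stdint.h>

/* Smaller of two values. */
static inline uint64_t min_u64(uint64_t a, uint64_t b)
{
	return a < b ? a : b;
}

/*
 * Default value of the user reserve knob, in kB:
 * min(3% of free memory, 128MB). Hypothetical helper.
 */
uint64_t default_user_reserve_kb(uint64_t free_kb)
{
	return min_u64(free_kb * 3 / 100, 128 * 1024);
}

/*
 * Memory held back for other user processes in OVERCOMMIT_NEVER mode,
 * in kB: min(3% of the current process size, user_reserve_kbytes).
 * Hypothetical helper.
 */
uint64_t user_reserve_kb(uint64_t process_size_kb, uint64_t user_reserve_kbytes)
{
	return min_u64(process_size_kb * 3 / 100, user_reserve_kbytes);
}
```

With the 128MB cap in place, a 32GB process gives up 128MB rather than
3% of its size.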

Background

1. user reserve

__vm_enough_memory reserves a hardcoded 3% of the current process size for
other applications when overcommit is disabled. This was done so that a
user could recover if they launched a memory hogging process. Without the
reserve, a user would easily run into a message such as:

bash: fork: Cannot allocate memory

2. admin reserve

Additionally, a hardcoded 3% of free memory is reserved for root in both
overcommit 'guess' and 'never' modes. This was intended to prevent a
scenario where root-cant-log-in and perform recovery operations.

Note that this reserve shrinks, and doesn't guarantee a useful reserve.

Motivation

The two hardcoded memory reserves should be updated to account for current
memory sizes.

Also, the admin reserve would be more useful if it didn't shrink too much.

When the current code was originally written, 1GB was considered
"enterprise". Now the 3% reserve can grow to multiple GB on large memory
systems, and it only needs to be a few hundred MB at most to enable a user
or admin to recover a system with an unwanted memory hogging process.

I've found that reducing these reserves is especially beneficial for a
specific type of application load:

* single application system
* one or few processes (e.g. one per core)
* allocating all available memory
* not initializing every page immediately
* long running

I've run scientific clusters with this sort of load. A long running job
sometimes failed many hours (weeks of CPU time) into a calculation. They
weren't initializing all of their memory immediately, and they weren't
using calloc, so I put systems into overcommit 'never' mode. These
clusters run diskless and have no swap.

However, with the current reserves, a user wishing to allocate as much
memory as possible to one process may be prevented from using, for
example, almost 2GB out of 32GB.

The effect is less, but still significant when a user starts a job with
one process per core. I have repeatedly seen a set of processes
requesting the same amount of memory fail because one of them could not
allocate the amount of memory a user would expect to be able to allocate.
This happens, for example, with Message Passing Interface (MPI) processes
running one per core, and it is similar for other parallel programming
frameworks.

Changing this reserve code will make the overcommit never mode more useful
by allowing applications to allocate nearly all of the available memory.

Also, the new admin_reserve_kbytes will be safer than the current behavior
since the hardcoded 3% of available memory reserve can shrink to something
useless in the case where applications have grabbed all available memory.

Risks

* "bash: fork: Cannot allocate memory"

The downside of the first patch, which creates a tunable user reserve
that is only used in overcommit 'never' mode, is that an admin can set
it so low that a user may not be able to kill their process, even if
they already have a shell prompt.

Of course, a user can get in the same predicament with the current 3%
reserve--they just have to launch processes until 3% becomes negligible.

* root-cant-log-in problem

The second patch, adding the tunable admin_reserve_kbytes, allows
the admin to shoot themselves in the foot by setting it too small. They
can easily get the system into a state where root-cant-log-in.

However, the new admin_reserve_kbytes will be safer than the current
behavior since the hardcoded 3% of available memory reserve can shrink
to something useless in the case where applications have grabbed all
available memory.

Alternatives

* Memory cgroups provide a more flexible way to limit application memory.

Not everyone wants to set up cgroups or deal with their overhead.

* We could create a fourth overcommit mode which provides smaller reserves.

The size of useful reserves may be drastically different depending
on whether the system is embedded or enterprise.

* Force users to initialize all of their memory or use calloc.

Some users don't want/expect the system to overcommit when they malloc.
Overcommit 'never' mode is for this scenario, and it should work well.

The new user and admin reserve tunables are simple to use, with low
overhead compared to cgroups. The patches preserve current behavior where
3% of memory is less than 128MB, except that the admin reserve doesn't
shrink to an unusable size under pressure. The code allows admins to tune
for embedded and enterprise usage.

FAQ

* How is the root-cant-login problem addressed?
What happens if admin_reserve_kbytes is set to 0?

Root is free to shoot themselves in the foot by setting
admin_reserve_kbytes too low.

On x86_64, the minimum useful reserve is:
8MB for overcommit 'guess'
128MB for overcommit 'never'

admin_reserve_kbytes defaults to min(3% free memory, 8MB)

So, anyone switching to 'never' mode needs to adjust
admin_reserve_kbytes.

* How do you calculate a minimum useful reserve?

A user or the admin needs enough memory to login and perform
recovery operations, which includes, at a minimum:

sshd or login + bash (or some other shell) + top (or ps, kill, etc.)

For overcommit 'guess', we can sum resident set sizes (RSS)
because we only need enough memory to handle what the recovery
programs will typically use. On x86_64 this is about 8MB.

For overcommit 'never', we can take the max of their virtual sizes (VSZ)
and add the sum of their RSS. We use VSZ instead of RSS because this mode
forces us to ensure we can fulfill all of the requested memory allocations,
even if the programs only use a fraction of what they ask for.
On x86_64 this is about 128MB.

When swap is enabled, reserves are useful even when they are as
small as 10MB, regardless of overcommit mode.

When both swap and overcommit are disabled, then the admin should
tune the reserves higher to be absolutely safe. Over 230MB each
was safest in my testing.
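
The two estimates described in this answer can be written down as a
small C sketch. The struct and function names are hypothetical and
exist only for this illustration; the caller supplies the RSS and VSZ
of each recovery program (for example sshd, bash and top).

```c
#include <stddef.h>
#include <stdint.h>

/* Memory footprint of one recovery program, in kB. */
struct proc_mem {
	uint64_t rss_kb; /* resident set size */
	uint64_t vsz_kb; /* virtual size */
};

/* Overcommit 'guess': sum of the resident set sizes. */
uint64_t reserve_guess_kb(const struct proc_mem *p, size_t n)
{
	uint64_t sum = 0;

	for (size_t i = 0; i < n; i++)
		sum += p[i].rss_kb;
	return sum;
}

/* Overcommit 'never': largest virtual size plus the sum of the RSS values. */
uint64_t reserve_never_kb(const struct proc_mem *p, size_t n)
{
	uint64_t max_vsz = 0;

	for (size_t i = 0; i < n; i++)
		if (p[i].vsz_kb > max_vsz)
			max_vsz = p[i].vsz_kb;
	return max_vsz + reserve_guess_kb(p, n);
}
```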

* What happens if user_reserve_kbytes is set to 0?

Note, this only affects overcommit 'never' mode.

Then a user will be able to allocate all available memory minus
admin_reserve_kbytes.

However, they will easily see a message such as:

"bash: fork: Cannot allocate memory"

And they won't be able to recover/kill their application.
The admin should be able to recover the system if
admin_reserve_kbytes is set appropriately.

* What's the difference between overcommit 'guess' and 'never'?

"Guess" allows an allocation if there are enough free + reclaimable
pages. It has a hardcoded 3% of free pages reserved for root.

"Never" allows an allocation if there is enough swap + a configurable
percentage (default is 50) of physical RAM. It has a hardcoded 3% of
free pages reserved for root, like "Guess" mode. It also has a
hardcoded 3% of the current process size reserved for additional
applications.
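
As a simplified sketch of the 'never' rule described above, the commit
limit is swap plus a percentage of physical RAM, and a request is
allowed only while the total committed memory stays within it. The
function names are hypothetical, and the sketch deliberately ignores
the root and user reserves and any other adjustments the kernel makes.

```c
#include <stdint.h>

/*
 * Simplified commit limit for overcommit 'never', in pages:
 * swap + ratio_percent% of physical RAM.
 */
uint64_t commit_limit_pages(uint64_t ram_pages, uint64_t swap_pages,
			    unsigned int ratio_percent)
{
	return swap_pages + ram_pages * ratio_percent / 100;
}

/* Returns 1 if the request keeps committed memory within the limit, else 0. */
int never_mode_allows(uint64_t committed_pages, uint64_t request_pages,
		      uint64_t limit_pages)
{
	return committed_pages + request_pages <= limit_pages;
}
```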

* Why is overcommit 'guess' not suitable even when an app eventually
writes to every page? It takes free pages, file pages, available
swap pages, reclaimable slab pages into consideration. In other words,
these are all pages available, then why isn't overcommit suitable?

Because it only looks at the present state of the system. It
does not take into account the memory that other applications have
malloced, but haven't initialized yet. It overcommits the system.

Test Summary

There was little change in behavior in the default overcommit 'guess'
mode with swap enabled before and after the patch. This was expected.

Systems run most predictably (i.e. no oom kills) in overcommit 'never'
mode with swap enabled. This also allowed the most memory to be allocated
to a user application.

Overcommit 'guess' mode without swap is a bad idea. It is easy to
crash the system. None of the other tested combinations crashed.
This matches my experience on the Roadrunner supercomputer.

Without the tunable user reserve, a system in overcommit 'never' mode
and without swap does not allow the user to recover, although the
admin can.

With the new tunable reserves, a system in overcommit 'never' mode
and without swap can be configured to:

1. maximize user-allocatable memory, running close to the edge of
recoverability

2. maximize recoverability, sacrificing allocatable memory to
ensure that a user cannot take down a system

Test Description

Fedora 18 VM - 4 x86_64 cores, 5725MB RAM, 4GB Swap

System is booted into multiuser console mode, with unnecessary services
turned off. Caches were dropped before each test.

Hogs are user memtester processes that attempt to allocate all free memory
as reported by /proc/meminfo

In overcommit 'never' mode, overcommit_ratio=100

Test Results

3.9.0-rc1-mm1

Overcommit | Swap | Hogs | MB Got/Wanted | OOMs  | User Recovery | Admin Recovery
---------- | ---- | ---- | ------------- | ----- | ------------- | --------------
guess      | yes  | 1    | 5432/5432     | no    | yes           | yes
guess      | yes  | 4    | 5444/5444     | 1     | yes           | yes
guess      | no   | 1    | 5302/5449     | no    | yes           | yes
guess      | no   | 4    | -             | crash | no            | no

never      | yes  | 1    | 5460/5460     | 1     | yes           | yes
never      | yes  | 4    | 5460/5460     | 1     | yes           | yes
never      | no   | 1    | 5218/5432     | no    | no            | yes
never      | no   | 4    | 5203/5448     | no    | no            | yes

3.9.0-rc1-mm1-tunablereserves

User and Admin Recovery show their respective reserves, if applicable.

Overcommit | Swap | Hogs | MB Got/Wanted | OOMs  | User Recovery | Admin Recovery
---------- | ---- | ---- | ------------- | ----- | ------------- | --------------
guess      | yes  | 1    | 5419/5419     | no    | - yes         | 8MB yes
guess      | yes  | 4    | 5436/5436     | 1     | - yes         | 8MB yes
guess      | no   | 1    | 5440/5440     | *     | - yes         | 8MB yes
guess      | no   | 4    | -             | crash | - no          | 8MB no

* process would successfully mlock, then the oom killer would pick it

never      | yes  | 1    | 5446/5446     | no    | 10MB yes      | 20MB yes
never      | yes  | 4    | 5456/5456     | no    | 10MB yes      | 20MB yes
never      | no   | 1    | 5387/5429     | no    | 128MB no      | 8MB barely
never      | no   | 1    | 5323/5428     | no    | 226MB barely  | 8MB barely
never      | no   | 1    | 5323/5428     | no    | 226MB barely  | 8MB barely

never      | no   | 1    | 5359/5448     | no    | 10MB no       | 10MB barely

never      | no   | 1    | 5323/5428     | no    | 0MB no        | 10MB barely
never      | no   | 1    | 5332/5428     | no    | 0MB no        | 50MB yes
never      | no   | 1    | 5293/5429     | no    | 0MB no        | 90MB yes

never      | no   | 1    | 5001/5427     | no    | 230MB yes     | 338MB yes
never      | no   | 4*   | 4998/5424     | no    | 230MB yes     | 338MB yes

* more memtesters were launched, able to allocate approximately another 100MB

Future Work

- Test larger memory systems.

- Test an embedded image.

- Test other architectures.

- Time malloc microbenchmarks.

- Would it be useful to be able to set overcommit policy for
each memory cgroup?

- Some lines are slightly above 80 chars.
Perhaps define a macro to convert between pages and kb?
Other places in the kernel do this.

[akpm@linux-foundation.org: coding-style fixes]
[akpm@linux-foundation.org: make init_user_reserve() static]
Signed-off-by: Andrew Shewmaker
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Andrew Shewmaker
2013-04-30 06:54:36 +0800

24 Feb, 2013

2 commits

8fdb3dbf0 ksm: add some comments

Added slightly more detail to the Documentation of merge_across_nodes, a
few comments in areas indicated by review, and renamed get_ksm_page()'s
argument from "locked" to "lock_it". No functional change.

Signed-off-by: Hugh Dickins
Cc: Mel Gorman
Cc: Petr Holasek
Cc: Andrea Arcangeli
Cc: Izik Eidus
Cc: Johannes Weiner
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Hugh Dickins
2013-02-24 09:50:23 +0800
90bd6fd31 ksm: allow trees per NUMA node

Here's a KSM series, based on mmotm 2013-01-23-17-04: starting with
Petr's v7 "KSM: numa awareness sysfs knob"; then fixing the two issues
we had with that, fully enabling KSM page migration on the way.

(A different kind of KSM/NUMA issue which I've certainly not begun to
address here: when KSM pages are unmerged, there's usually no sense in
preferring to allocate the new pages local to the caller's node.)

This patch:

Introduces a new sysfs boolean knob, /sys/kernel/mm/ksm/merge_across_nodes,
which controls merging of pages across different NUMA nodes. When it is set
to zero, only pages from the same node are merged; otherwise pages from
all nodes can be merged together (the default behavior).

A typical use case is a NUMA machine running many KVM guests, where CPUs
on more distant nodes would see a significant increase in access
latency to a merged KSM page. A sysfs knob was chosen for flexibility,
since some users still prefer a higher amount of saved physical memory
regardless of access latency.

Every NUMA node has its own stable and unstable trees for faster
searching and inserting. Changing the merge_across_nodes value is
possible only when there are no KSM shared pages in the system.
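
The effect of the knob on tree selection can be pictured with a short
C sketch. The function names are hypothetical and this is not the code
from mm/ksm.c; it only illustrates the rule stated above, assuming one
tree per node and a single shared tree when merging across nodes.

```c
/*
 * Pick the tree a page belongs to. With merging across nodes enabled,
 * every page shares tree 0; otherwise each NUMA node uses its own tree.
 */
int ksm_tree_index(int merge_across_nodes, int page_nid)
{
	return merge_across_nodes ? 0 : page_nid;
}

/* Two pages can be merged only if they are looked up in the same tree. */
int ksm_may_merge(int merge_across_nodes, int nid_a, int nid_b)
{
	return ksm_tree_index(merge_across_nodes, nid_a) ==
	       ksm_tree_index(merge_across_nodes, nid_b);
}
```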

I've tested this patch on numa machines with 2, 4 and 8 nodes and
measured speed of memory access inside of KVM guests with memory pinned
to one of nodes with this benchmark:

http://pholasek.fedorapeople.org/alloc_pg.c

The population standard deviations of access times, as a percentage of
the average, were as follows:

merge_across_nodes=1
2 nodes 1.4%
4 nodes 1.6%
8 nodes 1.7%

merge_across_nodes=0
2 nodes 1%
4 nodes 0.32%
8 nodes 0.018%

RFC: https://lkml.org/lkml/2011/11/30/91
v1: https://lkml.org/lkml/2012/1/23/46
v2: https://lkml.org/lkml/2012/6/29/105
v3: https://lkml.org/lkml/2012/9/14/550
v4: https://lkml.org/lkml/2012/9/23/137
v5: https://lkml.org/lkml/2012/12/10/540
v6: https://lkml.org/lkml/2012/12/23/154
v7: https://lkml.org/lkml/2012/12/27/225

Hugh notes that this patch brings two problems, whose solution needs
further support in mm/ksm.c, which follows in subsequent patches:

1) switching merge_across_nodes after running KSM is liable to oops
on stale nodes still left over from the previous stable tree;

2) memory hotremove may migrate KSM pages, but there is no provision
here for !merge_across_nodes to migrate nodes to the proper tree.

Signed-off-by: Petr Holasek
Signed-off-by: Hugh Dickins
Acked-by: Rik van Riel
Cc: Andrea Arcangeli
Cc: Izik Eidus
Cc: Gerald Schaefer
Cc: KOSAKI Motohiro
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Petr Holasek
2013-02-24 09:50:19 +0800

14 Dec, 2012

1 commit

f6e858a00 Merge branch 'akpm' (Andrew's patch-bomb)

Merge misc VM changes from Andrew Morton:
"The rest of most-of-MM. The other MM bits await a slab merge.

This patch includes the addition of a huge zero_page. Not a
performance boost but it can save large amounts of physical memory in
some situations.

Also a bunch of Fujitsu engineers are working on memory hotplug.
Which, as it turns out, was badly broken. About half of their patches
are included here; the remainder are 3.8 material."

However, this merge disables CONFIG_MOVABLE_NODE, which was totally
broken. We don't add new features with "default y", nor do we add
Kconfig questions that are incomprehensible to most people without any
help text. Does the feature even make sense without compaction or
memory hotplug?

* akpm: (54 commits)
mm/bootmem.c: remove unused wrapper function reserve_bootmem_generic()
mm/memory.c: remove unused code from do_wp_page()
asm-generic, mm: pgtable: consolidate zero page helpers
mm/hugetlb.c: fix warning on freeing hwpoisoned hugepage
hwpoison, hugetlbfs: fix RSS-counter warning
hwpoison, hugetlbfs: fix "bad pmd" warning in unmapping hwpoisoned hugepage
mm: protect against concurrent vma expansion
memcg: do not check for mm in __mem_cgroup_count_vm_event
tmpfs: support SEEK_DATA and SEEK_HOLE (reprise)
mm: provide more accurate estimation of pages occupied by memmap
fs/buffer.c: remove redundant initialization in alloc_page_buffers()
fs/buffer.c: do not inline exported function
writeback: fix a typo in comment
mm: introduce new field "managed_pages" to struct zone
mm, oom: remove statically defined arch functions of same name
mm, oom: remove redundant sleep in pagefault oom handler
mm, oom: cleanup pagefault oom handler
memory_hotplug: allow online/offline memory to result movable node
numa: add CONFIG_MOVABLE_NODE for movable-dedicated node
mm, memcg: avoid unnecessary function call when memcg is disabled
...

Linus Torvalds
2012-12-14 05:11:15 +0800

13 Dec, 2012

3 commits

79da5407e thp: introduce sysfs knob to disable huge zero page

By default, the kernel tries to use the huge zero page on read page
faults. It's possible to disable the huge zero page by writing 0, or to
enable it again by writing 1:

echo 0 >/sys/kernel/mm/transparent_hugepage/use_zero_page
echo 1 >/sys/kernel/mm/transparent_hugepage/use_zero_page

Signed-off-by: Kirill A. Shutemov
Cc: Andrea Arcangeli
Cc: Andi Kleen
Cc: "H. Peter Anvin"
Cc: Mel Gorman
Cc: David Rientjes
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Kirill A. Shutemov
2012-12-13 09:38:32 +0800
d8a8e1f0d thp, vmstat: implement HZP_ALLOC and HZP_ALLOC_FAILED events

hzp_alloc is incremented every time a huge zero page is successfully
allocated. It includes allocations which were dropped due to a
race with another allocation. Note that it doesn't count every mapping
of the huge zero page, only its allocation.

hzp_alloc_failed is incremented if the kernel fails to allocate a huge
zero page and falls back to using small pages.

Signed-off-by: Kirill A. Shutemov
Cc: Andrea Arcangeli
Cc: Andi Kleen
Cc: "H. Peter Anvin"
Cc: Mel Gorman
Cc: David Rientjes
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Kirill A. Shutemov
2012-12-13 09:38:32 +0800
e180377f1 thp: change split_huge_page_pmd() interface

Pass vma instead of mm and add an address parameter.

In most cases we already have the vma on the stack. We provide
split_huge_page_pmd_mm() for the few cases when we have mm, but not vma.

This change is preparation for the huge zero pmd splitting implementation.

Signed-off-by: Kirill A. Shutemov
Cc: Andrea Arcangeli
Cc: Andi Kleen
Cc: "H. Peter Anvin"
Cc: Mel Gorman
Cc: David Rientjes
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Kirill A. Shutemov
2012-12-13 09:38:31 +0800

19 Nov, 2012

1 commit

4e79162a5 doc: fix quite a few typos within Documentation

Correct spelling typos in Documentation

Signed-off-by: Jiri Kosina

Masanari Iida
2012-11-19 21:28:24 +0800

09 Oct, 2012

2 commits

39b5f29ac mm: remove vma arg from page_evictable

page_evictable(page, vma) is an irritant: almost all its callers pass
NULL for vma. Remove the vma arg and use mlocked_vma_newpage(vma, page)
explicitly in the couple of places it's needed. But in those places we
don't even need page_evictable() itself! They're dealing with a freshly
allocated anonymous page, which has no "mapping" and cannot be mlocked yet.

Signed-off-by: Hugh Dickins
Acked-by: Mel Gorman
Cc: Rik van Riel
Acked-by: Johannes Weiner
Cc: Michel Lespinasse
Cc: Ying Han
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Hugh Dickins
2012-10-09 15:22:55 +0800
314e51b98 mm: kill vma flag VM_RESERVED and mm->reserved_vm counter

A long time ago, in v2.4, VM_RESERVED kept the swapout process off a VMA.
It has since lost its original meaning but still has some effects:

 | effect                 | alternative flags
-+------------------------+---------------------------------------------
1| account as reserved_vm | VM_IO
2| skip in core dump      | VM_IO, VM_DONTDUMP
3| do not merge or expand | VM_IO, VM_DONTEXPAND, VM_HUGETLB, VM_PFNMAP
4| do not mlock           | VM_IO, VM_DONTEXPAND, VM_HUGETLB, VM_PFNMAP

This patch removes the reserved_vm counter from mm_struct. It seems that
nobody cares about it: it is not exported to userspace directly, it only
reduces the total_vm shown in proc.

Thus VM_RESERVED can be replaced with VM_IO or the pair VM_DONTEXPAND | VM_DONTDUMP.

remap_pfn_range() and io_remap_pfn_range() set VM_IO|VM_DONTEXPAND|VM_DONTDUMP.
remap_vmalloc_range() sets VM_DONTEXPAND | VM_DONTDUMP.

[akpm@linux-foundation.org: drivers/vfio/pci/vfio_pci.c fixup]
Signed-off-by: Konstantin Khlebnikov
Cc: Alexander Viro
Cc: Carsten Otte
Cc: Chris Metcalf
Cc: Cyrill Gorcunov
Cc: Eric Paris
Cc: H. Peter Anvin
Cc: Hugh Dickins
Cc: Ingo Molnar
Cc: James Morris
Cc: Jason Baron
Cc: Kentaro Takeda
Cc: Matt Helsley
Cc: Nick Piggin
Cc: Oleg Nesterov
Cc: Peter Zijlstra
Cc: Robert Richter
Cc: Suresh Siddha
Cc: Tetsuo Handa
Cc: Venkatesh Pallipadi
Acked-by: Linus Torvalds
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Konstantin Khlebnikov
2012-10-09 15:22:19 +0800

22 Aug, 2012

1 commit

d46f3d86f hugetlb: update hugetlbpage.txt

Commit f0f57b2b1488 ("mm: move hugepage test examples to
tools/testing/selftests/vm") moved map_hugetlb.c, hugepage-shm.c and
hugepage-mmap.c tests into tools/testing/selftests/vm/ directory, but it
didn't update hugetlbpage.txt

Signed-off-by: Zhouping Liu
Acked-by: Dave Young
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Zhouping Liu
2012-08-22 07:45:03 +0800

23 Jul, 2012

1 commit

1d00015e2 mm/frontswap: cleanup doc and comment error

Signed-off-by: Wanpeng Li
Signed-off-by: Konrad Rzeszutek Wilk

Wanpeng Li
2012-07-23 23:16:20 +0800

05 Jun, 2012

1 commit

a3fe778c7 Merge tag 'stable/frontswap.v16-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/konrad/mm

Pull frontswap feature from Konrad Rzeszutek Wilk:
"Frontswap provides a "transcendent memory" interface for swap pages.
In some environments, dramatic performance savings may be obtained
because swapped pages are saved in RAM (or a RAM-like device) instead
of a swap disk. This tag provides the basic infrastructure along with
some changes to the existing backends."

Fix up trivial conflict in mm/Makefile due to removal of swap token code
changing a line next to the new frontswap entry.

This pull request came in before the merge window even opened, it got
delayed to after the merge window by me just wanting to make sure it had
actual users. Apparently IBM is using this on their embedded side, and
Jan Beulich says that it's already made available for SLES and OpenSUSE
users.

Also acked by Rik van Riel, and Konrad points to other people liking it
too. So in it goes.

By Dan Magenheimer (4) and Konrad Rzeszutek Wilk (2)
via Konrad Rzeszutek Wilk
* tag 'stable/frontswap.v16-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/konrad/mm:
frontswap: s/put_page/store/g s/get_page/load
MAINTAINER: Add myself for the frontswap API
mm: frontswap: config and doc files
mm: frontswap: core frontswap functionality
mm: frontswap: core swap subsystem hooks and headers
mm: frontswap: add frontswap header file

Linus Torvalds
2012-06-05 03:28:45 +0800

02 Jun, 2012

1 commit

af4f8ba31 Merge branch 'slab/for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/penberg/linux

Pull slab updates from Pekka Enberg:
"Mainly a bunch of SLUB fixes from Joonsoo Kim"

* 'slab/for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/penberg/linux:
slub: use __SetPageSlab function to set PG_slab flag
slub: fix a memory leak in get_partial_node()
slub: remove unused argument of init_kmem_cache_node()
slub: fix a possible memory leak
Documentations: Fix slabinfo.c directory in vm/slub.txt
slub: fix incorrect return type of get_any_partial()

Linus Torvalds
2012-06-02 07:50:23 +0800

01 Jun, 2012

1 commit

052fb0d63 proc: report file/anon bit in /proc/pid/pagemap

This is an implementation of Andrew's proposal to extend the pagemap file
bits to report what is missing about tasks' working set.

The problem with working set detection is multilateral. In the criu
(checkpoint/restore) project we dump the tasks' memory into image files,
and to do it properly we need to detect which pages inside mappings are
really in use. The mincore syscall, which I thought could help with this,
did not. First, it doesn't report swapped pages, thus we cannot find out
which parts of anonymous mappings to dump. Next, it does report pages from
page cache as present even if they are not mapped, and it doesn't mark
pages that have not been cow-ed.

Note that the issue with swap pages is critical -- we must dump swap pages
to the image file. But the issues with file pages are optimizations -- we
can take all file pages to the image, and this would be correct, but if we
know that a page is not mapped or not cow-ed, we can remove it from the
dump file. The dump would still be self-consistent, though significantly
smaller in size (up to 10 times smaller on real apps).

Andrew noticed that the proc pagemap file solves 2 of the 3 issues above --
it reports whether a page is present or swapped and it doesn't report
unmapped page cache pages. But it doesn't distinguish cow-ed file pages
from non-cow-ed ones.

I would like to use the last unused bit in this file to report whether the
page mapped into the respective pte is PageAnon or not.
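
For readers who want to consume these bits from userspace, here is a
minimal C sketch that decodes one 64-bit pagemap entry. The bit
positions (63 present, 62 swapped, 61 file-page or shared-anon, 0-54
PFN) follow the kernel's pagemap documentation as I understand it; the
macro, struct and function names are local to this example and are not
kernel identifiers.

```c
#include <stdint.h>

#define PM_PRESENT  (1ULL << 63)       /* page is present in RAM */
#define PM_SWAPPED  (1ULL << 62)       /* page is in swap */
#define PM_FILE     (1ULL << 61)       /* page is file-page or shared-anon */
#define PM_PFN_MASK ((1ULL << 55) - 1) /* bits 0-54: page frame number */

/* Decoded view of one pagemap entry. */
struct pagemap_info {
	int present;
	int swapped;
	int file;
	uint64_t pfn; /* meaningful only when present is set, else 0 */
};

/* Split a raw 64-bit pagemap entry into its flag bits and PFN. */
struct pagemap_info decode_pagemap(uint64_t entry)
{
	struct pagemap_info info;

	info.present = (entry & PM_PRESENT) != 0;
	info.swapped = (entry & PM_SWAPPED) != 0;
	info.file = (entry & PM_FILE) != 0;
	info.pfn = info.present ? (entry & PM_PFN_MASK) : 0;
	return info;
}
```

A dumper in the spirit of the description above would keep swapped
pages and present pages whose file bit is clear, and could skip
present pages whose file bit is set.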

[comment stolen from Pavel Emelyanov's v1 patch]

Signed-off-by: Konstantin Khlebnikov
Cc: Pavel Emelyanov
Cc: Matt Mackall
Cc: Hugh Dickins
Cc: Rik van Riel
Acked-by: KOSAKI Motohiro
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Konstantin Khlebnikov
2012-06-01 08:49:29 +0800

30 May, 2012

1 commit

692569946 mm: document the meminfo and vmstat fields of relevance to transparent hugepages

Update Documentation/vm/transhuge.txt and
Documentation/filesystems/proc.txt with some information on monitoring
transparent huge page usage and the associated overhead.

Signed-off-by: Mel Gorman
Signed-off-by: Andrea Arcangeli
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Mel Gorman
2012-05-30 07:22:23 +0800

15 May, 2012

2 commits

165c8aed5 frontswap: s/put_page/store/g s/get_page/load

Sounds so much more natural.

Suggested-by: Andrea Arcangeli
Signed-off-by: Konrad Rzeszutek Wilk

Konrad Rzeszutek Wilk
2012-05-15 23:34:08 +0800
27c6aec21 mm: frontswap: config and doc files

This patch, 4 of 4, adds configuration and documentation files including a FAQ.

[v14: updated docs/FAQ to use zcache and RAMster as examples]
[v10: no change]
[v9: akpm@linux-foundation.org: sysfs->debugfs; no longer need Doc/ABI file]
[v8: rebase to 3.0-rc4]
[v7: rebase to 3.0-rc3]
[v6: rebase to 3.0-rc1]
[v5: change config default to n]
[v4: rebase to 2.6.39]
Signed-off-by: Dan Magenheimer
Acked-by: Jan Beulich
Acked-by: Seth Jennings
Cc: Jeremy Fitzhardinge
Cc: Hugh Dickins
Cc: Johannes Weiner
Cc: Nitin Gupta
Cc: Matthew Wilcox
Cc: Chris Mason
Cc: Rik Riel
Cc: Andrew Morton
Signed-off-by: Konrad Rzeszutek Wilk

Dan Magenheimer
2012-05-15 23:34:03 +0800

10 May, 2012

1 commit

9fe496116 Documentations: Fix slabinfo.c directory in vm/slub.txt

The location of slabinfo.c changed, so update slub.txt accordingly.

Acked-by: Christoph Lameter
Signed-off-by: majianpeng
Signed-off-by: Pekka Enberg

majianpeng
2012-05-10 16:45:23 +0800

29 Mar, 2012

2 commits

f0f57b2b1 mm: move hugepage test examples to tools/testing/selftests/vm

hugepage-mmap.c, hugepage-shm.c and map_hugetlb.c in Documentation/vm are
simple pass/fail tests. It's better to promote them to
tools/testing/selftests.

Thanks to Andrew Morton for suggesting this. They all first need
proper nr_hugepages set up, and hugepage-mmap needs hugetlbfs mounted.
So I added a shell script, run_vmtests, to do this work; it calls the
three test programs and checks their return values.

Changes to the original code include:
a. add run_vmtests script
b. return an error when read_bytes does not match the written bytes
c. coding style fixes: do not use assignment in if condition

[akpm@linux-foundation.org: build the targets before trying to execute them]
[akpm@linux-foundation.org: Documentation/vm/ no longer has a Makefile. Fixes "make clean"]
Signed-off-by: Dave Young
Cc: Wu Fengguang
Cc: Christoph Lameter
Cc: Pekka Enberg
Cc: Frederic Weisbecker
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Dave Young
2012-03-29 08:14:37 +0800
c6dd897f3 mm: move page-types.c from Documentation to tools/vm

tools/ is the better place for vm tools which are used by many people.
Moving them to tools also makes them open to more users instead of hiding
them in the Documentation folder.

This patch moves page-types.c to tools/vm/page-types.c. It also adds a
Makefile in tools/vm and fixes two coding style problems: a) change const
array to 'const char * const', b) change a space to a tab for indentation.

Signed-off-by: Dave Young
Acked-by: Wu Fengguang
Cc: Christoph Lameter
Cc: Pekka Enberg
Cc: Frederic Weisbecker
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Dave Young
2012-03-29 08:14:37 +0800

23 Mar, 2012

1 commit

aab008db8 Merge tag 'stable/for-linus-3.4' of git://git.kernel.org/pub/scm/linux/kernel/git/konrad/mm

Pull cleancache changes from Konrad Rzeszutek Wilk:
"This has some patches for the cleancache API that should have been
submitted a _long_ time ago. They are basically cleanups:

- rename of flush to invalidate

- moving reporting of statistics into debugfs

- use __read_mostly as necessary.

Oh, and also the MAINTAINERS file change. The files (except the
MAINTAINERS file) have been in #linux-next for months now. The late
addition of MAINTAINERS file is a brain-fart on my side - didn't
realize I needed that just until I was typing this up - and I based
that patch on v3.3 - so the tree is on top of v3.3."

* tag 'stable/for-linus-3.4' of git://git.kernel.org/pub/scm/linux/kernel/git/konrad/mm:
MAINTAINERS: Adding cleancache API to the list.
mm: cleancache: Use __read_mostly as appropiate.
mm: cleancache: report statistics via debugfs instead of sysfs.
mm: zcache/tmem/cleancache: s/flush/invalidate/
mm: cleancache: s/flush/invalidate/

Linus Torvalds
2012-03-23 10:52:47 +0800

22 Mar, 2012

1 commit

807f0ccfe pagemap: document KPF_THP and make page-types aware of it

page-types, which is a common user of pagemap, becomes aware of thp with
this patch. This helps system admins and kernel hackers understand how thp
works. Here is a sample output of page-types over a thp:

$ page-types -p <pid> --raw --list

voffset    offset  len  flags
...
7f9d40200  3f8400  1    ___U_lA____Ma_bH______t____________
7f9d40201  3f8401  1ff  ________________T_____t____________

flags               page-count  MB  symbolic-flags                       long-symbolic-flags
0x0000000000410000  511         1   ________________T_____t____________  compound_tail,thp
0x000000000040d868  1           0   ___U_lA____Ma_bH______t____________  uptodate,lru,active,mmap,anonymous,swapbacked,compound_head,thp
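
The two flag words in this sample can be decoded by hand. The C sketch
below tests the relevant bits; the bit numbers (15 compound_head, 16
compound_tail, 22 thp) are inferred from the hex values and symbolic
names in the output above, and the helper names are made up for this
example.

```c
#include <stdint.h>

/* Bit numbers inferred from the sample output above. */
#define KPF_COMPOUND_HEAD 15
#define KPF_COMPOUND_TAIL 16
#define KPF_THP           22

/* Returns 1 if the given bit is set in a page flags word, else 0. */
int kpf_test(uint64_t flags, int bit)
{
	return (int)((flags >> bit) & 1);
}

/* Head page of a transparent huge page. */
int is_thp_head(uint64_t flags)
{
	return kpf_test(flags, KPF_THP) && kpf_test(flags, KPF_COMPOUND_HEAD);
}

/* Tail page of a transparent huge page. */
int is_thp_tail(uint64_t flags)
{
	return kpf_test(flags, KPF_THP) && kpf_test(flags, KPF_COMPOUND_TAIL);
}
```

Applied to the sample, 0x410000 is a thp tail page and 0x40d868 is the
thp head page, which matches the 511 + 1 page counts of one huge page.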

Signed-off-by: Naoya Horiguchi
Acked-by: Wu Fengguang
Reviewed-by: KAMEZAWA Hiroyuki
Acked-by: KOSAKI Motohiro
Cc: David Rientjes
Cc: Andi Kleen
Cc: Andrea Arcangeli
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Naoya Horiguchi
2012-03-22 08:54:57 +0800

20 Mar, 2012

1 commit

16c0cfa42 Merge branch 'stable/cleancache.v13' into linux-next

* stable/cleancache.v13:
mm: cleancache: Use __read_mostly as appropiate.
mm: cleancache: report statistics via debugfs instead of sysfs.
mm: zcache/tmem/cleancache: s/flush/invalidate/
mm: cleancache: s/flush/invalidate/

Konrad Rzeszutek Wilk
2012-03-20 00:12:19 +0800

07 Mar, 2012

1 commit

40e47125e Documentation: Fix multiple typo in Documentation

Signed-off-by: Masanari Iida
Acked-by: Randy Dunlap
Signed-off-by: Jiri Kosina

Masanari Iida
2012-03-07 23:08:24 +0800

10 Feb, 2012

2 commits

d65657c86 mm: Fix typo in cleancache.txt

Correct spelling "implementatation" to "implementation" in
Documentation/vm/cleancache.txt

Signed-off-by: Masanari Iida
Signed-off-by: Jiri Kosina

Masanari Iida
2012-02-10 16:52:18 +0800
3cd0b6252 mm: Fix typo in unevictable-lru.txt ... Browse Code »

Correct spelling "semphore" to "semaphore" in
Documentation/vm/unevictable-lru.txt

Signed-off-by: Masanari Iida
Signed-off-by: Jiri Kosina

Masanari Iida
2012-02-10 06:09:53 +0800

24 Jan, 2012

2 commits

417fc2cae mm: cleancache: report statistics via debugfs instead of sysfs. ... Browse Code »

[v9: akpm@linux-foundation.org: sysfs->debugfs; no longer need Doc/ABI file]

Signed-off-by: Dan Magenheimer
Signed-off-by: Konrad Wilk
Cc: Jan Beulich
Acked-by: Seth Jennings
Cc: Jeremy Fitzhardinge
Cc: Hugh Dickins
Cc: Johannes Weiner
Cc: Nitin Gupta
Cc: Matthew Wilcox
Cc: Chris Mason
Cc: Rik Riel
Cc: Andrew Morton

Dan Magenheimer
2012-01-24 05:07:50 +0800
3167760f8 mm: cleancache: s/flush/invalidate/ ... Browse Code »

Per akpm's suggestion, rename the term "flush" to "invalidate". The next
patch will do this across all of MM.

This change is completely cosmetic.

[v9: akpm@linux-foundation.org: change "flush" to "invalidate", part 3]

Signed-off-by: Dan Magenheimer
Cc: Kamezawa Hiroyuki
Cc: Jan Beulich
Reviewed-by: Seth Jennings
Cc: Jeremy Fitzhardinge
Cc: Hugh Dickins
Cc: Johannes Weiner
Cc: Nitin Gupta
Cc: Matthew Wilcox
Cc: Chris Mason
Cc: Rik Riel
Cc: Andrew Morton
[v10: Fixed fs: move code out of buffer.c conflict change]
Signed-off-by: Konrad Rzeszutek Wilk

Dan Magenheimer
2012-01-24 05:06:24 +0800

13 Jan, 2012

1 commit

888a214dc slub: document setting min order with debug_guardpage_minorder > 0 ... Browse Code »

Acked-by: David Rientjes
Cc: Pekka Enberg
Cc: "Rafael J. Wysocki"
Signed-off-by: Stanislaw Gruszka
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Stanislaw Gruszka
2012-01-13 12:13:04 +0800

28 Nov, 2011

1 commit

25f4379b8 slub: fix slub_max_order Documentation ... Browse Code »

slub_max_order default is 3 (aka PAGE_ALLOC_COSTLY_ORDER), not 1
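
As a rough illustration of what the corrected default means: an order-n allocation spans 2^n contiguous pages. The sketch below assumes a 4 KiB page size, which is common but architecture dependent, and the helper name order_to_bytes is hypothetical.

```python
PAGE_ALLOC_COSTLY_ORDER = 3  # the slub_max_order default named in the commit


def order_to_bytes(order: int, page_size: int = 4096) -> int:
    """Size in bytes of an order-`order` allocation (2**order contiguous pages)."""
    return (1 << order) * page_size


if __name__ == "__main__":
    for order in (1, PAGE_ALLOC_COSTLY_ORDER):
        print(f"order {order}: {order_to_bytes(order)} bytes")
```

Under that page-size assumption, order 3 is 32 KiB per slab, against 8 KiB for the order 1 that the documentation previously stated.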

Acked-by: David Rientjes
Acked-by: Christoph Lameter
Signed-off-by: Eric Dumazet
Signed-off-by: Pekka Enberg

Eric Dumazet
2011-11-28 04:08:28 +0800

26 Oct, 2011

1 commit

e182a345d Merge branches 'slab/next' and 'slub/partial' into slab/for-linus Browse Code »

Pekka Enberg
2011-10-26 23:09:12 +0800

25 Oct, 2011

1 commit

59e525341 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial ... Browse Code »

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial: (59 commits)
MAINTAINERS: linux-m32r is moderated for non-subscribers
linux@lists.openrisc.net is moderated for non-subscribers
Drop default from "DM365 codec select" choice
parisc: Kconfig: cleanup Kernel page size default
Kconfig: remove redundant CONFIG_ prefix on two symbols
cris: remove arch/cris/arch-v32/lib/nand_init.S
microblaze: add missing CONFIG_ prefixes
h8300: drop puzzling Kconfig dependencies
MAINTAINERS: microblaze-uclinux@itee.uq.edu.au is moderated for non-subscribers
tty: drop superfluous dependency in Kconfig
ARM: mxc: fix Kconfig typo 'i.MX51'
Fix file references in Kconfig files
aic7xxx: fix Kconfig references to READMEs
Fix file references in drivers/ide/
thinkpad_acpi: Fix printk typo 'bluestooth'
bcmring: drop commented out line in Kconfig
btmrvl_sdio: fix typo 'btmrvl_sdio_sd6888'
doc: raw1394: Trivial typo fix
CIFS: Don't free volume_info->UNC until we are entirely done with it.
treewide: Correct spelling of successfully in comments
...

Linus Torvalds
2011-10-25 18:11:02 +0800

28 Sep, 2011

1 commit

395cf9691 doc: fix broken references ... Browse Code »

There are numerous broken references to Documentation files (in other
Documentation files, in comments, etc.). These broken references are
caused by typos in the references, and by renames or removals of the
Documentation files. Some broken references are simply odd.

Fix these broken references, sometimes by dropping the irrelevant text
they were part of.

Signed-off-by: Paul Bolle
Signed-off-by: Jiri Kosina

Paul Bolle
2011-09-28 00:08:04 +0800

23 Sep, 2011

1 commit

e369fde1a thp: fix khugepaged defrag tunable documentation ... Browse Code »

Commit e27e6151b154 ("mm/thp: use conventional format for boolean
attributes") changed

/sys/kernel/mm/transparent_hugepage/khugepaged/defrag

to be tuned by using 1 (enabled) or 0 (disabled) instead of "yes" and
"no", respectively.

Update the documentation.
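
A minimal sketch of setting the tunable in the new format follows. The helper name set_khugepaged_defrag is hypothetical, and the path argument exists only so the function can be tried against a scratch file; writing to the real sysfs file normally requires root.

```python
from pathlib import Path

# Path quoted in the commit message above.
DEFRAG = Path("/sys/kernel/mm/transparent_hugepage/khugepaged/defrag")


def set_khugepaged_defrag(enabled: bool, path: Path = DEFRAG) -> str:
    """Write 1 (enabled) or 0 (disabled) to the tunable and return the value written."""
    value = "1" if enabled else "0"  # the older "yes"/"no" strings are no longer used
    path.write_text(value + "\n")
    return value
```

For example, set_khugepaged_defrag(False) writes 0 to the default path.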

Signed-off-by: David Rientjes
Signed-off-by: Linus Torvalds

David Rientjes
2011-09-23 05:27:14 +0800

01 Sep, 2011

1 commit

a37933c37 slub: doc: update the slabinfo.c file path ... Browse Code »

slabinfo.c has been moved from Documentation/vm/ to
tools/slub/ by commit 0d24db337e6d81c0c620ab65cc6947bd6553f742.

Update the slub.txt doc to reflect this change too.

Signed-off-by: Jason Liu
Acked-by: Christoph Lameter
Acked-by: David Rientjes
Signed-off-by: Pekka Enberg

Jason Liu
2011-09-01 01:10:17 +0800

16 Jun, 2011

1 commit

f6e07d380 Documentation: update cgroupfs mount point ... Browse Code »

According to commit 676db4af0430 ("cgroupfs: create /sys/fs/cgroup to
mount cgroupfs on") the canonical mountpoint for the cgroup filesystem
is /sys/fs/cgroup. Hence, this should be used in the documentation.
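
One way to check where cgroup filesystems are mounted on a given system is to scan the mount table. The sketch below is illustrative only: the function name cgroup_mount_points is hypothetical, and it assumes the usual whitespace-separated /proc/mounts layout (device, mount point, filesystem type, options, ...). The "cgroup2" type is included for newer kernels and did not exist when this commit was written.

```python
def cgroup_mount_points(mounts_text: str) -> list[str]:
    """Return the mount points of all cgroup filesystems listed in `mounts_text`."""
    points = []
    for line in mounts_text.splitlines():
        fields = line.split()
        # fields[1] is the mount point, fields[2] the filesystem type
        if len(fields) >= 3 and fields[2] in ("cgroup", "cgroup2"):
            points.append(fields[1])
    return points
```

On Linux it can be fed the live table, for example cgroup_mount_points(open("/proc/mounts").read()). With the canonical layout, the returned paths sit under /sys/fs/cgroup.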

Signed-off-by: Jörg Sommer
Acked-by: Paul Menage
Signed-off-by: Randy Dunlap
Signed-off-by: Linus Torvalds

Jörg Sommer
2011-06-16 12:52:50 +0800

27 May, 2011

2 commits

f8d613e2a Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/djm/tmem ... Browse Code »

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/djm/tmem:
xen: cleancache shim to Xen Transcendent Memory
ocfs2: add cleancache support
ext4: add cleancache support
btrfs: add cleancache support
ext3: add cleancache support
mm/fs: add hooks to support cleancache
mm: cleancache core ops functions and config
fs: add field to superblock to support cleancache
mm/fs: cleancache documentation

Fix up trivial conflict in fs/btrfs/extent_io.c due to includes

Linus Torvalds
2011-05-27 01:50:56 +0800
4fe4746ab mm/fs: cleancache documentation ... Browse Code »

This patchset introduces cleancache, an optional new feature exposed
by the VFS layer that potentially dramatically increases page cache
effectiveness for many workloads in many environments at a negligible
cost. It does this by providing an interface to transcendent memory,
which is memory/storage that is not otherwise visible to and/or directly
addressable by the kernel.

Instead of being discarded, hooks in the reclaim code "put" clean
pages to cleancache. Filesystems that "opt-in" may "get" pages
from cleancache that were previously put, but pages in cleancache are
"ephemeral", meaning they may disappear at any time. And the size
of cleancache is entirely dynamic and unknowable to the kernel.
Filesystems currently supported by this patchset include ext3, ext4,
btrfs, and ocfs2. Other filesystems (especially those built entirely
on VFS) should be easy to add, but should first be thoroughly tested to
ensure coherency.
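
The put/get semantics described above can be pictured with a toy model. This is a sketch for illustration, not the kernel interface: the class ToyCleancache, the function read_page, and the drop_probability parameter are all hypothetical. The point it shows is that a put may be silently dropped, so a get can always miss and the caller must be ready to read from disk.

```python
import random


class ToyCleancache:
    """Toy model of an ephemeral page store; contents may vanish at any time."""

    def __init__(self, drop_probability: float = 0.0, seed=None):
        self._pages = {}
        self._drop_probability = drop_probability
        self._rng = random.Random(seed)

    def put(self, key, data) -> None:
        # The store is free to discard the page; the caller is never told.
        if self._rng.random() >= self._drop_probability:
            self._pages[key] = data

    def get(self, key):
        # Returns the page data, or None if the page is not (or no longer) held.
        return self._pages.get(key)

    def invalidate(self, key) -> None:
        # Remove a page that must no longer be served, e.g. after it changes.
        self._pages.pop(key, None)


def read_page(cache: ToyCleancache, key, read_from_disk):
    """Try the cache first and fall back to the disk read on a miss."""
    data = cache.get(key)
    return data if data is not None else read_from_disk(key)
```

Because a miss is always handled by the fallback, correctness never depends on the cache keeping a page. Coherency depends on invalidating a page whenever the on-disk copy changes.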

Details and a FAQ are provided in Documentation/vm/cleancache.txt

This first patch of eight in this cleancache series only adds two
new documentation files.

[v8: minor documentation changes by author]
[v3: akpm@linux-foundation.org: document sysfs API]
[v3: hch@infradead.org: move detailed description to Documentation/vm]
Signed-off-by: Dan Magenheimer
Reviewed-by: Jeremy Fitzhardinge
Reviewed-by: Konrad Rzeszutek Wilk
Acked-by: Andrew Morton
Acked-by: Randy Dunlap
Cc: Al Viro
Cc: Matthew Wilcox
Cc: Nick Piggin
Cc: Mel Gorman
Cc: Rik Van Riel
Cc: Jan Beulich
Cc: Chris Mason
Cc: Andreas Dilger
Cc: Ted Ts'o
Cc: Mark Fasheh
Cc: Joel Becker
Cc: Nitin Gupta

Dan Magenheimer
2011-05-27 00:00:56 +0800