Eric Lee / smarc-fsl-linux-kernel

07 Jun, 2014

1 commit

33041a0d7 mm: mark remap_file_pages() syscall as deprecated

The remap_file_pages() system call is used to create a nonlinear
mapping, that is, a mapping in which the pages of the file are mapped
into a nonsequential order in memory. The advantage of using
remap_file_pages() over using repeated calls to mmap(2) is that the
former approach does not require the kernel to create additional VMA
(Virtual Memory Area) data structures.

Supporting nonlinear mappings requires a significant amount of
non-trivial code in the kernel virtual memory subsystem, including hot
paths. Also, to make nonlinear mappings work, the kernel needs a way to
distinguish normal page table entries from entries with a file offset
(pte_file). The kernel reserves a flag in the PTE for this purpose. PTE
flags are a scarce resource, especially on some CPU architectures. It
would be nice to free up the flag for other usage.

Fortunately, there are not many users of remap_file_pages() in the wild.
It's only known that one enterprise RDBMS implementation uses the
syscall on 32-bit systems to map files bigger than can linearly fit into
32-bit virtual address space. This use-case is not critical anymore
since 64-bit systems are widely available.

The plan is to deprecate the syscall and replace it with an emulation.
The emulation will create new VMAs instead of nonlinear mappings. It's
going to work slower for rare users of remap_file_pages() but ABI is
preserved.

One side effect of emulation (apart from performance) is that user can
hit vm.max_map_count limit more easily due to additional VMAs. See
comment for DEFAULT_MAX_MAP_COUNT for more details on the limit.
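
To make the VMA side effect concrete, here is a rough model (not kernel code) of how many linear mappings an emulation would need for a given page order. It assumes that each run of consecutive file pages can share one VMA and that every break in the sequence needs a new one; count_vmas() is a hypothetical helper, and 65530 is assumed to be the usual default for vm.max_map_count.

```python
# Rough model only: count the linear mappings (VMAs) needed to present
# file pages in a given order, assuming consecutive file pages share a VMA.

DEFAULT_MAX_MAP_COUNT = 65530  # assumed default value of vm.max_map_count


def count_vmas(file_pages):
    """Return the number of runs of consecutive page indices in file_pages."""
    if not file_pages:
        return 0
    vmas = 1
    for prev, cur in zip(file_pages, file_pages[1:]):
        if cur != prev + 1:
            # The file offset jumps here, so a new linear mapping is needed.
            vmas += 1
    return vmas


linear = list(range(8))
backwards = list(reversed(range(8)))
print(count_vmas(linear))     # 1
print(count_vmas(backwards))  # 8
```

Under this model a fully scrambled mapping of N pages costs N VMAs, so a process that remaps a few tens of thousands of pages individually could approach the limit.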

[akpm@linux-foundation.org: fix spello]
Signed-off-by: Kirill A. Shutemov
Cc: Peter Zijlstra
Cc: Ingo Molnar
Cc: Dave Jones
Cc: Armin Rigo
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Kirill A. Shutemov
2014-06-07 07:08:17 +0800

05 Jun, 2014

1 commit

3ba08129e mm/memory-failure.c: support use of a dedicated thread to handle SIGBUS(BUS_MCEERR_AO)

Currently the memory error handler handles action optional errors in a
deferred manner by default. If a recovery-aware application wants to
handle them immediately, it can do so by setting the PF_MCE_EARLY flag.
However, such a signal can be sent only to the main thread, which is
problematic if the application wants to have a dedicated thread to
handle such signals.

So this patch adds dedicated thread support to memory error handler. We
have PF_MCE_EARLY flags for each thread separately, so with this patch
AO signal is sent to the thread with PF_MCE_EARLY flag set, not the main
thread. If you want to implement a dedicated thread, you call prctl()
to set PF_MCE_EARLY on the thread.

Memory error handler collects processes to be killed, so this patch lets
it check PF_MCE_EARLY flag on each thread in the collecting routines.

No behavioral change for all non-early kill cases.

Tony said:

: The old behavior was crazy - someone with a multithreaded process might
: well expect that if they call prctl(PF_MCE_EARLY) in just one thread, then
: that thread would see the SIGBUS with si_code = BUS_MCEERR_AO - even if
: that thread wasn't the main thread for the process.

[akpm@linux-foundation.org: coding-style fixes]
Signed-off-by: Naoya Horiguchi
Reviewed-by: Tony Luck
Cc: Kamil Iskra
Cc: Andi Kleen
Cc: Borislav Petkov
Cc: Chen Gong
Cc: [3.2+]
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Naoya Horiguchi
2014-06-05 07:54:13 +0800

04 Jun, 2014

1 commit

1aacb90ea Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial into next

Pull trivial tree changes from Jiri Kosina:
"Usual pile of patches from trivial tree that make the world go round"

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial: (23 commits)
staging: go7007: remove reference to CONFIG_KMOD
aic7xxx: Remove obsolete preprocessor define
of: dma: doc fixes
doc: fix incorrect formula to calculate CommitLimit value
doc: Note need of bc in the kernel build from 3.10 onwards
mm: Fix printk typo in dmapool.c
modpost: Fix comment typo "Modules.symvers"
Kconfig.debug: Grammar s/addition/additional/
wimax: Spelling s/than/that/, wording s/destinatary/recipient/
aic7xxx: Spelling s/termnation/termination/
arm64: mm: Remove superfluous "the" in comment
of: Spelling s/anonymouns/anonymous/
dma: imx-sdma: Spelling s/determnine/determine/
ath10k: Improve grammar in comments
ath6kl: Spelling s/determnine/determine/
of: Improve grammar for of_alias_get_id() documentation
drm/exynos: Spelling s/contro/control/
radio-bcm2048.c: fix wrong overflow check
doc: printk-formats: do not mention casts for u64/s64
doc: spelling error changes
...

Linus Torvalds
2014-06-04 23:50:34 +0800

05 May, 2014

1 commit

c98be0c96 doc: spelling error changes

Fixed multiple spelling errors.

Acked-by: Randy Dunlap
Signed-off-by: Carlos E. Garcia
Signed-off-by: Jiri Kosina

Carlos Garcia
2014-05-05 21:32:05 +0800

19 Apr, 2014

1 commit

8f28ed92d Documentation/vm/numa_memory_policy.txt: fix wrong document in numa_memory_policy.txt

In document numa_memory_policy.txt, the following examples for flag
MPOL_F_RELATIVE_NODES are incorrect.

    For example, consider a task that is attached to a cpuset with
    mems 2-5 that sets an Interleave policy over the same set with
    MPOL_F_RELATIVE_NODES. If the cpuset's mems change to 3-7, the
    interleave now occurs over nodes 3,5-6. If the cpuset's mems
    then change to 0,2-3,5, then the interleave occurs over nodes
    0,3,5.

According to the comment of the patch adding flag MPOL_F_RELATIVE_NODES,
the nodemasks the user specifies should be considered relative to the
current task's mems_allowed.

(https://lkml.org/lkml/2008/2/29/428)

And according to numa_memory_policy.txt, if the user's nodemask includes
nodes that are outside the range of the new set of allowed nodes, then
the remap wraps around to the beginning of the nodemask and, if not
already set, sets the node in the mempolicy nodemask.

So in the example, if the user specifies 2-5 for a task whose
mems_allowed is 3-7, the nodemask should be remapped to the third,
fourth, fifth and sixth nodes in mems_allowed, wrapping around, like the
following:

    mems_allowed:    3  4  5  6  7

    relative index:  0  1  2  3  4
                     5

So the nodemasks should be remapped to 3,5-7, but not 3,5-6.

And for a task whose mems_allowed is 0,2-3,5, the nodemasks should be
remapped to 0,2-3,5, but not 0,3,5.

    mems_allowed:    0  2  3  5

    relative index:  0  1  2  3
                     4  5
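
The wrap-around remap described above can be sketched in a few lines. This is only an illustration of the arithmetic, not the kernel implementation; remap_relative() is a hypothetical helper that treats each user-specified node as an index into the sorted mems_allowed list, modulo its length.

```python
# Illustrative sketch of the MPOL_F_RELATIVE_NODES remap arithmetic.
# remap_relative() is a made-up helper name, not a kernel function.

def remap_relative(user_nodes, mems_allowed):
    """Map relative node numbers onto mems_allowed, wrapping around."""
    allowed = sorted(mems_allowed)
    # Relative index n selects the (n mod len)-th allowed node; using a set
    # means a node reached twice through wrap-around is only listed once.
    return sorted({allowed[n % len(allowed)] for n in user_nodes})


print(remap_relative(range(2, 6), [3, 4, 5, 6, 7]))  # [3, 5, 6, 7]
print(remap_relative(range(2, 6), [0, 2, 3, 5]))     # [0, 2, 3, 5]
```

Both results match the corrected values in the commit message: 3,5-7 for mems 3-7 and 0,2-3,5 for mems 0,2-3,5.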

Signed-off-by: Tang Chen
Cc: Randy Dunlap
Acked-by: David Rientjes
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Tang Chen
2014-04-19 07:40:08 +0800

21 Mar, 2014

1 commit

df5cbb278 doc: fix double words

Fix double words "the the" in various files
within Documentation/.

Signed-off-by: Masanari Iida
Signed-off-by: Jiri Kosina

Masanari Iida
2014-03-21 20:16:58 +0800

11 Feb, 2014

1 commit

3cf8ca1c2 Documentation/: update 00-INDEX files

Some of the 00-INDEX files are somewhat outdated and some folders do
not contain 00-INDEX at all. Only outdated (with the notable exception
of spi) indexes are touched here; the 169 folders without 00-INDEX have
not been touched.

New 00-INDEX
- spi/* was added in a series of commits dating back to 2006

Added files (missing in (*/)00-INDEX)
- dmatest.txt was added by commit 851b7e16a07d ("dmatest: run test via
debugfs")
- this_cpu_ops.txt was added by commit a1b2a555d637 ("percpu: add
documentation on this_cpu operations")
- ww-mutex-design.txt was added by commit 040a0a371005 ("mutex: Add
support for wound/wait style locks")
- bcache.txt was added by commit cafe56359144 ("bcache: A block layer
cache")
- kernel-per-CPU-kthreads.txt was added by commit 49717cb40410
("kthread: Document ways of reducing OS jitter due to per-CPU
kthreads")
- phy.txt was added by commit ff764963479a ("drivers: phy: add generic
PHY framework")
- block/null_blk was added by commit 12f8f4fc0314 ("null_blk:
documentation")
- module-signing.txt was added by commit 3cafea307642 ("Add
Documentation/module-signing.txt file")
- assoc_array.txt was added by commit 3cb989501c26 ("Add a generic
associative array implementation.")
- arm/IXP4xx was part of the initial repo
- arm/cluster-pm-race-avoidance.txt was added by commit 7fe31d28e839
("ARM: mcpm: introduce helpers for platform coherency exit/setup")
- arm/firmware.txt was added by commit 7366b92a77fc ("ARM: Add
interface for registering and calling firmware-specific operations")
- arm/kernel_mode_neon.txt was added by commit 2afd0a05241d ("ARM:
7825/1: document the use of NEON in kernel mode")
- arm/tcm.txt was added by commit bc581770cfdd ("ARM: 5580/2: ARM TCM
(Tightly-Coupled Memory) support v3")
- arm/vlocks.txt was added by commit 9762f12d3e05 ("ARM: mcpm: Add
baremetal voting mutexes")
- blackfin/gptimers-example.c, Makefile was added by commit
4b60779d5ea7 ("Blackfin: add an example showing how to use the
gptimers API")
- devicetree/usage-model.txt was added by commit 31134efc681a ("dt:
Linux DT usage model documentation")
- fb/api.txt was added by commit fb21c2f42879 ("fbdev: Add FOURCC-based
format configuration API")
- fb/sm501.txt was added by commit e6a049807105 ("video, sm501: add
edid and commandline support")
- fb/udlfb.txt was added by commit 96f8d864afd6 ("fbdev: move udlfb out
of staging.")
- filesystems/Makefile was added by commit 1e0051ae48a2
("Documentation/fs/: split txt and source files")
- filesystems/nfs/nfsd-admin-interfaces.txt was added by commit
8a4c6e19cfed ("nfsd: document kernel interfaces for nfsd
configuration")
- ide/warm-plug-howto.txt was added by commit f74c91413ec6 ("ide: add
warm-plug support for IDE devices (take 2)")
- laptops/Makefile was added by commit d49129accc21
("Documentation/laptop/: split txt and source files")
- leds/leds-blinkm.txt was added by commit b54cf35a7f65 ("LEDS: add
BlinkM RGB LED driver, documentation and update MAINTAINERS")
- leds/ledtrig-oneshot.txt was added by commit 5e417281cde2 ("leds: add
oneshot trigger")
- leds/ledtrig-transient.txt was added by commit 44e1e9f8e705 ("leds:
add new transient trigger for one shot timer activation")
- m68k/README.buddha was part of the initial repo
- networking/LICENSE.(qla3xxx|qlcnic|qlge) was added by commits
40839129f779, c4e84bde1d59, 5a4faa873782
- networking/Makefile was added by commit 3794f3e812ef ("docsrc: build
Documentation/ sources")
- networking/i40evf.txt was added by commit 105bf2fe6b32 ("i40evf: add
driver to kernel build system")
- networking/ipsec.txt was added by commit b3c6efbc36e2 ("xfrm: Add
file to document IPsec corner case")
- networking/mac80211-auth-assoc-deauth.txt was added by commit
3cd7920a2be8 ("mac80211: add auth/assoc/deauth flow diagram")
- networking/netlink_mmap.txt was added by commit 5683264c3981
("netlink: add documentation for memory mapped I/O")
- networking/nf_conntrack-sysctl.txt was added by commit c9f9e0e1597f
("netfilter: doc: add nf_conntrack sysctl api documentation")
- networking/team.txt was added by commit 3d249d4ca7d0 ("net: introduce
ethernet teaming device")
- networking/vxlan.txt was added by commit d342894c5d2f ("vxlan:
virtual extensible lan")
- power/runtime_pm.txt was added by commit 5e928f77a09a ("PM: Introduce
core framework for run-time PM of I/O devices (rev. 17)")
- power/charger-manager.txt was added by commit 3bb3dbbd56ea
("power_supply: Add initial Charger-Manager driver")
- RCU/lockdep-splat.txt was added by commit d7bd2d68aa2e ("rcu:
Document interpretation of RCU-lockdep splats")
- s390/kvm.txt was added by 5ecee4b (KVM: s390: API documentation)
- s390/qeth.txt was added by commit b4d72c08b358 ("qeth: bridgeport
support - basic control")
- scheduler/sched-bwc.txt was added by commit 88ebc08ea9f7 ("sched: Add
documentation for bandwidth control")
- scsi/advansys.txt was added by commit 4bd6d7f35661 ("[SCSI] advansys:
Move documentation to Documentation/scsi")
- scsi/bfa.txt was added by commit 1ec90174bdb4 ("[SCSI] bfa: add
readme file")
- scsi/bnx2fc.txt was added by commit 12b8fc10eaf4 ("[SCSI] bnx2fc: Add
driver documentation")
- scsi/cxgb3i.txt was added by commit c3673464ebc0 ("[SCSI] cxgb3i: Add
cxgb3i iSCSI driver.")
- scsi/hpsa.txt was added by commit 992ebcf14f3c ("[SCSI] hpsa: Add
hpsa.txt to Documentation/scsi")
- scsi/link_power_management_policy.txt was added by commit
ca77329fb713 ("[libata] Link power management infrastructure")
- scsi/osd.txt was added by commit 78e0c621deca ("[SCSI] osd:
Documentation for OSD library")
- scsi/scsi-parameter.txt was created/moved by commit 163475fb111c
("Documentation: move SCSI parameters to their own text file")
- serial/driver was part of the initial repo
- serial/n_gsm.txt was added by commit 323e84122ec6 ("n_gsm: add a
documentation")
- timers/Makefile was added by commit 3794f3e812ef ("docsrc: build
Documentation/ sources")
- virt/kvm/s390.txt was added by commit d9101fca3d57 ("KVM: s390:
diagnose call documentation")
- vm/split_page_table_lock was added by commit 49076ec2ccaf ("mm:
dynamically allocate page->ptl if it cannot be embedded to struct
page")
- w1/slaves/w1_ds28e04 was added by commit fbf7f7b4e2ae ("w1: Add
1-wire slave device driver for DS28E04-100")
- w1/masters/omap-hdq was added by commit e0a29382c6f5 ("hdq:
documentation for OMAP HDQ")
- x86/early-microcode.txt was added by commit 0d91ea86a895 ("x86, doc:
Documentation for early microcode loading")
- x86/earlyprintk.txt was added by commit a1aade478862 ("x86/doc:
mini-howto for using earlyprintk=dbgp")
- x86/entry_64.txt was added by commit 8b4777a4b50c ("x86-64: Document
some of entry_64.S")
- x86/pat.txt was added by commit d27554d874c7 ("x86: PAT
documentation")

Moved files
- arm/kernel_user_helpers.txt was moved out of arch/arm/kernel by
commit 37b8304642c7 ("ARM: kuser: move interface documentation out of
the source code")
- efi-stub.txt was moved out of x86/ and down into Documentation/ in
commit 4172fe2f8a47 ("EFI stub documentation updates")
- laptops/hpfall.c was moved out of hwmon/ and into laptops/ in commit
efcfed9bad88 ("Move hp_accel to drivers/platform/x86")
- commit 5616c23ad9cd ("x86: doc: move x86-generic documentation from
Doc/x86/i386"):
* x86/usb-legacy-support.txt
* x86/boot.txt
* x86/zero_page.txt
- power/video_extension.txt was moved to acpi in commit 70e66e4df191
("ACPI / video: move video_extension.txt to Documentation/acpi")

Removed files (left in 00-INDEX)
- memory.txt was removed by commit 00ea8990aadf ("memory.txt: remove
stray information")
- gpio.txt was moved to gpio/ in commit fd8e198cfcaa ("Documentation:
gpiolib: document new interface")
- networking/DLINK.txt was removed by commit 168e06ae26dd
("drivers/net: delete old parallel port de600/de620 drivers")
- serial/hayes-esp.txt was removed by commit f53a2ade0bb9 ("tty: esp:
remove broken driver")
- s390/TAPE was removed by commit 9e280f669308 ("[S390] remove tape
block docu")
- vm/locking was removed by commit 57ea8171d2bc ("mm: documentation:
remove hopelessly out-of-date locking doc")
- laptops/acer-wmi.txt was removed by commit 020036678e81 ("acer-wmi:
Delete out-of-date documentation")

Typos/misc issues
- rpc-server-gss.txt was added as knfsd-rpcgss.txt in commit
030d794bf498 ("SUNRPC: Use gssproxy upcall for server RPCGSS
authentication.")
- commit b88cf73d9278 ("net: add missing entries to
Documentation/networking/00-INDEX")
* generic-hdlc.txt was added as generic_hdlc.txt
* spider_net.txt was added as spider-net.txt
- w1/master/mxc-w1 was added as mxc_w1 by commit a5fd9139f74c ("w1: add
1-wire master driver for i.MX27 / i.MX31")
- s390/zfcpdump.txt was added as zfcpdump by commit 6920c12a407e
("[S390] Add Documentation/s390/00-INDEX.")

Signed-off-by: Henrik Austad
Reviewed-by: Paul E. McKenney [rcu bits]
Acked-by: Rob Landley
Cc: Jiri Kosina
Cc: Thomas Gleixner
Cc: Rob Herring
Cc: David S. Miller
Cc: Mark Brown
Cc: "H. Peter Anvin"
Cc: Ingo Molnar
Cc: Gleb Natapov
Cc: Linus Torvalds
Cc: Len Brown
Cc: James Bottomley
Cc: Jean-Christophe Plagniol-Villard
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Henrik Austad
2014-02-11 08:01:40 +0800

24 Jan, 2014

1 commit

57ea8171d mm: documentation: remove hopelessly out-of-date locking doc

Documentation/vm/locking is a blast from the past. In the entire git
history, it has had precisely three modifications. Two of those look to
be pure renames, and the third was from 2005.

The doc contains such gems as:

> The page_table_lock is grabbed while holding the
> kernel_lock spinning monitor.

> Page stealers hold kernel_lock to protect against a bunch of
> races.

Or this which talks about mmap_sem:

> 4. The exception to this rule is expand_stack, which just
> takes the read lock and the page_table_lock, this is ok
> because it doesn't really modify fields anybody relies on.

expand_stack() no longer takes any locks directly, and the
mmap_sem acquisition was long ago moved up into the page fault code
itself.

It could be argued that we need to rewrite this, but it is dangerous to
leave it as-is. It will confuse more people than it helps.

Signed-off-by: Dave Hansen
Cc: Hugh Dickins
Acked-by: Vlastimil Babka
Cc: Wanpeng Li
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Dave Hansen
2014-01-24 08:36:50 +0800

22 Jan, 2014

1 commit

49f0ce5f9 mm: add overcommit_kbytes sysctl variable

Some applications that run on HPC clusters are designed around the
availability of RAM and the overcommit ratio is fine tuned to get the
maximum usage of memory without swapping. With growing memory, the
1%-of-all-RAM grain provided by overcommit_ratio has become too coarse
for these workloads (on a 2TB machine it represents no less than 20GB).

This patch adds the new overcommit_kbytes sysctl variable that allows a
much finer grain.
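
As a quick check of the granularity argument, the sketch below compares the smallest step available with each knob. It assumes that one overcommit_ratio step is simply 1% of total RAM (ignoring swap and huge pages) and that one overcommit_kbytes step is 1024 bytes; ratio_step_bytes() is a hypothetical helper.

```python
# Back-of-the-envelope comparison of tuning granularity; not kernel code.

GIB = 1024 ** 3
TIB = 1024 ** 4
KBYTES_STEP = 1024  # assumed size of one overcommit_kbytes step, in bytes


def ratio_step_bytes(total_ram_bytes):
    """Size of a single 1% overcommit_ratio step for the given RAM size."""
    return total_ram_bytes // 100


step = ratio_step_bytes(2 * TIB)
print(round(step / GIB, 2))  # 20.48
print(step // KBYTES_STEP)   # number of kbytes steps inside one ratio step
```

On a 2TB machine a single ratio step is about 20GB, which is the figure quoted above.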

[akpm@linux-foundation.org: coding-style fixes]
[akpm@linux-foundation.org: fix nommu build]
Signed-off-by: Jerome Marchand
Cc: Dave Hansen
Cc: Alan Cox
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Jerome Marchand
2014-01-22 08:19:44 +0800

22 Nov, 2013

1 commit

c283610e4 x86, mm: do not leak page->ptl for pmd page tables

There are two code paths by which a page holding a pmd page table can be
freed: pmd_free() and pmd_free_tlb().

I've missed the second one and didn't add a page table destructor call
there. This leaks page->ptl for pmd page tables if dynamically
allocated page->ptl is in use.

The patch adds the missing destructor call and modifies the documentation
accordingly.

Signed-off-by: Kirill A. Shutemov
Reported-by: Andrey Vagin
Tested-by: Andrey Vagin
Cc: Ingo Molnar
Cc: Peter Zijlstra
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Kirill A. Shutemov
2013-11-22 08:42:28 +0800

16 Nov, 2013

1 commit

9073e1a80 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial

Pull trivial tree updates from Jiri Kosina:
"Usual earth-shaking, news-breaking, rocket science pile from
trivial.git"

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial: (23 commits)
doc: usb: Fix typo in Documentation/usb/gadget_configs.txt
doc: add missing files to timers/00-INDEX
timekeeping: Fix some trivial typos in comments
mm: Fix some trivial typos in comments
irq: Fix some trivial typos in comments
NUMA: fix typos in Kconfig help text
mm: update 00-INDEX
doc: Documentation/DMA-attributes.txt fix typo
DRM: comment: `halve' -> `half'
Docs: Kconfig: `devlopers' -> `developers'
doc: typo on word accounting in kprobes.c in mutliple architectures
treewide: fix "usefull" typo
treewide: fix "distingush" typo
mm/Kconfig: Grammar s/an/a/
kexec: Typo s/the/then/
Documentation/kvm: Update cpuid documentation for steal time and pv eoi
treewide: Fix common typo in "identify"
__page_to_pfn: Fix typo in comment
Correct some typos for word frequency
clk: fixed-factor: Fix a trivial typo
...

Linus Torvalds
2013-11-16 08:47:22 +0800

15 Nov, 2013

1 commit

49076ec2c mm: dynamically allocate page->ptl if it cannot be embedded to struct page

If the split page table lock is in use, we embed the lock into the struct
page of the table's page. We have to disable the split lock if spinlock_t
is too big to be embedded, like when DEBUG_SPINLOCK or DEBUG_LOCK_ALLOC
is enabled.

This patch adds support for dynamic allocation of the split page table
lock if we can't embed it into struct page.

page->ptl is unsigned long now and we use it as spinlock_t if
sizeof(spinlock_t) <= sizeof(long); otherwise it is a pointer to a
dynamically allocated spinlock_t.

Signed-off-by: Kirill A. Shutemov
Reviewed-by: Peter Zijlstra
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Kirill A. Shutemov
2013-11-15 08:32:20 +0800

13 Nov, 2013

1 commit

0151e3d6d Documentation/vm/zswap.txt: fix typos

Signed-off-by: Christian Hesse
Acked-by: Seth Jennings
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Christian Hesse
2013-11-13 11:09:05 +0800

14 Oct, 2013

1 commit

6c7842e0a mm: update 00-INDEX

The following commits moved files out of Documentation/vm/:
c6dd897f ("mm: move page-types.c from Documentation to tools/vm")
f0f57b2b ("move hugepage test examples to tools/testing/selftests/vm")

Remove these files from vm/00-INDEX.

The following commits added new files to Documentation/vm/:
4fe4746a ("mm/fs: cleancache documentation") added vm/cleancache.txt
d65bfacb ("mm: highmem documentation") added vm/highmem.txt
1c9bf22c ("thp: transparent hugepage support documentation") added
vm/transhuge.txt
0f8975ec ("mm: soft-dirty bits for user memory changes tracking")
61b0d760 ("zswap: add documentation")
27c6aec2 ("mm: frontswap: config and doc files")

Add the missing documentation-files with a short description to 00-INDEX

Signed-off-by: Henrik Austad
Signed-off-by: Jiri Kosina

Henrik Austad
2013-10-14 21:52:20 +0800

12 Sep, 2013

2 commits

d9104d1ca mm: track vma changes with VM_SOFTDIRTY bit

Pavel reported that if a vma area gets unmapped and then mapped (or
expanded) in place, the soft dirty tracker won't be able to recognize this
situation, since it works on the pte level and ptes get zapped on unmap,
losing the soft dirty bit of course.

So to resolve this situation we need to track actions on the vma level;
this is where the VM_SOFTDIRTY flag comes in. When a new vma area is
created (or an old one expanded) we set this bit, and keep it there until
the application asks to clear the soft dirty bit.

Thus when a user space application tracks memory changes, it can now
detect if a vma area is renewed.

Reported-by: Pavel Emelyanov
Signed-off-by: Cyrill Gorcunov
Cc: Andy Lutomirski
Cc: Matt Mackall
Cc: Xiao Guangrong
Cc: Marcelo Tosatti
Cc: KOSAKI Motohiro
Cc: Stephen Rothwell
Cc: Peter Zijlstra
Cc: "Aneesh Kumar K.V"
Cc: Rob Landley
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Cyrill Gorcunov
2013-09-12 06:57:56 +0800

15610c86f hugepage: mention libhugetlbfs in doc

Explicitly mention/recommend using the libhugetlbfs test cases when
changing related kernel code. Developers that are unaware of the project
can easily miss this and introduce potential regressions that may or may
not be caught by community review.

Also do some cleanups that make the document visually easier to view at a
first glance.

Signed-off-by: Davidlohr Bueso
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Davidlohr Bueso
2013-09-12 06:57:39 +0800

11 Jul, 2013

1 commit

61b0d7601 zswap: add documentation

Add the documentation file for the zswap functionality

Signed-off-by: Seth Jennings
Acked-by: Rik van Riel
Cc: Greg Kroah-Hartman
Cc: Nitin Gupta
Cc: Minchan Kim
Cc: Konrad Rzeszutek Wilk
Cc: Dan Magenheimer
Cc: Robert Jennings
Cc: Jenifer Hopper
Cc: Mel Gorman
Cc: Johannes Weiner
Cc: Larry Woodman
Cc: Benjamin Herrenschmidt
Cc: Dave Hansen
Cc: Joe Perches
Cc: Joonsoo Kim
Cc: Cody P Schafer
Cc: Hugh Dickins
Cc: Paul Mackerras
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Seth Jennings
2013-07-11 09:11:34 +0800

10 Jul, 2013

1 commit

f49cbdde4 mm/thp: fix doc for transparent huge zero page

Transparent huge zero page is used during the page fault instead of in
khugepaged.

# ls /sys/kernel/mm/transparent_hugepage/
defrag enabled khugepaged use_zero_page
# ls /sys/kernel/mm/transparent_hugepage/khugepaged/
alloc_sleep_millisecs defrag full_scans max_ptes_none pages_collapsed pages_to_scan scan_sleep_millisecs

This patch corrects the documentation to match what the code does.

Signed-off-by: Wanpeng Li
Acked-by: Kirill A. Shutemov
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Wanpeng Li
2013-07-10 01:33:23 +0800

05 Jul, 2013

1 commit

80cc38b16 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial

Pull trivial tree updates from Jiri Kosina:
"The usual stuff from trivial tree"

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial: (34 commits)
treewide: relase -> release
Documentation/cgroups/memory.txt: fix stat file documentation
sysctl/net.txt: delete reference to obsolete 2.4.x kernel
spinlock_api_smp.h: fix preprocessor comments
treewide: Fix typo in printk
doc: device tree: clarify stuff in usage-model.txt.
open firmware: "/aliasas" -> "/aliases"
md: bcache: Fixed a typo with the word 'arithmetic'
irq/generic-chip: fix a few kernel-doc entries
frv: Convert use of typedef ctl_table to struct ctl_table
sgi: xpc: Convert use of typedef ctl_table to struct ctl_table
doc: clk: Fix incorrect wording
Documentation/arm/IXP4xx fix a typo
Documentation/networking/ieee802154 fix a typo
Documentation/DocBook/media/v4l fix a typo
Documentation/video4linux/si476x.txt fix a typo
Documentation/virtual/kvm/api.txt fix a typo
Documentation/early-userspace/README fix a typo
Documentation/video4linux/soc-camera.txt fix a typo
lguest: fix CONFIG_PAE -> CONFIG_x86_PAE in comment
...

Linus Torvalds
2013-07-05 02:40:58 +0800

04 Jul, 2013

2 commits

541c237c0 pagemap: prepare to reuse constant bits with page-shift

In order to reuse bits from pagemap entries gracefully, we leave the
entries as is but on pagemap open emit a warning in dmesg, that bits
55-60 are about to change in a couple of releases. Next, if a user
issues soft-dirty clear command via the clear_refs file (it was disabled
before v3.9) we assume that he's aware of the new pagemap format, note
that fact and report the bits in pagemap in the new manner.

The "migration strategy" looks like this then:

1. existing users are not affected -- they don't touch soft-dirty feature, thus
see old bits in pagemap, but are warned and have time to fix themselves
2. those who use soft-dirty know about new pagemap format
3. some time soon we get rid of any signs of page-shift in pagemap as well as
this trick with clear-soft-dirty affecting pagemap format.

Signed-off-by: Pavel Emelyanov
Cc: Matt Mackall
Cc: Xiao Guangrong
Cc: Glauber Costa
Cc: Marcelo Tosatti
Cc: KOSAKI Motohiro
Cc: Stephen Rothwell
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Pavel Emelyanov
2013-07-04 07:07:26 +0800

0f8975ec4 mm: soft-dirty bits for user memory changes tracking

The soft-dirty is a bit on a PTE which helps to track which pages a task
writes to. In order to do this tracking one should

1. Clear soft-dirty bits from PTEs ("echo 4 > /proc/PID/clear_refs")
2. Wait some time.
3. Read soft-dirty bits (bit 55 in /proc/PID/pagemap2 entries)

To do this tracking, the writable bit is cleared from PTEs when the
soft-dirty bit is. Thus, after this, when the task tries to modify a
page at some virtual address the #PF occurs and the kernel sets the
soft-dirty bit on the respective PTE.

Note that although all of the task's address space is marked as r/o after
the soft-dirty bits are cleared, the #PF-s that occur after that are
processed fast. This is because the pages are still mapped to physical
memory, and thus all the kernel does is find this fact out and put back
the writable, dirty and soft-dirty bits on the PTE.

Another thing to note, is that when mremap moves PTEs they are marked
with soft-dirty as well, since from the user perspective mremap modifies
the virtual memory at mremap's new address.
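
Step 3 of the tracking procedure amounts to testing one bit in each 64-bit entry. The sketch below shows only that decoding step on a byte buffer; it assumes entries are 8 bytes in native byte order with the soft-dirty flag at bit 55, as stated above. The helper names are hypothetical, and reading the real file (seeking to 8 * (virtual address // page size) in the process's pagemap) is left out.

```python
# Minimal sketch: decode pagemap-style 64-bit entries and test bit 55.
# Helper names are illustrative; the bit position is taken from the text.
import struct

SOFT_DIRTY_BIT = 55
ENTRY_SIZE = 8  # bytes per entry


def decode_entries(raw):
    """Split a raw byte buffer into a list of unsigned 64-bit entries."""
    count = len(raw) // ENTRY_SIZE
    return list(struct.unpack("=%dQ" % count, raw[:count * ENTRY_SIZE]))


def is_soft_dirty(entry):
    """Return True if the soft-dirty bit is set in one entry."""
    return bool((entry >> SOFT_DIRTY_BIT) & 1)


# Two synthetic entries: one with bit 55 set, one with only bit 63 set.
sample = struct.pack("=2Q", 1 << 55, 1 << 63)
print([is_soft_dirty(e) for e in decode_entries(sample)])  # [True, False]
```

A tracker would clear the bits through clear_refs, wait, then run this check over the entries for the address range it cares about.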

Signed-off-by: Pavel Emelyanov
Cc: Matt Mackall
Cc: Xiao Guangrong
Cc: Glauber Costa
Cc: Marcelo Tosatti
Cc: KOSAKI Motohiro
Cc: Stephen Rothwell
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Pavel Emelyanov
2013-07-04 07:07:26 +0800

28 May, 2013

1 commit

f884ab15a doc: fix misspellings with 'codespell' tool

Signed-off-by: Anatol Pomozov
Signed-off-by: Jiri Kosina

Anatol Pomozov
2013-05-28 18:02:12 +0800

30 Apr, 2013

1 commit

c9b1d0981 mm: limit growth of 3% hardcoded other user reserve

Add user_reserve_kbytes knob.

Limit the growth of the memory reserved for other user processes to
min(3% current process size, user_reserve_pages). Only about 8MB is
necessary to enable recovery in the default mode, and only a few hundred
MB are required even when overcommit is disabled.

user_reserve_pages defaults to min(3% free pages, 128MB)

I arrived at 128MB by taking the max VSZ of sshd, login, bash, and top ...
then adding the RSS of each.

This only affects OVERCOMMIT_NEVER mode.
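
The two min() expressions above can be worked through numerically. This is a simplified sketch that computes in bytes rather than pages and uses hypothetical helper names; it only mirrors the formulas as written in this message, not the actual kernel code.

```python
# Simplified arithmetic for the user reserve formulas quoted above.
# Works in bytes for readability; helper names are illustrative only.

MIB = 1024 ** 2
GIB = 1024 ** 3


def default_user_reserve(free_bytes):
    """Default cap: min(3% of free memory, 128MB)."""
    return min(free_bytes * 3 // 100, 128 * MIB)


def user_reserve(process_size_bytes, reserve_cap_bytes):
    """Reserve for other users: min(3% of current process size, cap)."""
    return min(process_size_bytes * 3 // 100, reserve_cap_bytes)


cap = default_user_reserve(32 * GIB)
print(cap // MIB)                           # 128
print(user_reserve(30 * GIB, cap) // MIB)   # 128 (capped)
print(user_reserve(1 * GIB, cap) // MIB)    # 30 (3% is below the cap)
print((30 * GIB * 3 // 100) // MIB)         # 921, the uncapped 3% figure
```

For a 30GB process the uncapped 3% reserve is roughly 921MB, while the capped value stays at 128MB, which is the growth limit the patch introduces.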

Background

1. user reserve

__vm_enough_memory reserves a hardcoded 3% of the current process size for
other applications when overcommit is disabled. This was done so that a
user could recover if they launched a memory hogging process. Without the
reserve, a user would easily run into a message such as:

bash: fork: Cannot allocate memory

2. admin reserve

Additionally, a hardcoded 3% of free memory is reserved for root in both
overcommit 'guess' and 'never' modes. This was intended to prevent a
scenario where root can't log in and perform recovery operations.

Note that this reserve shrinks, and doesn't guarantee a useful reserve.

Motivation

The two hardcoded memory reserves should be updated to account for current
memory sizes.

Also, the admin reserve would be more useful if it didn't shrink too much.

When the current code was originally written, 1GB was considered
"enterprise". Now the 3% reserve can grow to multiple GB on large memory
systems, and it only needs to be a few hundred MB at most to enable a user
or admin to recover a system with an unwanted memory hogging process.

I've found that reducing these reserves is especially beneficial for a
specific type of application load:

* single application system
* one or few processes (e.g. one per core)
* allocating all available memory
* not initializing every page immediately
* long running

I've run scientific clusters with this sort of load. A long running job
sometimes failed many hours (weeks of CPU time) into a calculation. They
weren't initializing all of their memory immediately, and they weren't
using calloc, so I put systems into overcommit 'never' mode. These
clusters run diskless and have no swap.

However, with the current reserves, a user wishing to allocate as much
memory as possible to one process may be prevented from using, for
example, almost 2GB out of 32GB.

The effect is less, but still significant when a user starts a job with
one process per core. I have repeatedly seen a set of processes
requesting the same amount of memory fail because one of them could not
allocate the amount of memory a user would expect to be able to allocate.
For example, Message Passing Interface (MPI) processes, one per core. And
it is similar for other parallel programming frameworks.

Changing this reserve code will make the overcommit never mode more useful
by allowing applications to allocate nearly all of the available memory.

Also, the new admin_reserve_kbytes will be safer than the current behavior
since the hardcoded 3% of available memory reserve can shrink to something
useless in the case where applications have grabbed all available memory.

Risks

* "bash: fork: Cannot allocate memory"

The downside of the first patch--which creates a tunable user reserve
that is only used in overcommit 'never' mode--is that an admin can set
it so low that a user may not be able to kill their process, even if
they already have a shell prompt.

Of course, a user can get in the same predicament with the current 3%
reserve--they just have to launch processes until 3% becomes negligible.

* root-cant-log-in problem

The second patch, adding the tunable rootuser_reserve_pages, allows
the admin to shoot themselves in the foot by setting it too small. They
can easily get the system into a state where root-can't-log-in.

However, the new admin_reserve_kbytes will be safer than the current
behavior since the hardcoded 3% of available memory reserve can shrink
to something useless in the case where applications have grabbed all
available memory.

Alternatives

* Memory cgroups provide a more flexible way to limit application memory.

Not everyone wants to set up cgroups or deal with their overhead.

* We could create a fourth overcommit mode which provides smaller reserves.

The size of useful reserves may be drastically different depending
on whether the system is embedded or enterprise.

* Force users to initialize all of their memory or use calloc.

Some users don't want/expect the system to overcommit when they malloc.
Overcommit 'never' mode is for this scenario, and it should work well.

The new user and admin reserve tunables are simple to use, with low
overhead compared to cgroups. The patches preserve current behavior where
3% of memory is less than 128MB, except that the admin reserve doesn't
shrink to an unusable size under pressure. The code allows admins to tune
for embedded and enterprise usage.

FAQ

* How is the root-cant-login problem addressed?
What happens if admin_reserve_pages is set to 0?

Root is free to shoot themselves in the foot by setting
admin_reserve_kbytes too low.

On x86_64, the minimum useful reserve is:
8MB for overcommit 'guess'
128MB for overcommit 'never'

admin_reserve_pages defaults to min(3% free memory, 8MB)

So, anyone switching to 'never' mode needs to adjust
admin_reserve_pages.
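A minimal sketch of the default described above, assuming the reserve is computed in kB as min(3% of free memory, 8MB) and using the x86_64 minimums quoted in this answer. The helper names and the lookup table are illustrative, not kernel identifiers.

```python
# Minimum useful reserves on x86_64 as quoted in this FAQ, in kB.
MIN_USEFUL_KB = {"guess": 8 * 1024, "never": 128 * 1024}


def default_admin_reserve_kb(free_kb):
    """min(3% of free memory, 8MB), following the description in the text."""
    cap_kb = 8 * 1024
    return min(free_kb * 3 // 100, cap_kb)


def reserve_is_sufficient(mode, reserve_kb):
    """True if the reserve meets the minimum useful size for the mode."""
    return reserve_kb >= MIN_USEFUL_KB[mode]


reserve = default_admin_reserve_kb(5725 * 1024)  # the test VM's RAM, as free kB
print(reserve, reserve_is_sufficient("guess", reserve),
      reserve_is_sufficient("never", reserve))
```

With the default capped at 8MB, the check fails for 'never' mode, which is why the reserve has to be raised by hand after switching modes.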

* How do you calculate a minimum useful reserve?

A user or the admin needs enough memory to login and perform
recovery operations, which includes, at a minimum:

sshd or login + bash (or some other shell) + top (or ps, kill, etc.)

For overcommit 'guess', we can sum resident set sizes (RSS)
because we only need enough memory to handle what the recovery
programs will typically use. On x86_64 this is about 8MB.

For overcommit 'never', we can take the max of their virtual sizes (VSZ)
and add the sum of their RSS. We use VSZ instead of RSS because this mode
forces us to ensure we can fulfill all of the requested memory allocations,
even if the programs only use a fraction of what they ask for.
On x86_64 this is about 128MB.

When swap is enabled, reserves are useful even when they are as
small as 10MB, regardless of overcommit mode.

When both swap and overcommit are disabled, then the admin should
tune the reserves higher to be absolutely safe. Over 230MB each
was safest in my testing.
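The two rules above can be sketched as follows. The per-program RSS and VSZ figures are invented placeholders chosen only to make the arithmetic visible; they are not measurements, and the function name is illustrative.

```python
def min_useful_reserve_kb(programs, mode):
    """programs: list of (rss_kb, vsz_kb) pairs for the recovery tools."""
    rss_total = sum(rss for rss, _ in programs)
    if mode == "guess":
        # Only what the tools typically touch needs to be available.
        return rss_total
    if mode == "never":
        # The largest virtual size must be fully backed, plus everyone's RSS.
        return max(vsz for _, vsz in programs) + rss_total
    raise ValueError(f"unknown overcommit mode: {mode!r}")


# Hypothetical (rss_kb, vsz_kb) values for sshd, bash and top.
recovery_tools = [(3000, 90000), (2000, 110000), (1500, 15000)]
for mode in ("guess", "never"):
    print(mode, min_useful_reserve_kb(recovery_tools, mode), "kB")
```

To apply this to a real system, replace the placeholder pairs with the RSS and VSZ columns reported by ps for the tools actually used for recovery.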

* What happens if user_reserve_pages is set to 0?

Note, this only affects overcommit 'never' mode.

Then a user will be able to allocate all available memory minus
admin_reserve_kbytes.

However, they will easily see a message such as:

"bash: fork: Cannot allocate memory"

And they won't be able to recover/kill their application.
The admin should be able to recover the system if
admin_reserve_kbytes is set appropriately.

* What's the difference between overcommit 'guess' and 'never'?

"Guess" allows an allocation if there are enough free + reclaimable
pages. It has a hardcoded 3% of free pages reserved for root.

"Never" allows an allocation if there is enough swap + a configurable
percentage (default is 50) of physical RAM. It has a hardcoded 3% of
free pages reserved for root, like "Guess" mode. It also has a
hardcoded 3% of the current process size reserved for additional
applications.
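The "never" rule can be sketched as a simple limit calculation. This is a simplification that follows only the wording above: it ignores the root and user reserves and anything else the kernel may subtract, and the function name is illustrative.

```python
def never_commit_limit_mb(ram_mb, swap_mb, ratio_percent=50):
    """Swap plus a configurable percentage of physical RAM, in MB."""
    return swap_mb + ram_mb * ratio_percent // 100


# The test VM described later in this message: 5725MB RAM, 4GB swap.
print(never_commit_limit_mb(5725, 4096))                      # default ratio
print(never_commit_limit_mb(5725, 4096, ratio_percent=100))   # ratio used in the tests
print(never_commit_limit_mb(5725, 0, ratio_percent=100))      # swapless case
```

The swapless case shows why the ratio matters on diskless clusters: without swap, the limit is just the chosen percentage of RAM.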

* Why is overcommit 'guess' not suitable even when an app eventually
writes to every page? It takes free pages, file pages, available
swap pages, reclaimable slab pages into consideration. In other words,
these are all pages available, then why isn't overcommit suitable?

Because it only looks at the present state of the system. It
does not take into account the memory that other applications have
malloced, but haven't initialized yet. It overcommits the system.

Test Summary

There was little change in behavior in the default overcommit 'guess'
mode with swap enabled before and after the patch. This was expected.

Systems run most predictably (i.e. no oom kills) in overcommit 'never'
mode with swap enabled. This also allowed the most memory to be allocated
to a user application.

Overcommit 'guess' mode without swap is a bad idea. It is easy to
crash the system. None of the other tested combinations crashed.
This matches my experience on the Roadrunner supercomputer.

Without the tunable user reserve, a system in overcommit 'never' mode
and without swap does not allow the user to recover, although the
admin can.

With the new tunable reserves, a system in overcommit 'never' mode
and without swap can be configured to:

1. maximize user-allocatable memory, running close to the edge of
recoverability

2. maximize recoverability, sacrificing allocatable memory to
ensure that a user cannot take down a system

Test Description

Fedora 18 VM - 4 x86_64 cores, 5725MB RAM, 4GB Swap

System is booted into multiuser console mode, with unnecessary services
turned off. Caches were dropped before each test.

Hogs are user memtester processes that attempt to allocate all free memory
as reported by /proc/meminfo

In overcommit 'never' mode, memory_ratio=100
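A hog needs to know how much memory is free before it asks for it. The sketch below parses /proc/meminfo-style text and returns free memory in MB. It assumes the MemFree field is the one meant by "free memory" above, and the sample text is made up for illustration.

```python
def parse_meminfo(text):
    """Turn 'Name:   value kB' lines into a {name: value} dictionary."""
    fields = {}
    for line in text.splitlines():
        name, sep, rest = line.partition(":")
        parts = rest.split()
        if sep and parts:
            fields[name.strip()] = int(parts[0])
    return fields


def free_mb(text):
    """Free memory in whole MB (assumption: the MemFree field, given in kB)."""
    return parse_meminfo(text)["MemFree"] // 1024


sample = "MemTotal:        5862400 kB\nMemFree:         5580000 kB\n"  # made-up values
print(free_mb(sample), "MB free")
# On a live system: free_mb(open("/proc/meminfo").read())
```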

Test Results

3.9.0-rc1-mm1

Overcommit | Swap | Hogs | MB Got/Wanted | OOMs  | User Recovery | Admin Recovery
---------- | ---- | ---- | ------------- | ----- | ------------- | --------------
guess      | yes  | 1    | 5432/5432     | no    | yes           | yes
guess      | yes  | 4    | 5444/5444     | 1     | yes           | yes
guess      | no   | 1    | 5302/5449     | no    | yes           | yes
guess      | no   | 4    | -             | crash | no            | no

never      | yes  | 1    | 5460/5460     | 1     | yes           | yes
never      | yes  | 4    | 5460/5460     | 1     | yes           | yes
never      | no   | 1    | 5218/5432     | no    | no            | yes
never      | no   | 4    | 5203/5448     | no    | no            | yes

3.9.0-rc1-mm1-tunablereserves

User and Admin Recovery show their respective reserves, if applicable.

Overcommit | Swap | Hogs | MB Got/Wanted | OOMs  | User Recovery | Admin Recovery
---------- | ---- | ---- | ------------- | ----- | ------------- | --------------
guess      | yes  | 1    | 5419/5419     | no    | - yes         | 8MB yes
guess      | yes  | 4    | 5436/5436     | 1     | - yes         | 8MB yes
guess      | no   | 1    | 5440/5440     | *     | - yes         | 8MB yes
guess      | no   | 4    | -             | crash | - no          | 8MB no

* process would successfully mlock, then the oom killer would pick it

never      | yes  | 1    | 5446/5446     | no    | 10MB yes      | 20MB yes
never      | yes  | 4    | 5456/5456     | no    | 10MB yes      | 20MB yes
never      | no   | 1    | 5387/5429     | no    | 128MB no      | 8MB barely
never      | no   | 1    | 5323/5428     | no    | 226MB barely  | 8MB barely
never      | no   | 1    | 5323/5428     | no    | 226MB barely  | 8MB barely

never      | no   | 1    | 5359/5448     | no    | 10MB no       | 10MB barely

never      | no   | 1    | 5323/5428     | no    | 0MB no        | 10MB barely
never      | no   | 1    | 5332/5428     | no    | 0MB no        | 50MB yes
never      | no   | 1    | 5293/5429     | no    | 0MB no        | 90MB yes

never      | no   | 1    | 5001/5427     | no    | 230MB yes     | 338MB yes
never      | no   | 4*   | 4998/5424     | no    | 230MB yes     | 338MB yes

* more memtesters were launched, able to allocate approximately another 100MB
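To make the trade-off between allocatable memory and recoverability concrete, the sketch below computes how far short of its request a single hog fell in three of the 'never', no-swap rows above. The numbers are copied from the tables; the labels and the helper name are illustrative.

```python
def shortfall_mb(got_mb, wanted_mb):
    """MB requested but not obtained."""
    return wanted_mb - got_mb


# (got, wanted) pairs taken from the 'never', no swap, 1 hog rows.
rows = {
    "3.9.0-rc1-mm1, hardcoded reserves": (5218, 5432),
    "tunable, 128MB user / 8MB admin": (5387, 5429),
    "tunable, 230MB user / 338MB admin": (5001, 5427),
}
for label, (got, wanted) in rows.items():
    print(f"{label}: {shortfall_mb(got, wanted)} MB short")
```

The small-reserve row gives the hog the most memory, while the large-reserve row gives up the most memory in exchange for both user and admin recovery.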

Future Work

- Test larger memory systems.

- Test an embedded image.

- Test other architectures.

- Time malloc microbenchmarks.

- Would it be useful to be able to set overcommit policy for
each memory cgroup?

- Some lines are slightly above 80 chars.
Perhaps define a macro to convert between pages and kb?
Other places in the kernel do this.

[akpm@linux-foundation.org: coding-style fixes]
[akpm@linux-foundation.org: make init_user_reserve() static]
Signed-off-by: Andrew Shewmaker
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Andrew Shewmaker
2013-04-30 06:54:36 +0800

24 Feb, 2013

2 commits

8fdb3dbf0 ksm: add some comments

Added slightly more detail to the Documentation of merge_across_nodes, a
few comments in areas indicated by review, and renamed get_ksm_page()'s
argument from "locked" to "lock_it". No functional change.

Signed-off-by: Hugh Dickins
Cc: Mel Gorman
Cc: Petr Holasek
Cc: Andrea Arcangeli
Cc: Izik Eidus
Cc: Johannes Weiner
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Hugh Dickins
2013-02-24 09:50:23 +0800
90bd6fd31 ksm: allow trees per NUMA node

Here's a KSM series, based on mmotm 2013-01-23-17-04: starting with
Petr's v7 "KSM: numa awareness sysfs knob"; then fixing the two issues
we had with that, fully enabling KSM page migration on the way.

(A different kind of KSM/NUMA issue which I've certainly not begun to
address here: when KSM pages are unmerged, there's usually no sense in
preferring to allocate the new pages local to the caller's node.)

This patch:

Introduces a new sysfs boolean knob, /sys/kernel/mm/ksm/merge_across_nodes,
which controls merging of pages across different NUMA nodes. When it is set
to zero, only pages from the same node are merged; otherwise pages from
all nodes can be merged together (default behavior).

A typical use case is a NUMA machine running many KVM guests, where CPUs
on more distant nodes would see a significant increase in access latency
to a merged KSM page. A sysfs knob was chosen for flexibility, since some
users still prefer a higher amount of saved physical memory regardless of
access latency.

Every NUMA node has its own stable and unstable trees, which makes
searching and inserting faster. The merge_across_nodes value can be
changed only when there are no KSM shared pages in the system.
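A minimal sketch of how a script might respect that restriction: check the shared-page count first, then write the knob. It assumes the KSM sysfs directory also exposes a pages_shared counter, and the function name and error message are illustrative. The kernel enforces the rule itself, so the check here only produces a clearer error.

```python
from pathlib import Path


def set_merge_across_nodes(value, ksm_dir="/sys/kernel/mm/ksm"):
    """Write 0 or 1 to merge_across_nodes, refusing while pages are shared."""
    ksm = Path(ksm_dir)
    # Assumption: pages_shared reports how many KSM shared pages exist.
    shared = int((ksm / "pages_shared").read_text())
    if shared != 0:
        raise RuntimeError(
            f"{shared} KSM shared pages present; unmerge them before switching"
        )
    (ksm / "merge_across_nodes").write_text(f"{int(bool(value))}\n")
```

The ksm_dir parameter exists so the function can be exercised against a scratch directory without root privileges.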

I've tested this patch on NUMA machines with 2, 4 and 8 nodes and
measured the speed of memory access inside KVM guests with memory pinned
to one of the nodes, using this benchmark:

http://pholasek.fedorapeople.org/alloc_pg.c

Population standard deviations of access times, as a percentage of the
average, were as follows:

merge_across_nodes=1
2 nodes 1.4%
4 nodes 1.6%
8 nodes 1.7%

merge_across_nodes=0
2 nodes 1%
4 nodes 0.32%
8 nodes 0.018%
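For reference, the metric reported above (population standard deviation as a percentage of the average) can be computed as in the sketch below. The sample access times are invented for illustration and are not taken from the benchmark.

```python
from statistics import mean, pstdev


def relative_pstdev_percent(samples):
    """Population standard deviation expressed as a percentage of the mean."""
    return 100 * pstdev(samples) / mean(samples)


access_times = [98.0, 100.0, 102.0]  # hypothetical timings, arbitrary units
print(f"{relative_pstdev_percent(access_times):.2f}%")
```

A lower percentage means access times vary less from one measurement to the next, which is the effect the merge_across_nodes=0 figures show.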

RFC: https://lkml.org/lkml/2011/11/30/91
v1: https://lkml.org/lkml/2012/1/23/46
v2: https://lkml.org/lkml/2012/6/29/105
v3: https://lkml.org/lkml/2012/9/14/550
v4: https://lkml.org/lkml/2012/9/23/137
v5: https://lkml.org/lkml/2012/12/10/540
v6: https://lkml.org/lkml/2012/12/23/154
v7: https://lkml.org/lkml/2012/12/27/225

Hugh notes that this patch brings two problems, whose solution needs
further support in mm/ksm.c, which follows in subsequent patches:

1) switching merge_across_nodes after running KSM is liable to oops
on stale nodes still left over from the previous stable tree;

2) memory hotremove may migrate KSM pages, but there is no provision
here for !merge_across_nodes to migrate nodes to the proper tree.

Signed-off-by: Petr Holasek
Signed-off-by: Hugh Dickins
Acked-by: Rik van Riel
Cc: Andrea Arcangeli
Cc: Izik Eidus
Cc: Gerald Schaefer
Cc: KOSAKI Motohiro
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Petr Holasek
2013-02-24 09:50:19 +0800

14 Dec, 2012

1 commit

f6e858a00 Merge branch 'akpm' (Andrew's patch-bomb)

Merge misc VM changes from Andrew Morton:
"The rest of most-of-MM. The other MM bits await a slab merge.

This patch includes the addition of a huge zero_page. Not a
performance boost but it can save large amounts of physical memory in
some situations.

Also a bunch of Fujitsu engineers are working on memory hotplug.
Which, as it turns out, was badly broken. About half of their patches
are included here; the remainder are 3.8 material."

However, this merge disables CONFIG_MOVABLE_NODE, which was totally
broken. We don't add new features with "default y", nor do we add
Kconfig questions that are incomprehensible to most people without any
help text. Does the feature even make sense without compaction or
memory hotplug?

* akpm: (54 commits)
mm/bootmem.c: remove unused wrapper function reserve_bootmem_generic()
mm/memory.c: remove unused code from do_wp_page()
asm-generic, mm: pgtable: consolidate zero page helpers
mm/hugetlb.c: fix warning on freeing hwpoisoned hugepage
hwpoison, hugetlbfs: fix RSS-counter warning
hwpoison, hugetlbfs: fix "bad pmd" warning in unmapping hwpoisoned hugepage
mm: protect against concurrent vma expansion
memcg: do not check for mm in __mem_cgroup_count_vm_event
tmpfs: support SEEK_DATA and SEEK_HOLE (reprise)
mm: provide more accurate estimation of pages occupied by memmap
fs/buffer.c: remove redundant initialization in alloc_page_buffers()
fs/buffer.c: do not inline exported function
writeback: fix a typo in comment
mm: introduce new field "managed_pages" to struct zone
mm, oom: remove statically defined arch functions of same name
mm, oom: remove redundant sleep in pagefault oom handler
mm, oom: cleanup pagefault oom handler
memory_hotplug: allow online/offline memory to result movable node
numa: add CONFIG_MOVABLE_NODE for movable-dedicated node
mm, memcg: avoid unnecessary function call when memcg is disabled
...

Linus Torvalds
2012-12-14 05:11:15 +0800

13 Dec, 2012

3 commits

79da5407e thp: introduce sysfs knob to disable huge zero page

By default the kernel tries to use the huge zero page on a read page fault.
It's possible to disable the huge zero page by writing 0, or to enable it
again by writing 1:

echo 0 >/sys/kernel/mm/transparent_hugepage/use_zero_page
echo 1 >/sys/kernel/mm/transparent_hugepage/use_zero_page

Signed-off-by: Kirill A. Shutemov
Cc: Andrea Arcangeli
Cc: Andi Kleen
Cc: "H. Peter Anvin"
Cc: Mel Gorman
Cc: David Rientjes
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Kirill A. Shutemov
2012-12-13 09:38:32 +0800
d8a8e1f0d thp, vmstat: implement HZP_ALLOC and HZP_ALLOC_FAILED events

hzp_alloc is incremented every time a huge zero page is successfully
allocated. It includes allocations which were dropped due to a
race with another allocation. Note, it doesn't count every mapping
of the huge zero page, only its allocation.

hzp_alloc_failed is incremented if kernel fails to allocate huge zero
page and falls back to using small pages.
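A small sketch of how these two counters could be read from /proc/vmstat-style text. The counter names used as defaults are an assumption: the message above calls them hzp_alloc and hzp_alloc_failed, while the names exposed by a given kernel may differ, so both are parameters. The sample text is made up.

```python
def hzp_counters(vmstat_text,
                 alloc_name="thp_zero_page_alloc",
                 failed_name="thp_zero_page_alloc_failed"):
    """Return (allocations, failed allocations); missing counters read as 0."""
    counters = {}
    for line in vmstat_text.splitlines():
        parts = line.split()
        if len(parts) == 2 and parts[1].isdigit():
            counters[parts[0]] = int(parts[1])
    return counters.get(alloc_name, 0), counters.get(failed_name, 0)


sample = "nr_free_pages 1000\nthp_zero_page_alloc 3\nthp_zero_page_alloc_failed 1\n"
print(hzp_counters(sample))
# On a live system: hzp_counters(open("/proc/vmstat").read())
```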

Signed-off-by: Kirill A. Shutemov
Cc: Andrea Arcangeli
Cc: Andi Kleen
Cc: "H. Peter Anvin"
Cc: Mel Gorman
Cc: David Rientjes
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Kirill A. Shutemov
2012-12-13 09:38:32 +0800
e180377f1 thp: change split_huge_page_pmd() interface

Pass vma instead of mm and add address parameter.

In most cases we already have the vma on the stack. We provide
split_huge_page_pmd_mm() for the few cases where we have mm, but not vma.

This change is preparation for the huge zero pmd splitting implementation.

Signed-off-by: Kirill A. Shutemov
Cc: Andrea Arcangeli
Cc: Andi Kleen
Cc: "H. Peter Anvin"
Cc: Mel Gorman
Cc: David Rientjes
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Kirill A. Shutemov
2012-12-13 09:38:31 +0800

19 Nov, 2012

1 commit

4e79162a5 doc: fix quite a few typos within Documentation

Correct spelling typos in Documentation

Signed-off-by: Jiri Kosina

Masanari Iida
2012-11-19 21:28:24 +0800

09 Oct, 2012

2 commits

39b5f29ac mm: remove vma arg from page_evictable

page_evictable(page, vma) is an irritant: almost all its callers pass
NULL for vma. Remove the vma arg and use mlocked_vma_newpage(vma, page)
explicitly in the couple of places it's needed. But in those places we
don't even need page_evictable() itself! They're dealing with a freshly
allocated anonymous page, which has no "mapping" and cannot be mlocked yet.

Signed-off-by: Hugh Dickins
Acked-by: Mel Gorman
Cc: Rik van Riel
Acked-by: Johannes Weiner
Cc: Michel Lespinasse
Cc: Ying Han
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Hugh Dickins
2012-10-09 15:22:55 +0800
314e51b98 mm: kill vma flag VM_RESERVED and mm->reserved_vm counter

A long time ago, in v2.4, VM_RESERVED kept the swapout process off a VMA.
It has since lost its original meaning but still has some effects:

 | effect                 | alternative flags
-+------------------------+---------------------------------------------
1| account as reserved_vm | VM_IO
2| skip in core dump      | VM_IO, VM_DONTDUMP
3| do not merge or expand | VM_IO, VM_DONTEXPAND, VM_HUGETLB, VM_PFNMAP
4| do not mlock           | VM_IO, VM_DONTEXPAND, VM_HUGETLB, VM_PFNMAP

This patch removes the reserved_vm counter from mm_struct. It seems that
nobody cares about it: it is not exported to userspace directly, it only
reduces the total_vm shown in proc.

Thus VM_RESERVED can be replaced with VM_IO or pair VM_DONTEXPAND | VM_DONTDUMP.

remap_pfn_range() and io_remap_pfn_range() set VM_IO|VM_DONTEXPAND|VM_DONTDUMP.
remap_vmalloc_range() sets VM_DONTEXPAND | VM_DONTDUMP.
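The replacement rule can be modelled with plain bit masks. This is only an illustration of the flag arithmetic, not kernel code: the numeric values are placeholders picked for the sketch, and the helper name and its is_io_mapping parameter are hypothetical.

```python
# Illustrative bit values; treat them as placeholders, not authoritative.
VM_IO = 0x00004000
VM_DONTEXPAND = 0x00040000
VM_RESERVED = 0x00080000
VM_DONTDUMP = 0x04000000


def replace_vm_reserved(flags, is_io_mapping=False):
    """Drop VM_RESERVED and set the flags the message names as its replacement."""
    if not flags & VM_RESERVED:
        return flags  # nothing to convert
    flags &= ~VM_RESERVED
    if is_io_mapping:
        return flags | VM_IO
    return flags | VM_DONTEXPAND | VM_DONTDUMP


print(hex(replace_vm_reserved(VM_RESERVED)))
print(hex(replace_vm_reserved(VM_RESERVED, is_io_mapping=True)))
```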

[akpm@linux-foundation.org: drivers/vfio/pci/vfio_pci.c fixup]
Signed-off-by: Konstantin Khlebnikov
Cc: Alexander Viro
Cc: Carsten Otte
Cc: Chris Metcalf
Cc: Cyrill Gorcunov
Cc: Eric Paris
Cc: H. Peter Anvin
Cc: Hugh Dickins
Cc: Ingo Molnar
Cc: James Morris
Cc: Jason Baron
Cc: Kentaro Takeda
Cc: Matt Helsley
Cc: Nick Piggin
Cc: Oleg Nesterov
Cc: Peter Zijlstra
Cc: Robert Richter
Cc: Suresh Siddha
Cc: Tetsuo Handa
Cc: Venkatesh Pallipadi
Acked-by: Linus Torvalds
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Konstantin Khlebnikov
2012-10-09 15:22:19 +0800

22 Aug, 2012

1 commit

d46f3d86f hugetlb: update hugetlbpage.txt

Commit f0f57b2b1488 ("mm: move hugepage test examples to
tools/testing/selftests/vm") moved map_hugetlb.c, hugepage-shm.c and
hugepage-mmap.c tests into tools/testing/selftests/vm/ directory, but it
didn't update hugetlbpage.txt

Signed-off-by: Zhouping Liu
Acked-by: Dave Young
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Zhouping Liu
2012-08-22 07:45:03 +0800

23 Jul, 2012

1 commit

1d00015e2 mm/frontswap: cleanup doc and comment error

Signed-off-by: Wanpeng Li
Signed-off-by: Konrad Rzeszutek Wilk

Wanpeng Li
2012-07-23 23:16:20 +0800

05 Jun, 2012

1 commit

a3fe778c7 Merge tag 'stable/frontswap.v16-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/konrad/mm

Pull frontswap feature from Konrad Rzeszutek Wilk:
"Frontswap provides a "transcendent memory" interface for swap pages.
In some environments, dramatic performance savings may be obtained
because swapped pages are saved in RAM (or a RAM-like device) instead
of a swap disk. This tag provides the basic infrastructure along with
some changes to the existing backends."

Fix up trivial conflict in mm/Makefile due to removal of swap token code
changing a line next to the new frontswap entry.

This pull request came in before the merge window even opened; it got
delayed to after the merge window by me just wanting to make sure it had
actual users. Apparently IBM is using this on their embedded side, and
Jan Beulich says that it's already made available for SLES and OpenSUSE
users.

Also acked by Rik van Riel, and Konrad points to other people liking it
too. So in it goes.

By Dan Magenheimer (4) and Konrad Rzeszutek Wilk (2)
via Konrad Rzeszutek Wilk
* tag 'stable/frontswap.v16-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/konrad/mm:
frontswap: s/put_page/store/g s/get_page/load
MAINTAINER: Add myself for the frontswap API
mm: frontswap: config and doc files
mm: frontswap: core frontswap functionality
mm: frontswap: core swap subsystem hooks and headers
mm: frontswap: add frontswap header file

Linus Torvalds
2012-06-05 03:28:45 +0800

02 Jun, 2012

1 commit

af4f8ba31 Merge branch 'slab/for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/penberg/linux

Pull slab updates from Pekka Enberg:
"Mainly a bunch of SLUB fixes from Joonsoo Kim"

* 'slab/for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/penberg/linux:
slub: use __SetPageSlab function to set PG_slab flag
slub: fix a memory leak in get_partial_node()
slub: remove unused argument of init_kmem_cache_node()
slub: fix a possible memory leak
Documentations: Fix slabinfo.c directory in vm/slub.txt
slub: fix incorrect return type of get_any_partial()

Linus Torvalds
2012-06-02 07:50:23 +0800

01 Jun, 2012

1 commit

052fb0d63 proc: report file/anon bit in /proc/pid/pagemap

This is an implementation of Andrew's proposal to extend the pagemap file
bits to report what is missing about tasks' working set.

The problem with the working set detection is multilateral. In the criu
(checkpoint/restore) project we dump the tasks' memory into image files
and to do it properly we need to detect which pages inside mappings are
really in use. The mincore syscall, which I thought could help with this, did
not. First, it doesn't report swapped pages, thus we cannot find out which
parts of anonymous mappings to dump. Next, it does report pages from page
cache as present even if they are not mapped, and it doesn't tell us which
pages have not been cow-ed.

Note that the issue with swap pages is critical -- we must dump swap pages to
the image file. The issues with file pages, however, are optimizations -- we
can put all file pages into the image and this would be correct, but if we
know that a page is not mapped or not cow-ed, we can leave it out of the dump
file. The dump would still be self-consistent, though significantly smaller
in size (up to 10 times smaller on real apps).

Andrew noticed that the proc pagemap file solves 2 of the 3 issues above --
it reports whether a page is present or swapped and it doesn't report
unmapped page cache pages. But it doesn't distinguish cow-ed file pages
from those that are not cow-ed.

I would like to make the last unused bit in this file report whether the
page mapped into the respective pte is PageAnon or not.
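From userspace, the result can be consumed by decoding each 64-bit pagemap entry. The sketch below assumes the bit layout given in the kernel's pagemap documentation (bit 63 present, bit 62 swapped, bit 61 for the new file/anon bit); the constant and function names are illustrative.

```python
PM_PRESENT = 1 << 63  # page is present in RAM
PM_SWAP = 1 << 62     # page is swapped out
PM_FILE = 1 << 61     # assumption: set when the mapped page is not PageAnon


def decode_pagemap_entry(entry):
    """Split a 64-bit pagemap entry into the flags discussed above."""
    return {
        "present": bool(entry & PM_PRESENT),
        "swapped": bool(entry & PM_SWAP),
        "file_backed": bool(entry & PM_FILE),
    }


# A present page that has not been cow-ed (still file-backed):
print(decode_pagemap_entry(PM_PRESENT | PM_FILE))
# A present page that has been cow-ed (now anonymous):
print(decode_pagemap_entry(PM_PRESENT))
```

A dumper in the spirit of the text would always save swapped pages and present anonymous pages, and could skip present pages that still carry the file bit.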

[comment stolen from Pavel Emelyanov's v1 patch]

Signed-off-by: Konstantin Khlebnikov
Cc: Pavel Emelyanov
Cc: Matt Mackall
Cc: Hugh Dickins
Cc: Rik van Riel
Acked-by: KOSAKI Motohiro
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Konstantin Khlebnikov
2012-06-01 08:49:29 +0800

30 May, 2012

1 commit

692569946 mm: document the meminfo and vmstat fields of relevance to transparent hugepages

Update Documentation/vm/transhuge.txt and
Documentation/filesystems/proc.txt with some information on monitoring
transparent huge page usage and the associated overhead.

Signed-off-by: Mel Gorman
Signed-off-by: Andrea Arcangeli
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Mel Gorman
2012-05-30 07:22:23 +0800

15 May, 2012

2 commits

165c8aed5 frontswap: s/put_page/store/g s/get_page/load

Sounds so much more natural.

Suggested-by: Andrea Arcangeli
Signed-off-by: Konrad Rzeszutek Wilk

Konrad Rzeszutek Wilk
2012-05-15 23:34:08 +0800
27c6aec21 mm: frontswap: config and doc files

This patch (4 of 4) adds configuration and documentation files, including a FAQ.

[v14: updated docs/FAQ to use zcache and RAMster as examples]
[v10: no change]
[v9: akpm@linux-foundation.org: sysfs->debugfs; no longer need Doc/ABI file]
[v8: rebase to 3.0-rc4]
[v7: rebase to 3.0-rc3]
[v6: rebase to 3.0-rc1]
[v5: change config default to n]
[v4: rebase to 2.6.39]
Signed-off-by: Dan Magenheimer
Acked-by: Jan Beulich
Acked-by: Seth Jennings
Cc: Jeremy Fitzhardinge
Cc: Hugh Dickins
Cc: Johannes Weiner
Cc: Nitin Gupta
Cc: Matthew Wilcox
Cc: Chris Mason
Cc: Rik Riel
Cc: Andrew Morton
Signed-off-by: Konrad Rzeszutek Wilk

Dan Magenheimer
2012-05-15 23:34:03 +0800