22 Jan, 2014

1 commit

  • Some applications that run on HPC clusters are designed around the
    availability of RAM, and the overcommit ratio is fine-tuned to get the
    maximum use of memory without swapping. With growing memory sizes, the
    1%-of-all-RAM grain provided by overcommit_ratio has become too coarse
    for these workloads (on a 2TB machine it represents no less than 20GB).

    This patch adds the new overcommit_kbytes sysctl variable that allows a
    much finer grain (a sketch of the resulting arithmetic follows this
    entry).

    [akpm@linux-foundation.org: coding-style fixes]
    [akpm@linux-foundation.org: fix nommu build]
    Signed-off-by: Jerome Marchand
    Cc: Dave Hansen
    Cc: Alan Cox
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Jerome Marchand
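
    A minimal user-space sketch of the arithmetic described above, assuming
    4 KB pages; all names here are illustrative stand-ins for the kernel's
    globals, not its actual code:

        #include <stdint.h>
        #include <stdio.h>

        #define PAGE_SHIFT 12  /* assumed 4 KB pages */

        static uint64_t commit_limit_pages(uint64_t totalram_pages,
                                           uint64_t total_swap_pages,
                                           uint64_t overcommit_kbytes, /* 0 = unset */
                                           unsigned int overcommit_ratio)
        {
            uint64_t allowed;

            if (overcommit_kbytes)
                allowed = overcommit_kbytes >> (PAGE_SHIFT - 10); /* KiB -> pages */
            else
                allowed = totalram_pages * overcommit_ratio / 100; /* 1% grain */

            return allowed + total_swap_pages;
        }

        int main(void)
        {
            uint64_t ram_2tb = (2ULL << 40) >> PAGE_SHIFT;

            /* one 1% ratio step on a 2 TB machine: ~20 GB worth of pages */
            printf("one ratio step = %llu pages\n",
                   (unsigned long long)(ram_2tb / 100));
            return 0;
        }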
     

13 Nov, 2013

1 commit

  • The same calculation is currently done in three different places.
    Factor that code so that future changes have to be made in only one
    place (a stand-alone model follows this entry).

    [akpm@linux-foundation.org: uninline vm_commit_limit()]
    Signed-off-by: Jerome Marchand
    Cc: Dave Hansen
    Cc: Michal Hocko
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Jerome Marchand
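
    A stand-alone model of the factored helper; the globals here are
    assumptions standing in for the kernel's own counters (hugetlb pages
    are excluded since they are reserved separately):

        #include <stdint.h>
        #include <stdio.h>

        /* assumed stand-ins for the kernel's globals */
        static uint64_t totalram_pages   = 1ULL << 25; /* 128 GB of 4 KB pages */
        static uint64_t hugetlb_pages    = 0;
        static uint64_t total_swap_pages = 1ULL << 21; /* 8 GB of swap */
        static unsigned sysctl_overcommit_ratio = 50;

        /* the one place the commit limit is now computed */
        static uint64_t vm_commit_limit(void)
        {
            return (totalram_pages - hugetlb_pages)
                       * sysctl_overcommit_ratio / 100
                   + total_swap_pages;
        }

        int main(void)
        {
            printf("limit = %llu pages\n",
                   (unsigned long long)vm_commit_limit());
            return 0;
        }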
     

04 Jul, 2013

1 commit

  • Currently the per-cpu counter's batch size for memory accounting is
    configured as twice the number of cpus in the system. However, for
    systems with very large memory, it is more appropriate to make it
    proportional to the memory size per cpu in the system.

    For example, for an x86_64 system with 64 cpus and 128 GB of memory,
    the batch size is only 2*64 pages (0.5 MB). So any memory accounting
    change of more than 0.5MB will overflow the per cpu counter into the
    global counter. Instead, under the new scheme, the batch size is
    configured to be 0.4% of the memory per cpu = 8MB (128 GB / 64 / 256),
    which is more in line with the memory size (a stand-alone model of the
    sizing rule follows this entry).

    I've done a repeated brk test of 800KB (from will-it-scale test suite)
    with 80 concurrent processes on a 4 socket Westmere machine with a total
    of 40 cores. Without the patch, about 80% of cpu is spent on spin-lock
    contention within the vm_committed_as counter. With the patch, there's
    a 73x speedup on the benchmark and the lock contention drops off almost
    entirely.

    [akpm@linux-foundation.org: fix section mismatch]
    Signed-off-by: Tim Chen
    Cc: Tejun Heo
    Cc: Eric Dumazet
    Cc: Dave Hansen
    Cc: Andi Kleen
    Cc: Wu Fengguang
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Tim Chen
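
    A stand-alone model of the new sizing rule (0.4% of memory per cpu,
    i.e. pages/cpu/256, floored at the old 2*ncpus value; 4 KB pages
    assumed, function names are illustrative):

        #include <stdint.h>
        #include <stdio.h>

        static int64_t compute_batch(uint64_t totalram_pages, unsigned ncpus)
        {
            int64_t floor = ncpus * 2 > 32 ? (int64_t)ncpus * 2 : 32;
            int64_t memsized = (int64_t)(totalram_pages / ncpus / 256);

            return memsized > floor ? memsized : floor;
        }

        int main(void)
        {
            /* 128 GB of 4 KB pages = 2^25 pages, 64 cpus -> 2048 pages */
            long long batch = compute_batch(1ULL << 25, 64);

            printf("batch = %lld pages (%lld MB)\n",
                   batch, batch * 4096 / (1 << 20));
            return 0;
        }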
     

29 Mar, 2013

1 commit

  • This reverts commit 186930500985 ("mm: introduce VM_POPULATE flag to
    better deal with racy userspace programs").

    VM_POPULATE only has any effect when userspace plays racy games with
    vmas by trying to unmap and remap memory regions that mmap or mlock are
    operating on.

    Also, the only effect of VM_POPULATE when userspace plays such games is
    that it avoids populating new memory regions that get remapped into the
    address range that was being operated on by the original mmap or mlock
    calls.

    Let's remove VM_POPULATE as there isn't any strong argument to mandate a
    new vm_flag.

    Signed-off-by: Michel Lespinasse
    Signed-off-by: Hugh Dickins
    Signed-off-by: Linus Torvalds

    Michel Lespinasse
     

24 Feb, 2013

1 commit

  • The vm_populate() code populates user mappings without constantly
    holding the mmap_sem. This makes it susceptible to racy userspace
    programs: the user mappings may change while vm_populate() is running,
    and in this case vm_populate() may end up populating the new mapping
    instead of the old one.

    In order to reduce the possibility of userspace getting surprised by
    this behavior, this change introduces the VM_POPULATE vma flag, which
    gets set on vmas we want vm_populate() to work on. This way
    vm_populate() may still end up populating the new mapping after such a
    race, but only if the new mapping is also one that the user has
    requested (using MAP_SHARED, MAP_LOCKED or mlock) to be populated (a
    toy model of this gating follows this entry).

    Signed-off-by: Michel Lespinasse
    Acked-by: Rik van Riel
    Tested-by: Andy Lutomirski
    Cc: Greg Ungerer
    Cc: David Howells
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Michel Lespinasse
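
    A toy model of that gating, with simplified types and a made-up flag
    value; the real population path walks vmas under mmap_sem, which is
    elided here:

        #include <stdbool.h>

        #define VM_POPULATE 0x00008000UL   /* illustrative value only */

        struct toy_vma {
            unsigned long vm_start, vm_end;
            unsigned long vm_flags;
        };

        /* After the unlocked window, a vma found in the range is faulted
         * in only if it was marked for population at map/mlock time. */
        static bool should_populate(const struct toy_vma *vma)
        {
            return (vma->vm_flags & VM_POPULATE) != 0;
        }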
     

16 Nov, 2012

1 commit

  • It will be useful to be able to access the global memory commitment
    from device drivers. On the Hyper-V platform, the host has a policy
    engine to balance the available physical memory amongst all competing
    virtual machines hosted on a given node. This policy engine is driven
    by a number of metrics, including the memory commitment reported by the
    guests. The balloon driver for Linux on Hyper-V will use this function
    to retrieve guest memory commitment. This function is also used in the
    Xen self-ballooning code (a kernel-context sketch of the helper follows
    this entry).

    [akpm@linux-foundation.org: coding-style tweak]
    Signed-off-by: K. Y. Srinivasan
    Acked-by: David Rientjes
    Acked-by: Dan Magenheimer
    Cc: Konrad Rzeszutek Wilk
    Cc: Jeremy Fitzhardinge
    Signed-off-by: Andrew Morton
    Signed-off-by: Greg Kroah-Hartman

    K. Y. Srinivasan
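
    The helper in question is essentially a one-line, clamped read of the
    vm_committed_as percpu counter; a kernel-context sketch (mirroring,
    not quoting, the patch):

        #include <linux/export.h>
        #include <linux/mman.h>            /* declares vm_committed_as */
        #include <linux/percpu_counter.h>

        /* Exported so drivers (the Hyper-V balloon, Xen selfballooning)
         * can read the guest's global memory commitment. */
        unsigned long vm_memory_committed(void)
        {
            return percpu_counter_read_positive(&vm_committed_as);
        }
        EXPORT_SYMBOL_GPL(vm_memory_committed);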
     

09 Oct, 2012

1 commit

  • Currently the kernel sets mm->exe_file during sys_execve() and then
    tracks the number of vmas with the VM_EXECUTABLE flag in
    mm->num_exe_file_vmas; as soon as this counter drops to zero, the
    kernel resets mm->exe_file to NULL. It also resets mm->exe_file at the
    last mmput(), when mm->mm_users drops to zero.

    A vma with the VM_EXECUTABLE flag appears after mapping a file with the
    MAP_EXECUTABLE flag; such vmas can appear only at sys_execve() or after
    vma splitting, because sys_mmap ignores this flag. Usually the binfmt
    module sets mm->exe_file and mmaps executable vmas with this file; they
    hold mm->exe_file while the task is running.

    A comment from v2.6.25-6245-g925d1c4 ("procfs task exe symlink"),
    where all of this was introduced:

    > The kernel implements readlink of /proc/pid/exe by getting the file from
    > the first executable VMA. Then the path to the file is reconstructed and
    > reported as the result.
    >
    > Because of the VMA walk the code is slightly different on nommu systems.
    > This patch avoids separate /proc/pid/exe code on nommu systems. Instead of
    > walking the VMAs to find the first executable file-backed VMA we store a
    > reference to the exec'd file in the mm_struct.
    >
    > That reference would prevent the filesystem holding the executable file
    > from being unmounted even after unmapping the VMAs. So we track the number
    > of VM_EXECUTABLE VMAs and drop the new reference when the last one is
    > unmapped. This avoids pinning the mounted filesystem.

    exe_file's vma accounting is hooked into every file mmap/munmap and vma
    split/merge just to prevent a hypothetical case of an mm pinning a
    filesystem against unmounting after it has already unmapped all of its
    executable files but is still alive.

    It seems that currently nobody depends on this behaviour. We can try to
    remove this logic and keep mm->exe_file until the final mmput().

    mm->exe_file is still protected by mm->mmap_sem, because we want to
    change it via the new sys_prctl(PR_SET_MM_EXE_FILE). Via this syscall a
    task can also change its mm->exe_file and unpin the mountpoint
    explicitly (a user-space sketch follows this entry).

    Signed-off-by: Konstantin Khlebnikov
    Cc: Alexander Viro
    Cc: Carsten Otte
    Cc: Chris Metcalf
    Cc: Cyrill Gorcunov
    Cc: Eric Paris
    Cc: H. Peter Anvin
    Cc: Hugh Dickins
    Cc: Ingo Molnar
    Cc: James Morris
    Cc: Jason Baron
    Cc: Kentaro Takeda
    Cc: Matt Helsley
    Cc: Nick Piggin
    Cc: Oleg Nesterov
    Cc: Peter Zijlstra
    Cc: Robert Richter
    Cc: Suresh Siddha
    Cc: Tetsuo Handa
    Cc: Venkatesh Pallipadi
    Acked-by: Linus Torvalds
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Konstantin Khlebnikov
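
    A user-space sketch of the explicit-unpin path mentioned above,
    assuming a kernel that implements PR_SET_MM_EXE_FILE; the numeric
    constants are defined as a fallback for older headers, and /bin/true
    is just an example target:

        #include <fcntl.h>
        #include <stdio.h>
        #include <unistd.h>
        #include <sys/prctl.h>

        #ifndef PR_SET_MM
        #define PR_SET_MM          35
        #endif
        #ifndef PR_SET_MM_EXE_FILE
        #define PR_SET_MM_EXE_FILE 13
        #endif

        int main(void)
        {
            int fd = open("/bin/true", O_RDONLY);
            if (fd < 0)
                return 1;

            /* Re-point /proc/self/exe, dropping the reference (and the
             * mount pin) on the original executable. Typically requires
             * CAP_SYS_RESOURCE. */
            if (prctl(PR_SET_MM, PR_SET_MM_EXE_FILE, (unsigned long)fd, 0, 0))
                perror("prctl(PR_SET_MM_EXE_FILE)");

            close(fd);
            return 0;
        }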
     

27 Jul, 2011

1 commit

  • This allows us to move duplicated code in <asm/atomic.h>
    (atomic_inc_not_zero() for now) to <linux/atomic.h> (a sketch of a
    typical caller follows this entry).

    Signed-off-by: Arun Sharma
    Reviewed-by: Eric Dumazet
    Cc: Ingo Molnar
    Cc: David Miller
    Cc: Eric Dumazet
    Acked-by: Mike Frysinger
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Arun Sharma
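
    For context, a kernel-style sketch of the usual caller pattern for the
    helper being consolidated; obj and obj_tryget are made-up names:

        #include <linux/atomic.h>
        #include <linux/types.h>

        struct obj {
            atomic_t refcount;
        };

        /* take a reference unless the object is already dying: returns
         * true and increments refcount, unless refcount was zero */
        static bool obj_tryget(struct obj *obj)
        {
            return atomic_inc_not_zero(&obj->refcount);
        }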
     

03 May, 2009

1 commit

  • The Committed_AS field can underflow in certain situations:

    > # while true; do cat /proc/meminfo | grep _AS; sleep 1; done | uniq -c
    > 1 Committed_AS: 18446744073709323392 kB
    > 11 Committed_AS: 18446744073709455488 kB
    > 6 Committed_AS: 35136 kB
    > 5 Committed_AS: 18446744073709454400 kB
    > 7 Committed_AS: 35904 kB
    > 3 Committed_AS: 18446744073709453248 kB
    > 2 Committed_AS: 34752 kB
    > 9 Committed_AS: 18446744073709453248 kB
    > 8 Committed_AS: 34752 kB
    > 3 Committed_AS: 18446744073709320960 kB
    > 7 Committed_AS: 18446744073709454080 kB
    > 3 Committed_AS: 18446744073709320960 kB
    > 5 Committed_AS: 18446744073709454080 kB
    > 6 Committed_AS: 18446744073709320960 kB

    This happens because NR_CPUS can be greater than 1000 and
    meminfo_proc_show() does not check for underflow.

    But an NR_CPUS-proportional batch isn't a good calculation anyway: in
    general, the likelihood of lock contention is proportional to the
    number of online cpus, not the theoretical maximum (NR_CPUS).

    The current kernel has generic percpu-counter infrastructure, and
    using it is the right way: it simplifies the code, and
    percpu_counter_read_positive() avoids the underflow issue (a sketch of
    the shape of the fix follows this entry).

    Reported-by: Dave Hansen
    Signed-off-by: KOSAKI Motohiro
    Cc: Eric B Munson
    Cc: Mel Gorman
    Cc: Christoph Lameter
    Cc: stable [All kernel versions]
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    KOSAKI Motohiro
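
    A kernel-context sketch of the shape of the fix: the commitment count
    becomes a generic percpu_counter, and readers use the positive-clamped
    accessor so per-cpu batching error can never surface as a wrapped,
    enormous value (helper name is illustrative):

        #include <linux/mm.h>              /* PAGE_SHIFT */
        #include <linux/percpu_counter.h>

        static struct percpu_counter vm_committed_as;

        /* fast and possibly slightly stale, but never negative */
        static unsigned long committed_as_kb(void)
        {
            return (unsigned long)
                   percpu_counter_read_positive(&vm_committed_as)
                   << (PAGE_SHIFT - 10);
        }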
     

09 Jul, 2008

1 commit

  • This patch allows architectures to define functions to deal with
    additional protections bits for mmap() and mprotect().

    arch_calc_vm_prot_bits() maps additional protection bits to vm_flags
    arch_vm_get_page_prot() maps additional vm_flags to the vma's vm_page_prot
    arch_validate_prot() checks for valid values of the protection bits

    Note: vm_get_page_prot() is now pretty ugly, but the generated code
    should be identical for architectures that don't define additional
    protection bits. (A hypothetical example of the three hooks follows
    this entry.)

    Signed-off-by: Dave Kleikamp
    Acked-by: Andrew Morton
    Acked-by: Hugh Dickins
    Signed-off-by: Benjamin Herrenschmidt

    Dave Kleikamp
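
    A hypothetical example of the three hooks for an architecture with one
    extra protection bit; PROT_FOO, VM_FOO and _PAGE_FOO are made-up names
    for illustration (powerpc's PROT_SAO is the real-world user):

        /* in the architecture's asm/mman.h; all FOO names hypothetical */
        #define PROT_FOO  0x10           /* extra mmap/mprotect prot bit */
        #define VM_FOO    0x01000000UL   /* backing vm_flags bit         */

        /* prot bits -> vm_flags */
        #define arch_calc_vm_prot_bits(prot) \
                ((prot) & PROT_FOO ? VM_FOO : 0)

        /* vm_flags -> extra PTE bits for vma->vm_page_prot */
        #define arch_vm_get_page_prot(vm_flags) \
                __pgprot((vm_flags) & VM_FOO ? _PAGE_FOO : 0)

        /* reject unknown protection bits */
        #define arch_validate_prot(prot) \
                (((prot) & ~(PROT_READ | PROT_WRITE | PROT_EXEC | \
                             PROT_SEM  | PROT_FOO)) == 0)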
     

25 May, 2008

1 commit

  • The atomic_t type is 32-bit, but a 64-bit system can have more than
    2^32 pages of virtual address space available (with 4 KB pages, 2^32
    pages is only 16 TB). Without this change we overflow on ludicrously
    large mappings.

    Signed-off-by: Alan Cox
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Alan Cox
     

17 Apr, 2005

1 commit

  • Initial git repository build. I'm not bothering with the full history,
    even though we have it. We can create a separate "historical" git
    archive of that later if we want to, and in the meantime it's about
    3.2GB when imported into git - space that would just make the early
    git days unnecessarily complicated, when we don't have a lot of good
    infrastructure for it.

    Let it rip!

    Linus Torvalds