17 Nov, 2011

1 commit

  • When mapping a foreign page with xenbus_map_ring_valloc() with the
    GNTTABOP_map_grant_ref hypercall, set the GNTMAP_contains_pte flag and
    pass a pointer to the PTE (in init_mm).

    After the page is mapped, the usual fault mechanism can be used to
    update additional MMs. This allows vmalloc_sync_all() to be removed
    from alloc_vm_area().

    Signed-off-by: David Vrabel
    Acked-by: Andrew Morton
    [v1: Squashed fix by Michal for no-mmu case]
    Signed-off-by: Konrad Rzeszutek Wilk
    Signed-off-by: Michal Simek

    David Vrabel
     


26 Jul, 2011

1 commit

  • - shmem pages are not immediately available, but they are not
    potentially available either: even if we swap them out, they just
    relocate from memory into swap, so the total amount of immediately and
    potentially available memory is not affected. We shouldn't count them
    as potentially free in the first place.

    - nr_free_pages() is no longer an expensive operation, so there is no
    need to split the decision making in two halves and repeat code.

    Signed-off-by: Dmitry Fink
    Reviewed-by: Minchan Kim
    Acked-by: Hugh Dickins
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Dmitry Fink
     

23 Jul, 2011

1 commit

  • * 'ptrace' of git://git.kernel.org/pub/scm/linux/kernel/git/oleg/misc: (39 commits)
    ptrace: do_wait(traced_leader_killed_by_mt_exec) can block forever
    ptrace: fix ptrace_signal() && STOP_DEQUEUED interaction
    connector: add an event for monitoring process tracers
    ptrace: dont send SIGSTOP on auto-attach if PT_SEIZED
    ptrace: mv send-SIGSTOP from do_fork() to ptrace_init_task()
    ptrace_init_task: initialize child->jobctl explicitly
    has_stopped_jobs: s/task_is_stopped/SIGNAL_STOP_STOPPED/
    ptrace: make former thread ID available via PTRACE_GETEVENTMSG after PTRACE_EVENT_EXEC stop
    ptrace: wait_consider_task: s/same_thread_group/ptrace_reparented/
    ptrace: kill real_parent_is_ptracer() in in favor of ptrace_reparented()
    ptrace: ptrace_reparented() should check same_thread_group()
    redefine thread_group_leader() as exit_signal >= 0
    do not change dead_task->exit_signal
    kill task_detached()
    reparent_leader: check EXIT_DEAD instead of task_detached()
    make do_notify_parent() __must_check, update the callers
    __ptrace_detach: avoid task_detached(), check do_notify_parent()
    kill tracehook_notify_death()
    make do_notify_parent() return bool
    ptrace: s/tracehook_tracer_task()/ptrace_parent()/
    ...

    Linus Torvalds
     

09 Jul, 2011

1 commit

  • remap_pfn_range() is supposed to map the physical address
    pfn << PAGE_SHIFT to a user address. On nommu it was implemented as
    vma->vm_start = pfn << PAGE_SHIFT, which is wrong according to the
    original meaning of this function, and a driver developer using
    remap_pfn_range() with correct parameters would get an unexpected
    result because vm_start is changed. It should be implemented like
    addr = pfn << PAGE_SHIFT, but that is meaningless on a nommu arch, so
    this patch makes it simply return.

    The parameter name and the setting of vma->vm_flags are also fixed.

    Signed-off-by: Bob Liu
    Cc: Geert Uytterhoeven
    Cc: David Howells
    Acked-by: Greg Ungerer
    Cc: Mike Frysinger
    Cc: Bob Liu
    Cc:
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Bob Liu
     

23 Jun, 2011

1 commit

  • At this point, tracehooks aren't useful to the mainline kernel and
    mostly just add an extra layer of obfuscation. Although they have
    comments, without actual in-kernel users it is difficult to tell what
    their assumptions are and what they're actually trying to achieve. To
    the mainline kernel, they just aren't worth keeping around.

    This patch kills the following trivial tracehooks.

    * Ones testing whether task is ptraced. Replace with ->ptrace test.

    tracehook_expect_breakpoints()
    tracehook_consider_ignored_signal()
    tracehook_consider_fatal_signal()

    * ptrace_event() wrappers. Call directly.

    tracehook_report_exec()
    tracehook_report_exit()
    tracehook_report_vfork_done()

    * ptrace_release_task() wrapper. Call directly.

    tracehook_finish_release_task()

    * noop

    tracehook_prepare_release_task()
    tracehook_report_death()

    This doesn't introduce any behavior change.

    Signed-off-by: Tejun Heo
    Cc: Christoph Hellwig
    Cc: Martin Schwidefsky
    Signed-off-by: Oleg Nesterov

    Tejun Heo
     

25 May, 2011

8 commits

  • Currently on nommu arches mmap(), mremap() and munmap() don't do
    page_align(), which isn't consistent with mmu arches and causes some
    issues.

    First, some drivers' mmap() functions depend on
    vma->vm_end - vma->vm_start being page aligned, which is true on mmu
    arches but not on nommu, e.g. the uvc camera driver.

    Second, munmap() may return an -EINVAL [split file] error in cases
    where the end (passed in from userspace) is not page aligned but
    vma->vm_end is aligned due to a split or the driver's mmap() op.

    Add page alignment to fix those issues.

    [akpm@linux-foundation.org: coding-style fixes]
    Signed-off-by: Bob Liu
    Cc: David Howells
    Cc: Paul Mundt
    Cc: Greg Ungerer
    Cc: Geert Uytterhoeven
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Bob Liu
     
  • Because 'ret' is declared as int, not unsigned long, there is no need
    to cast the error constants to unsigned long. If you compile this code
    on a 64-bit machine, you'll see the following warning:

    CC mm/nommu.o
    mm/nommu.c: In function `do_mmap_pgoff':
    mm/nommu.c:1411: warning: overflow in implicit constant conversion

    Signed-off-by: Namhyung Kim
    Acked-by: Greg Ungerer
    Cc: David Howells
    Cc: Paul Mundt
    Cc: Geert Uytterhoeven
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Namhyung Kim
     
  • If f_op->read() fails and sysctl_nr_trim_pages > 1, there could be a
    memory leak between @region->vm_end and @region->vm_top.

    Signed-off-by: Namhyung Kim
    Acked-by: Greg Ungerer
    Cc: David Howells
    Cc: Paul Mundt
    Cc: Geert Uytterhoeven
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Namhyung Kim
     
  • Now we have the sorted vma list, use it in do_munmap() to check that we
    have an exact match.

    Signed-off-by: Namhyung Kim
    Acked-by: Greg Ungerer
    Cc: David Howells
    Cc: Paul Mundt
    Cc: Geert Uytterhoeven
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Namhyung Kim
     
  • Now we have the sorted vma list, use it in the find_vma[_exact]() rather
    than doing linear search on the rb-tree.

    Signed-off-by: Namhyung Kim
    Acked-by: Greg Ungerer
    Cc: David Howells
    Cc: Paul Mundt
    Cc: Geert Uytterhoeven
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Namhyung Kim
     
  • Since commit 297c5eee3724 ("mm: make the vma list be doubly linked") made
    it a doubly linked list, we don't need to scan the list when deleting
    @vma.

    And the original code didn't update the prev pointer. Fix it too.

    Signed-off-by: Namhyung Kim
    Acked-by: Greg Ungerer
    Cc: David Howells
    Cc: Paul Mundt
    Cc: Geert Uytterhoeven
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Namhyung Kim
     
  • When I was reading the nommu code, I found that it handles the vma
    list/tree in an unusual way. IIUC, because there can be more than one
    identical/overlapped vma in the list/tree, it sorts the tree more
    strictly and does a linear search on the tree. But this wasn't applied
    to the list (i.e. the list could be constructed in a different order
    than the tree, so the list can't be used to find the first vma in that
    order).

    Since inserting/sorting a vma in the tree and in the list is done at
    the same time, we can easily construct both of them in the same order.
    And since a linear search on the tree can be more costly than one on
    the list, the search can be converted to use the list.

    Also, after commit 297c5eee3724 ("mm: make the vma list be doubly
    linked") made the list doubly linked, a couple of places needed to be
    fixed to construct the list properly.

    Patch 1/6 is a preparation. It maintains the list sorted same as the tree
    and construct doubly-linked list properly. Patch 2/6 is a simple
    optimization for the vma deletion. Patch 3/6 and 4/6 convert tree
    traversal to list traversal and the rest are simple fixes and cleanups.

    This patch:

    @vma added into @mm should be sorted by start addr, end addr and VMA
    struct addr in that order because we may get identical VMAs in the @mm.
    However this was true only for the rbtree, not for the list.

    This patch fixes this by remembering 'rb_prev' during the tree traversal
    like find_vma_prepare() does and linking the @vma via __vma_link_list().
    After this patch, we can iterate the whole VMAs in correct order simply by
    using @mm->mmap list.

    [akpm@linux-foundation.org: avoid duplicating __vma_link_list()]
    Signed-off-by: Namhyung Kim
    Acked-by: Greg Ungerer
    Cc: David Howells
    Cc: Paul Mundt
    Cc: Geert Uytterhoeven
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Namhyung Kim
     
  • Architectures that implement their own show_mem() function did not pass
    the filter argument to show_free_areas() to appropriately avoid emitting
    the state of nodes that are disallowed in the current context. This patch
    now passes the filter argument to show_free_areas() so those nodes are now
    avoided.

    This patch also removes the show_free_areas() wrapper around
    __show_free_areas() and converts existing callers to pass an empty filter.

    ia64 emits additional information for each node, so skip_free_areas_zone()
    must be made global to filter disallowed nodes and it is converted to use
    a nid argument rather than a zone for this use case.

    Signed-off-by: David Rientjes
    Cc: Russell King
    Cc: Tony Luck
    Cc: Fenghua Yu
    Cc: Kyle McMartin
    Cc: Helge Deller
    Cc: James Bottomley
    Cc: "David S. Miller"
    Cc: Guan Xuetao
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    David Rientjes
     

29 Mar, 2011

1 commit

  • Recent vm changes brought in a new function which the core procfs code
    utilizes. So implement it for nommu systems too to avoid link failures.

    Signed-off-by: Mike Frysinger
    Signed-off-by: David Howells
    Tested-by: Simon Horman
    Tested-by: Ithamar Adema
    Acked-by: Greg Ungerer

    Mike Frysinger
     

25 Mar, 2011

1 commit

  • * 'for-2.6.39/core' of git://git.kernel.dk/linux-2.6-block: (65 commits)
    Documentation/iostats.txt: bit-size reference etc.
    cfq-iosched: removing unnecessary think time checking
    cfq-iosched: Don't clear queue stats when preempt.
    blk-throttle: Reset group slice when limits are changed
    blk-cgroup: Only give unaccounted_time under debug
    cfq-iosched: Don't set active queue in preempt
    block: fix non-atomic access to genhd inflight structures
    block: attempt to merge with existing requests on plug flush
    block: NULL dereference on error path in __blkdev_get()
    cfq-iosched: Don't update group weights when on service tree
    fs: assign sb->s_bdi to default_backing_dev_info if the bdi is going away
    block: Require subsystems to explicitly allocate bio_set integrity mempool
    jbd2: finish conversion from WRITE_SYNC_PLUG to WRITE_SYNC and explicit plugging
    jbd: finish conversion from WRITE_SYNC_PLUG to WRITE_SYNC and explicit plugging
    fs: make fsync_buffers_list() plug
    mm: make generic_writepages() use plugging
    blk-cgroup: Add unaccounted time to timeslice_used.
    block: fixup plugging stubs for !CONFIG_BLOCK
    block: remove obsolete comments for blkdev_issue_zeroout.
    blktrace: Use rq->cmd_flags directly in blk_add_trace_rq.
    ...

    Fix up conflicts in fs/{aio.c,super.c}

    Linus Torvalds
     


10 Mar, 2011

1 commit

  • Code has been converted over to the new explicit on-stack plugging,
    and delay users have been converted to use the new API for that.
    So lets kill off the old plugging along with aops->sync_page().

    Signed-off-by: Jens Axboe

    Jens Axboe
     

14 Jan, 2011

1 commit

  • __get_user_pages gets a new 'nonblocking' parameter to signal that the
    caller is prepared to re-acquire mmap_sem and retry the operation if
    needed. This is used to split off long operations if they are going to
    block on a disk transfer, or when we detect contention on the mmap_sem.

    [akpm@linux-foundation.org: remove ref to rwsem_is_contended()]
    Signed-off-by: Michel Lespinasse
    Cc: Hugh Dickins
    Cc: Rik van Riel
    Cc: Peter Zijlstra
    Cc: Nick Piggin
    Cc: KOSAKI Motohiro
    Cc: Ingo Molnar
    Cc: "H. Peter Anvin"
    Cc: Thomas Gleixner
    Cc: David Howells
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Michel Lespinasse
     


25 Nov, 2010

1 commit

  • Depending on processor speed, page size, and the amount of memory a
    process is allowed to amass, cleanup of a large VM may freeze the system
    for many seconds. This can result in a watchdog timeout.

    Make sure other tasks receive some service when cleaning up large VMs.

    Signed-off-by: Steven J. Magnani
    Cc: Greg Ungerer
    Reviewed-by: KOSAKI Motohiro
    Cc:
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Steven J. Magnani
     

30 Oct, 2010

1 commit

  • Normal syscall audit doesn't catch the 5th argument of a syscall. It
    also doesn't catch the contents of userland structures pointed to by a
    syscall argument, so for both the old and new mmap(2) ABIs it doesn't
    record the descriptor we are mapping. For the old one it also misses
    the flags.

    Signed-off-by: Al Viro

    Al Viro
     

27 Oct, 2010

1 commit

  • Add vzalloc() and vzalloc_node() to encapsulate the
    vmalloc-then-memset-zero operation.

    Use __GFP_ZERO to zero fill the allocated memory.

    Signed-off-by: Dave Young
    Cc: Christoph Lameter
    Acked-by: Greg Ungerer
    Cc: David Howells
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Dave Young
     

21 Aug, 2010

1 commit

  • It's a really simple list, and several of the users want to go backwards
    in it to find the previous vma. So rather than have to look up the
    previous entry with 'find_vma_prev()' or something similar, just make it
    doubly linked instead.

    Tested-by: Ian Campbell
    Signed-off-by: Linus Torvalds

    Linus Torvalds
     

14 Aug, 2010

1 commit

  • Remove an extraneous no_printk() in mm/nommu.c that got missed when the
    function got generalised from several things that used it in commit
    12fdff3fc248 ("Add a dummy printk function for the maintenance of unused
    printks").

    Without this, the following error is observed:

    mm/nommu.c:41: error: conflicting types for 'no_printk'
    include/linux/kernel.h:314: error: previous definition of 'no_printk' was here

    Reported-by: Michal Simek
    Signed-off-by: David Howells
    Signed-off-by: Linus Torvalds

    David Howells
     

26 May, 2010

1 commit

  • Slightly rearrange the logic that determines capabilities and vm_flags.
    Disable BDI_CAP_MAP_DIRECT in all cases if the device can't support the
    protections. Allow private readonly mappings of readonly backing devices.

    Signed-off-by: Bernd Schmidt
    Signed-off-by: Mike Frysinger
    Acked-by: David McCullough
    Acked-by: Greg Ungerer
    Acked-by: Paul Mundt
    Acked-by: David Howells
    Cc: Hugh Dickins
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Bernd Schmidt
     

26 Mar, 2010

2 commits

  • Fix __get_user_pages() to make it pin the last page on a buffer that doesn't
    begin at the start of a page, but is a multiple of PAGE_SIZE in size.

    The problem is that __get_user_pages() advances the pointer too much when it
    iterates to the next page if the page it's currently looking at isn't used from
    the first byte. This can cause the end of a short VMA to be reached
    prematurely, resulting in the last page being lost.

    Signed-off-by: Steven J. Magnani
    Signed-off-by: David Howells
    Signed-off-by: Linus Torvalds

    David Howells
     
  • Revert the following patch:

    commit c08c6e1f54c85fc299cf9f88cf330d6dd28a9a1d
    Author: Steven J. Magnani
    Date: Fri Mar 5 13:42:24 2010 -0800

    nommu: get_user_pages(): pin last page on non-page-aligned start

    As it assumes that the mappings begin at the start of pages - something that
    isn't necessarily true on NOMMU systems. On NOMMU systems, it is possible for
    a mapping to only occupy part of the page, and not necessarily touch either end
    of it; in fact it's also possible for multiple non-overlapping mappings to
    coexist on one page (consider direct mappings of ROMFS files, for example).

    Signed-off-by: David Howells
    Acked-by: Steven J. Magnani
    Signed-off-by: Linus Torvalds

    David Howells
     


13 Mar, 2010

1 commit

  • Add a generic implementation of the old mmap() syscall, which expects its
    argument in a memory block and switch all architectures over to use it.

    Signed-off-by: Christoph Hellwig
    Cc: Ralf Baechle
    Cc: Benjamin Herrenschmidt
    Cc: Paul Mundt
    Cc: Jeff Dike
    Cc: Hirokazu Takata
    Cc: Thomas Gleixner
    Cc: Ingo Molnar
    Reviewed-by: H. Peter Anvin
    Cc: Al Viro
    Cc: Arnd Bergmann
    Cc: Heiko Carstens
    Cc: Martin Schwidefsky
    Cc: "Luck, Tony"
    Cc: James Morris
    Cc: Andreas Schwab
    Acked-by: Jesper Nilsson
    Acked-by: Russell King
    Acked-by: Greg Ungerer
    Acked-by: David Howells
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Christoph Hellwig
     

07 Mar, 2010

2 commits

  • The noMMU version of get_user_pages() fails to pin the last page when the
    start address isn't page-aligned. The patch fixes this in a way that
    makes find_extend_vma() congruent to its MMU cousin.

    Signed-off-by: Steven J. Magnani
    Acked-by: Paul Mundt
    Cc: David Howells
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Steven J. Magnani
     
  • The old anon_vma code can lead to scalability issues with heavily forking
    workloads. Specifically, each anon_vma will be shared between the parent
    process and all its child processes.

    In a workload with 1000 child processes and a VMA with 1000 anonymous
    pages per process that get COWed, this leads to a system with a million
    anonymous pages in the same anon_vma, each of which is mapped in just one
    of the 1000 processes. However, the current rmap code needs to walk them
    all, leading to O(N) scanning complexity for each page.

    This can result in systems where one CPU is walking the page tables of
    1000 processes in page_referenced_one, while all other CPUs are stuck on
    the anon_vma lock. This leads to catastrophic failure for a benchmark
    like AIM7, where the total number of processes can reach in the tens of
    thousands. Real workloads are still a factor 10 less process intensive
    than AIM7, but they are catching up.

    This patch changes the way anon_vmas and VMAs are linked, which allows us
    to associate multiple anon_vmas with a VMA. At fork time, each child
    process gets its own anon_vmas, in which its COWed pages will be
    instantiated. The parents' anon_vma is also linked to the VMA, because
    non-COWed pages could be present in any of the children.

    This reduces rmap scanning complexity to O(1) for the pages of the 1000
    child processes, with O(N) complexity for at most 1/N pages in the system.
    This reduces the average scanning cost in heavily forking workloads from
    O(N) to 2.

    The only real complexity in this patch stems from the fact that linking a
    VMA to anon_vmas now involves memory allocations. This means vma_adjust
    can fail, if it needs to attach a VMA to anon_vma structures. This in
    turn means error handling needs to be added to the calling functions.

    A second source of complexity is that, because there can be multiple
    anon_vmas, the anon_vma linking in vma_adjust can no longer be done under
    "the" anon_vma lock. To prevent the rmap code from walking up an
    incomplete VMA, this patch introduces the VM_LOCK_RMAP VMA flag. This bit
    flag uses the same slot as the NOMMU VM_MAPPED_COPY, with an ifdef in mm.h
    to make sure it is impossible to compile a kernel that needs both symbolic
    values for the same bitflag.

    Some test results:

    Without the anon_vma changes, when AIM7 hits around 9.7k users (on a test
    box with 16GB RAM and not quite enough IO), the system ends up running
    >99% in system time, with every CPU on the same anon_vma lock in the
    pageout code.

    With these changes, AIM7 hits the cross-over point around 29.7k users.
    This happens with ~99% IO wait time, there never seems to be any spike in
    system time. The anon_vma lock contention appears to be resolved.

    [akpm@linux-foundation.org: cleanups]
    Signed-off-by: Rik van Riel
    Cc: KOSAKI Motohiro
    Cc: Larry Woodman
    Cc: Lee Schermerhorn
    Cc: Minchan Kim
    Cc: Andrea Arcangeli
    Cc: Hugh Dickins
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Rik van Riel
     

17 Jan, 2010

4 commits

  • Fix a problem in NOMMU mmap with ramfs whereby a shared mmap can happen
    over the end of a truncation. The problem is that
    ramfs_nommu_check_mappings() checks that the reduced file size against the
    VMA tree, but not the vm_region tree.

    The following sequence of events can cause the problem:

    fd = open("/tmp/x", O_RDWR|O_TRUNC|O_CREAT, 0600);
    ftruncate(fd, 32 * 1024);
    a = mmap(NULL, 32 * 1024, PROT_READ|PROT_WRITE, MAP_SHARED, fd, 0);
    b = mmap(NULL, 16 * 1024, PROT_READ|PROT_WRITE, MAP_SHARED, fd, 0);
    munmap(a, 32 * 1024);
    ftruncate(fd, 16 * 1024);
    c = mmap(NULL, 32 * 1024, PROT_READ|PROT_WRITE, MAP_SHARED, fd, 0);

    Mapping 'a' creates a vm_region covering 32KB of the file. Mapping 'b'
    sees that the vm_region from 'a' is covering the region it wants and so
    shares it, pinning it in memory.

    Mapping 'a' then goes away and the file is truncated to the end of VMA
    'b'. However, the region allocated by 'a' is still in effect, and has
    _not_ been reduced.

    Mapping 'c' is then created, and because there's a vm_region covering the
    desired region, get_unmapped_area() is _not_ called to repeat the check,
    and the mapping is granted, even though the pages from the latter half of
    the mapping have been discarded.

    However:

    d = mmap(NULL, 16 * 1024, PROT_READ|PROT_WRITE, MAP_SHARED, fd, 0);

    Mapping 'd' should work, and should end up sharing the region allocated by
    'a'.

    To deal with this, we shrink the vm_region struct during the truncation,
    lest do_mmap_pgoff() take it as licence to share the full region
    automatically without calling the get_unmapped_area() file op again.

    Signed-off-by: David Howells
    Acked-by: Al Viro
    Cc: Greg Ungerer
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    David Howells
     
  • get_unmapped_area() is unnecessary for NOMMU as no-one calls it.

    Signed-off-by: David Howells
    Acked-by: Al Viro
    Cc: Greg Ungerer
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    David Howells
     
  • In split_vma(), there's no need to check if the VMA being split has a
    region that's in use by more than one VMA because:

    (1) The preceding test prohibits splitting of non-anonymous VMAs and regions
    (eg: file or chardev backed VMAs).

    (2) Anonymous regions can't be mapped multiple times because there's no handle
    by which to refer to the already existing region.

    (3) If a VMA has previously been split, then the region backing it has also
    been split into two regions, each of usage 1.

    Signed-off-by: David Howells
    Acked-by: Al Viro
    Cc: Greg Ungerer
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    David Howells
     
  • The vm_usage count field in struct vm_region does not need to be
    atomic as it's only ever modified whilst nommu_region_sem is write
    locked.
    Signed-off-by: David Howells
    Acked-by: Al Viro
    Cc: Greg Ungerer
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    David Howells
     

07 Jan, 2010

2 commits

  • The MMU code uses the copy_*_user_page() variants in access_process_vm()
    rather than copy_*_user() as the former includes an icache flush. This
    is important when doing things like setting software breakpoints with
    gdb. So switch the NOMMU code over to do the same.

    This patch makes the reasonable assumption that copy_from_user_page()
    won't fail - which is probably fine, as we've checked the VMA from which
    we're copying is usable, and the copy is not allowed to cross VMAs. The
    one case where it might go wrong is if the VMA is a device rather than
    RAM, and that device returns an error - in which case rubbish will be
    returned rather than EIO.

    Signed-off-by: Jie Zhang
    Signed-off-by: Mike Frysinger
    Signed-off-by: David Howells
    Acked-by: David McCullough
    Acked-by: Paul Mundt
    Acked-by: Greg Ungerer
    Signed-off-by: Linus Torvalds

    Jie Zhang
     
  • When working with FDPIC, there are many shared mappings of read-only
    code regions between applications (the C library, applet packages like
    busybox, etc.), but the current do_mmap_pgoff() function will issue an
    icache flush whenever a VMA is added to an MM instead of only doing it
    when the map is initially created.

    The flush can instead be done when a region is first mmapped PROT_EXEC.
    Note that we may not rely on the first mapping of a region being
    executable - it's possible for it to be PROT_READ only, so we have to
    remember whether we've flushed the region or not, and then flush the
    entire region when a bit of it is made executable.

    However, this also affects the brk area. That will no longer be
    executable. We can mprotect() it to PROT_EXEC on MPU-mode kernels, but
    for NOMMU mode kernels, when it increases the brk allocation, making
    sys_brk() flush the extra from the icache should suffice. The brk area
    probably isn't used by NOMMU programs since the brk area can only use up
    the leavings from the stack allocation, where the stack allocation is
    larger than requested.

    Signed-off-by: David Howells
    Signed-off-by: Mike Frysinger
    Signed-off-by: Linus Torvalds

    Mike Frysinger
     

31 Dec, 2009

1 commit

  • Move sys_mmap_pgoff() from mm/util.c to mm/mmap.c and mm/nommu.c,
    where we'd expect to find such code: especially now that it contains
    the MAP_HUGETLB handling. Revert mm/util.c to how it was in 2.6.32.

    This patch just ignores MAP_HUGETLB in the nommu case, as in 2.6.32,
    whereas 2.6.33-rc2 reported -ENOSYS. Perhaps validate_mmap_request()
    should reject it with -EINVAL? Add that later if necessary.

    Signed-off-by: Hugh Dickins
    Signed-off-by: Linus Torvalds

    Hugh Dickins