22 Jul, 2007

1 commit

  • Trying to survive an allmodconfig on a nommu platform results in many
    screen lengths of module unhappiness. Many of the mmap-related things that
    binfmt_flat hooks into are never exported despite being global, and there
    are also missing definitions for vmalloc_32_user() and vm_insert_page().

    I've implemented vmalloc_32_user() trying to stick as close to the
    mm/vmalloc.c implementation as possible, though we don't have any need for
    VM_USERMAP, so groveling for the VMA can be skipped. vm_insert_page() has
    been stubbed for now in order to keep the build happy.
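
    A plausible shape for the additions described above, as a sketch (not
    necessarily the exact code merged): vmalloc_32_user() defers to
    vmalloc_32() since there is no VM_USERMAP to set and no VMA to grovel
    for on nommu, and vm_insert_page() is a stub that just returns an error.

    void *vmalloc_32_user(unsigned long size)
    {
            return vmalloc_32(size);
    }
    EXPORT_SYMBOL(vmalloc_32_user);

    int vm_insert_page(struct vm_area_struct *vma, unsigned long addr,
                       struct page *page)
    {
            return -EINVAL; /* unsupported; exists to keep modules linking */
    }
    EXPORT_SYMBOL(vm_insert_page);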

    Signed-off-by: Paul Mundt
    Cc: David Howells
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Paul Mundt
     

20 Jul, 2007

2 commits

  • Change the ->fault prototype. We now return an int, which contains a
    VM_FAULT_xxx code in the low byte and a FAULT_RET_xxx code in the next byte.
    The FAULT_RET_xxx code tells the VM whether a page was found, whether it has
    been locked, and potentially other things. This is not quite the way Linus
    wanted it yet, but that's changed in the next patch (which requires changes
    to arch code).

    This means we no longer set VM_CAN_INVALIDATE in the vma in order to say
    that a page is locked, which requires filemap_nopage to go away (because we
    can no longer remain backward compatible without that flag), but we were
    going to do that anyway.

    struct fault_data is renamed to struct vm_fault, as Linus asked. The
    address is now a void __user * that we should firmly encourage drivers not
    to use without a really good reason.

    The page is now returned via a page pointer in the vm_fault struct.
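
    For reference, the reworked interface looks roughly like this (a sketch
    reconstructed from the description above, not a verbatim copy of the
    merged header):

    struct vm_fault {
            unsigned int flags;             /* FAULT_FLAG_xxx flags */
            pgoff_t pgoff;                  /* logical offset into the file */
            void __user *virtual_address;   /* faulting virtual address */
            struct page *page;              /* set by the handler on success */
    };

    int (*fault)(struct vm_area_struct *vma, struct vm_fault *vmf);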

    Signed-off-by: Nick Piggin
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Nick Piggin
     
  • Nonlinear mappings are (AFAIKS) simply a virtual memory concept that encodes
    the virtual address -> file offset differently from linear mappings.

    ->populate is a layering violation because the filesystem/pagecache code
    should not need to know anything about the virtual memory mapping. The hitch
    here is that the ->nopage handler didn't pass down enough information (i.e.
    pgoff). But it is more logical to pass pgoff rather than have the ->nopage
    function calculate it itself anyway (because that's a similar layering
    violation).

    Having the populate handler install the pte itself is likewise a nasty thing
    to be doing.

    This patch introduces a new fault handler that replaces ->nopage and
    ->populate and (later) ->nopfn. Most of the old mechanism is still in
    place, so there is a lot of duplication that can be removed, along with
    some nice cleanups that become possible, once everyone switches over.

    The rationale for doing this in the first place is that nonlinear mappings are
    subject to the pagefault vs invalidate/truncate race too, and it seemed stupid
    to duplicate the synchronisation logic rather than just consolidate the two.

    After this patch, MAP_NONBLOCK no longer sets up ptes for pages present in
    pagecache. It seems like fringe functionality anyway.

    NOPAGE_REFAULT is removed. This should be implemented with ->fault, and no
    users have hit mainline yet.
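
    To make the new interface concrete, a hypothetical driver's handler
    (using the prototype as it ends up after the feedback rework above;
    example_pages and EXAMPLE_NR_PAGES are made-up names) might look like:

    static int example_fault(struct vm_area_struct *vma, struct vm_fault *vmf)
    {
            struct page *page;

            /* pgoff arrives precomputed; no address arithmetic needed here */
            if (vmf->pgoff >= EXAMPLE_NR_PAGES)
                    return VM_FAULT_SIGBUS;

            page = example_pages[vmf->pgoff];
            get_page(page);         /* the VM expects a referenced page */
            vmf->page = page;
            return 0;
    }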

    [akpm@linux-foundation.org: cleanup]
    [randy.dunlap@oracle.com: doc. fixes for readahead]
    [akpm@linux-foundation.org: build fix]
    Signed-off-by: Nick Piggin
    Signed-off-by: Randy Dunlap
    Cc: Mark Fasheh
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Nick Piggin
     

12 Jul, 2007

1 commit

  • Add a new security check on mmap operations to see if the user is attempting
    to mmap into the low area of the address space. The amount of space protected
    is indicated by the new proc tunable /proc/sys/vm/mmap_min_addr and defaults
    to
    indicated by the new proc tunable /proc/sys/vm/mmap_min_addr and defaults to
    0, preserving existing behavior.

    This patch uses a new SELinux security class, "memprotect". Policy already
    contains a number of allow rules like a_t self:process * (unconfined_t being
    one of them) which mean that putting this check in the process class (its
    best current fit) would make it useless, as all user processes, which we also
    want to protect against, would be allowed. By giving the new class the name
    memprotect, it also becomes possible to move some of the other memory-protect
    permissions out of 'process' and into the new class the next time we bump the
    policy version number (which I also think is a good idea for the future).
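
    A small userspace probe of the new check (a sketch; with
    /proc/sys/vm/mmap_min_addr raised above 0x1000, the MAP_FIXED request
    below should be refused, typically with EPERM or EACCES depending on
    the security module in use):

    #include <stdio.h>
    #include <string.h>
    #include <errno.h>
    #include <sys/mman.h>

    int main(void)
    {
            void *p = mmap((void *)0x1000, 4096, PROT_READ | PROT_WRITE,
                           MAP_PRIVATE | MAP_ANONYMOUS | MAP_FIXED, -1, 0);

            if (p == MAP_FAILED)
                    printf("mmap at 0x1000 denied: %s\n", strerror(errno));
            else
                    printf("mmap at 0x1000 succeeded: %p\n", p);
            return 0;
    }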

    Acked-by: Stephen Smalley
    Acked-by: Chris Wright
    Signed-off-by: Eric Paris
    Signed-off-by: James Morris

    Eric Paris
     

09 May, 2007

1 commit

  • This patch moves the die notifier handling to common code. Previously,
    various architectures had exactly the same code for it. Note that the new
    code is compiled unconditionally; this should be understood as an appeal to
    the other architecture maintainers to implement support for it as well (i.e.
    sprinkling a notify_die or two in the proper places).

    arm had a notify_die that did something totally different; I renamed it to
    arm_notify_die as part of the patch and made it static to the file in which
    it's declared and used. avr32 used to pass slightly less information through
    this interface, and I brought it into line with the other architectures.
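
    As a sketch of what a consumer of the now-common interface looks like
    (a hypothetical example handler, not code from the patch):

    #include <linux/kdebug.h>
    #include <linux/notifier.h>
    #include <linux/module.h>

    static int example_die_handler(struct notifier_block *nb,
                                   unsigned long val, void *data)
    {
            struct die_args *args = data;

            printk(KERN_INFO "die event %lu: %s\n", val, args->str);
            return NOTIFY_DONE;
    }

    static struct notifier_block example_die_nb = {
            .notifier_call = example_die_handler,
    };

    /* module init: register_die_notifier(&example_die_nb);
     * module exit: unregister_die_notifier(&example_die_nb); */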

    [akpm@linux-foundation.org: build fix]
    [akpm@linux-foundation.org: fix vmalloc_sync_all bustage]
    [bryan.wu@analog.com: fix vmalloc_sync_all in nommu]
    Signed-off-by: Christoph Hellwig
    Cc:
    Cc: Russell King
    Signed-off-by: Bryan Wu
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Christoph Hellwig
     

23 Mar, 2007

2 commits

  • Make the SYSV SHM nattch counter work correctly by forcing multiple VMAs to
    be produced to represent MAP_SHARED segments, even if they overlap exactly.

    Using this test program:

    http://people.redhat.com/~dhowells/doshm.c

    Run as:

    doshm sysv

    I can see nattch going from one before the patch:

    # /doshm sysv
    Command: sysv
    shmid: 65536
    memory: 0xc3700000
    c0b00000-c0b04000 rw-p 00000000 00:00 0
    c0bb0000-c0bba788 r-xs 00000000 00:0b 14582157 /lib/ld-uClibc-0.9.28.so
    c3180000-c31dede4 r-xs 00000000 00:0b 14582179 /lib/libuClibc-0.9.28.so
    c3520000-c352278c rw-p 00000000 00:0b 13763417 /doshm
    c3584000-c35865e8 r-xs 00000000 00:0b 13763417 /doshm
    c3588000-c358aa00 rw-p 00008000 00:0b 14582157 /lib/ld-uClibc-0.9.28.so
    c3590000-c359b6c0 rw-p 00000000 00:00 0
    c3620000-c3640000 rwxp 00000000 00:00 0
    c3700000-c37fa000 rw-S 00000000 00:06 1411 /SYSV00000000 (deleted)
    c3700000-c37fa000 rw-S 00000000 00:06 1411 /SYSV00000000 (deleted)
    nattch 1

    To two after the patch:

    # /doshm sysv
    Command: sysv
    shmid: 0
    memory: 0xc3700000
    c0bb0000-c0bba788 r-xs 00000000 00:0b 14582157 /lib/ld-uClibc-0.9.28.so
    c3180000-c31dede4 r-xs 00000000 00:0b 14582179 /lib/libuClibc-0.9.28.so
    c3320000-c3340000 rwxp 00000000 00:00 0
    c3530000-c35325e8 r-xs 00000000 00:0b 13763417 /doshm
    c3534000-c353678c rw-p 00000000 00:0b 13763417 /doshm
    c3538000-c353aa00 rw-p 00008000 00:0b 14582157 /lib/ld-uClibc-0.9.28.so
    c3590000-c359b6c0 rw-p 00000000 00:00 0
    c35a4000-c35a8000 rw-p 00000000 00:00 0
    c3700000-c37fa000 rw-S 00000000 00:06 1369 /SYSV00000000 (deleted)
    c3700000-c37fa000 rw-S 00000000 00:06 1369 /SYSV00000000 (deleted)
    nattch 2

    That's +1 to nattch for each shmat() made.

    Signed-off-by: David Howells
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    David Howells
     
  • Supply a get_unmapped_area() to fix NOMMU SYSV SHM support.

    Signed-off-by: David Howells
    Acked-by: Adam Litke
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    David Howells
     

06 Dec, 2006

1 commit

  • I was playing with blackfin when I hit a neat bug: doing an open() on a
    directory and then passing that fd to mmap() would cause the kernel to hang.

    After poking into the code a bit more, I found that
    mm/nommu.c:validate_mmap_request() checks the length and, if it is 0, just
    returns the address. This is in stark contrast to the MMU version,
    mm/mmap.c:do_mmap_pgoff(), which returns -EINVAL for 0-length requests. I
    then noticed that some other parts of the logic are out of date between the
    two functions, so perhaps that's the easy fix?

    Signed-off-by: Greg Ungerer
    Signed-off-by: Linus Torvalds

    Mike Frysinger
     

27 Sep, 2006

8 commits

  • Make futexes work under NOMMU conditions.

    This can be tested by running this in one shell:

    /* dowait.c: create a SYSV SHM segment and FUTEX_WAIT on the int at its
     * start until the value changes. Includes and a futex() syscall wrapper
     * have been added so the example compiles as-is. */
    #include <stdio.h>
    #include <stdlib.h>
    #include <unistd.h>
    #include <time.h>
    #include <sys/ipc.h>
    #include <sys/shm.h>
    #include <sys/syscall.h>
    #include <linux/futex.h>

    #define SYSERROR(X, Y) \
            do { if ((long)(X) == -1L) { perror(Y); exit(1); } } while (0)

    /* libc provides no futex() wrapper; invoke the syscall directly */
    static int futex(int *uaddr, int op, int val,
                     const struct timespec *timeout, int *uaddr2, int val3)
    {
            return syscall(SYS_futex, uaddr, op, val, timeout, uaddr2, val3);
    }

    int main(void)
    {
            int shmid, tmp, *f, n;

            shmid = shmget(23, 4, IPC_CREAT | 0666);
            SYSERROR(shmid, "shmget");

            f = shmat(shmid, NULL, 0);
            SYSERROR(f, "shmat");

            n = *f;
            printf("WAIT: %p{%x}\n", f, n);
            tmp = futex(f, FUTEX_WAIT, n, NULL, NULL, 0);
            SYSERROR(tmp, "futex");
            printf("WAITED: %d\n", tmp);

            tmp = shmdt(f);
            SYSERROR(tmp, "shmdt");

            exit(0);
    }

    And then this in the other shell:

    /* dowake.c: attach the same segment, increment the value, and FUTEX_WAKE
     * one waiter. Uses the same includes, SYSERROR macro and futex() wrapper
     * as the waiter program above. */
    int main(void)
    {
            int shmid, tmp, *f;

            shmid = shmget(23, 4, IPC_CREAT | 0666);
            SYSERROR(shmid, "shmget");

            f = shmat(shmid, NULL, 0);
            SYSERROR(f, "shmat");

            (*f)++;
            printf("WAKE: %p{%x}\n", f, *f);
            tmp = futex(f, FUTEX_WAKE, 1, NULL, NULL, 0);
            SYSERROR(tmp, "futex");
            printf("WOKE: %d\n", tmp);

            tmp = shmdt(f);
            SYSERROR(tmp, "shmdt");

            exit(0);
    }

    The first program will set up a SYSV IPC SHM segment and wait on a futex in
    it for the number at the start to change. The second program will increment
    that number and wake the first program up. This leads to output of the form:

    SHELL 1                   SHELL 2
    =======================   =======================
    # /dowait
    WAIT: 0xc32ac000{0}
                              # /dowake
                              WAKE: 0xc32ac000{1}
    WAITED: 0                 WOKE: 1

    Signed-off-by: David Howells
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    David Howells
     
  • Make mremap() partially work for NOMMU kernels. It may resize a VMA,
    provided that the new size doesn't exceed the size of the slab object in
    which the storage the VMA refers to is allocated. Shareable VMAs may not be
    resized.

    Moving VMAs (as permitted by MREMAP_MAYMOVE) is not currently supported.

    This patch also makes use of the fact that the VMA list is now ordered,
    cutting the search short when possible.
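
    A userspace sketch of the case this makes work (shrinking a private
    mapping in place; moving it or growing it would still fail on NOMMU):

    #define _GNU_SOURCE
    #include <stdio.h>
    #include <sys/mman.h>

    int main(void)
    {
            void *p, *q;

            p = mmap(NULL, 8192, PROT_READ | PROT_WRITE,
                     MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
            if (p == MAP_FAILED)
                    return 1;

            /* resize in place: flags == 0, so no MREMAP_MAYMOVE */
            q = mremap(p, 8192, 4096, 0);
            printf("mremap: %p -> %p\n", p, q);
            return 0;
    }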

    Signed-off-by: David Howells
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    David Howells
     
  • Order the per-mm_struct VMA list by address so that searching it can be cut
    short when the appropriate address has been exceeded.

    Signed-off-by: David Howells
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    David Howells
     
  • Permit ptrace to modify a section that's non-shared but is marked
    unwritable, such as is obtained by mapping the text segment of an ELF-FDPIC
    executable binary into a process that's being ptraced[*].

    [*] Under NOMMU conditions ptrace causes read-only MAP_PRIVATE mmaps to become
    totally private copies, because if a private mapping were actually shared,
    then a debugger setting breakpoints in it could potentially crash
    other processes.

    This is done by using the VM_MAYWRITE flag rather than the VM_WRITE flag
    when deciding whether to permit a write.

    Without this patch a debugger can't set breakpoints in the mapped text
    sections of executables that are mapped read-only private, even if the
    mmap() syscall has taken a private copy because PT_PTRACED is set.

    In addition, VM_MAYREAD is used instead of VM_READ for similar reasons.
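
    The gist of the changed test, sketched (not the literal diff):

    /* decide whether a ptrace access may go ahead: test what the mapping
     * could permit (VM_MAY*), not what it currently permits (VM_READ/WRITE) */
    if (write) {
            if (!(vma->vm_flags & VM_MAYWRITE))
                    return 0;       /* refuse the access */
    } else {
            if (!(vma->vm_flags & VM_MAYREAD))
                    return 0;
    }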

    Signed-off-by: David Howells
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    David Howells
     
  • Check the VMA protections in get_user_pages() against what's being asked.

    This checks to see that we don't accidentally write on a non-writable VMA or
    permit an I/O mapping VMA to be accessed (which may lack page structs).
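
    Sketched (the writability half is as in the previous entry; finish_or_fault
    is a hypothetical bail-out label), the added validation amounts to:

    /* an I/O mapping may have no struct pages at all - refuse to hand
     * its pages to get_user_pages() callers */
    if (vma->vm_flags & (VM_IO | VM_PFNMAP))
            goto finish_or_fault;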

    Signed-off-by: David Howells
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    David Howells
     
  • On a NOMMU arch, running "cat /proc/self/mem" reads data from physical
    address 0. This behavior differs from MMU arches; on IA32, the message
    "cat: /proc/self/mem: Input/output error" is reported instead.

    The root cause is that the NOMMU version of get_user_pages() does not
    validate the start address. The following patch solves this issue.

    Signed-off-by: Sonic Zhang
    Cc: David Howells
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Sonic Zhang
     
  • Use find_vma() in the NOMMU version of access_process_vm() rather than
    reimplementing it.

    Signed-off-by: David Howells
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    David Howells
     
  • Check that access_process_vm() is accessing a valid mapping in the target
    process.

    This limits ptrace() accesses and accesses through /proc/<pid>/maps to only
    those regions actually mapped by a program.

    Signed-off-by: David Howells
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    David Howells
     

26 Sep, 2006

1 commit

  • Remove the atomic counter for slab_reclaim_pages and replace the counter
    and NR_SLAB with two ZVC counters that account for unreclaimable and
    reclaimable slab pages: NR_SLAB_RECLAIMABLE and NR_SLAB_UNRECLAIMABLE.

    Change the check in vmscan.c to refer to NR_SLAB_RECLAIMABLE. The
    intent seems to be to check for slab pages that could be freed.

    Signed-off-by: Christoph Lameter
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Christoph Lameter
     

01 Jul, 2006

1 commit

  • Currently a single atomic variable is used to establish the size of the page
    cache in the whole machine. The zoned VM counters have the same method of
    implementation as the nr_pagecache code but also allow the determination of
    the pagecache size per zone.

    Remove the special implementation for nr_pagecache and make it a zoned counter
    named NR_FILE_PAGES.

    Updates of the page cache counters are always performed with interrupts off.
    We can therefore use the __ variant here.
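
    In code terms, the update sites can therefore use the cheaper,
    non-IRQ-safe form (a fragment sketch; the exact call sites vary):

    /* interrupts are already off here, so the __ variant is safe */
    __inc_zone_page_state(page, NR_FILE_PAGES);     /* page added to cache */
    __dec_zone_page_state(page, NR_FILE_PAGES);     /* page removed */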

    Signed-off-by: Christoph Lameter
    Cc: Trond Myklebust
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Christoph Lameter
     

11 Apr, 2006

1 commit

  • This patch is an enhancement of the OVERCOMMIT_GUESS algorithm in
    __vm_enough_memory() in mm/nommu.c.

    When the OVERCOMMIT_GUESS algorithm calculates the number of free pages,
    the algorithm subtracts the number of reserved pages from the result of
    nr_free_pages().
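
    Sketched against that description (a fragment; totalreserve_pages is an
    assumed name for the reserved-page count):

    free = nr_free_pages();
    free -= totalreserve_pages;     /* don't count reserved pages as free */
    if (free > pages)
            return 0;               /* enough memory: allow the allocation */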

    Signed-off-by: Hideo Aoki
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Hideo AOKI
     

22 Mar, 2006

1 commit

  • Now that compound page handling is properly fixed in the VM, move nommu
    over to using compound pages rather than rolling its own refcounting.

    nommu vm page refcounting is broken anyway, but there is no need to have
    divergent code in the core VM now, nor when it gets fixed.

    Signed-off-by: Nick Piggin
    Cc: David Howells

    (Needs testing, please).
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Nick Piggin
     

07 Jan, 2006

1 commit

  • The attached patch makes the SYSV IPC shared memory facilities use the new
    ramfs facilities on a no-MMU kernel.

    The following changes are made:

    (1) There are now shmem_mmap() and shmem_get_unmapped_area() functions to
    allow the IPC SHM facilities to commune with the tiny-shmem and shmem
    code.

    (2) ramfs files now need resizing using do_truncate() rather than by modifying
    the inode size directly (see shmem_file_setup()). This causes ramfs to
    attempt to bind a block of pages of sufficient size to the inode.

    (3) CONFIG_SYSVIPC is no longer contingent on CONFIG_MMU.

    Signed-off-by: David Howells
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    David Howells
     

29 Nov, 2005

1 commit

  • This replaces the (in my opinion horrible) VM_UNMAPPED logic with very
    explicit support for a "remapped page range" aka VM_PFNMAP. It allows a
    VM area to contain an arbitrary range of page table entries that the VM
    never touches, and never considers to be normal pages.

    Any user of "remap_pfn_range()" automatically gets this new
    functionality, and doesn't even have to mark the pages reserved or
    indeed mark them any other way. It just works. As a side effect, doing
    mmap() on /dev/mem works for arbitrary ranges.
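
    For illustration, a driver mmap() hook using it (a sketch; example_phys
    is a hypothetical physical base address):

    static int example_mmap(struct file *file, struct vm_area_struct *vma)
    {
            unsigned long size = vma->vm_end - vma->vm_start;

            /* insert the raw pfn range; the VM will never touch these ptes */
            return remap_pfn_range(vma, vma->vm_start,
                                   example_phys >> PAGE_SHIFT,
                                   size, vma->vm_page_prot);
    }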

    Sparc update from David in the next commit.

    Signed-off-by: Linus Torvalds

    Linus Torvalds
     

30 Oct, 2005

3 commits

  • Final step in pushing down the common core's page_table_lock. follow_page
    no longer wants the caller to hold page_table_lock; it uses
    pte_offset_map_lock itself, and so no page_table_lock is taken in
    get_user_pages itself.

    But get_user_pages (and get_futex_key) do then need follow_page to pin the
    page for them: take Daniel's suggestion of bitflags to follow_page.

    We need one for WRITE, another for TOUCH (it was the accessed flag before:
    vanished along with check_user_page_readable, but surely get_numa_maps is
    wrong to mark every page it finds as accessed), and another for GET.

    And another, ANON, to dispose of untouched_anonymous_page: it seems silly
    for that to descend a second time; let follow_page observe if there was no
    page table and return ZERO_PAGE if so. Fix a minor bug in that: check
    VM_LOCKED - make_pages_present ought to make readonly anonymous pages
    present.

    Give get_numa_maps a cond_resched while we're there.
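
    The resulting flag set and prototype, roughly (reconstructed from the
    description; names per the FOLL_* convention):

    #define FOLL_WRITE      0x01    /* check pte is writable */
    #define FOLL_TOUCH      0x02    /* mark page accessed */
    #define FOLL_GET        0x04    /* take a reference on the page */
    #define FOLL_ANON       0x08    /* give ZERO_PAGE if no pgtable */

    struct page *follow_page(struct mm_struct *mm, unsigned long address,
                             unsigned int foll_flags);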

    Signed-off-by: Hugh Dickins
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Hugh Dickins
     
  • update_mem_hiwater has attracted various criticisms, in particular from those
    concerned with mm scalability. Originally it was called whenever rss or
    total_vm got raised. Then many of those callsites were replaced by a timer
    tick call from account_system_time. Now Frank van Maarseveen reports that
    to be inadequate. How about this? Works for Frank.

    Replace update_mem_hiwater, a poor combination of two unrelated ops, by macros
    update_hiwater_rss and update_hiwater_vm. Don't attempt to keep
    mm->hiwater_rss up to date at timer tick, nor every time we raise rss (usually
    by 1): those are hot paths. Do the opposite, update only when about to lower
    rss (usually by many), or just before final accounting in do_exit. Handle
    mm->hiwater_vm in the same way, though it's much less of an issue. Demand
    that whoever collects these hiwater statistics do the work of taking the
    maximum with rss or total_vm.

    And there has been no collector of these hiwater statistics in the tree. The
    new convention needs an example, so match Frank's usage by adding a VmPeak
    line above VmSize to /proc/<pid>/status, and also a VmHWM line above VmRSS
    (High-Water-Mark or High-Water-Memory).

    There was a particular anomaly during mremap move, that hiwater_vm might be
    captured too high. A fleeting such anomaly remains, but it's quickly
    corrected now, whereas before it would stick.

    What locking? None: if the app is racy then these statistics will be racy,
    it's not worth any overhead to make them exact. But whenever it suits,
    hiwater_vm is updated under exclusive mmap_sem, and hiwater_rss under
    page_table_lock (for now) or with preemption disabled (later on): without
    going to any trouble, minimize the time between reading current values and
    updating, to minimize those occasions when a racing thread bumps a count up
    and back down in between.
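
    The convention in code, as a sketch of the helper pair described:

    static inline void update_hiwater_rss(struct mm_struct *mm)
    {
            unsigned long rss = get_mm_rss(mm);

            if (mm->hiwater_rss < rss)      /* called just before rss drops */
                    mm->hiwater_rss = rss;
    }

    static inline void update_hiwater_vm(struct mm_struct *mm)
    {
            if (mm->hiwater_vm < mm->total_vm)
                    mm->hiwater_vm = mm->total_vm;
    }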

    Signed-off-by: Hugh Dickins
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Hugh Dickins
     
  • I was lazy when we added anon_rss, and chose to change as few places as
    possible. So currently each anonymous page has to be counted twice, in rss
    and in anon_rss. Which won't be so good if those are atomic counts in some
    configurations.

    Change that around: keep file_rss and anon_rss separately, and add them
    together (with get_mm_rss macro) when the total is needed - reading two
    atomics is much cheaper than updating two atomics. And update anon_rss
    upfront, typically in memory.c, not tucked away in page_add_anon_rmap.
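
    In code, the total becomes a derived quantity (sketch):

    #define get_mm_rss(mm)  (get_mm_counter(mm, file_rss) + \
                             get_mm_counter(mm, anon_rss))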

    Signed-off-by: Hugh Dickins
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Hugh Dickins
     

09 Oct, 2005

1 commit

  • - added typedef unsigned int __nocast gfp_t;

    - replaced __nocast uses for gfp flags with gfp_t - it gives exactly
    the same warnings as far as sparse is concerned, doesn't change
    generated code (from gcc's point of view we replaced unsigned int with
    a typedef) and documents what's going on far better.

    Signed-off-by: Al Viro
    Signed-off-by: Linus Torvalds

    Al Viro
     

12 Sep, 2005

1 commit

  • Move the call to get_mm_counter() in update_mem_hiwater() to be
    inside the check for tsk->mm being null. Otherwise you can be
    following a null pointer here. This patch was submitted by
    Javier Herrero.

    Modify the end check for munmap regions to allow for the
    legacy behavior of 0 being valid. Pretty much all current
    uClinux system libc mallocs pass in 0 as the end point.
    A hard check will fail on these, so change the check so
    that a non-zero value must be valid, otherwise it fails.
    A passed-in value of 0 will always succeed (as it used to).

    Also export a few more mm system functions - to be consistent
    with the VM code exports.

    Signed-off-by: Greg Ungerer
    Signed-off-by: Linus Torvalds

    Greg Ungerer
     

05 Aug, 2005

1 commit

  • We have found what seems to be a small bug in __vm_enough_memory() when
    sysctl_overcommit_memory is set to OVERCOMMIT_NEVER.

    When this bug occurs the system fails to boot, with /sbin/init whining
    about fork() returning ENOMEM.

    We hunted down the problem to this:

    The deferred update mechanism used in vm_acct_memory(), on an SMP system,
    allows the vm_committed_space counter to have a negative value.

    This should not be a problem since this counter is known to be inaccurate.

    But in __vm_enough_memory() this counter is compared to the `allowed'
    variable, which is an unsigned long. This comparison is broken since it
    will consider the negative values of vm_committed_space to be huge positive
    values, resulting in a memory allocation failure.
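
    A minimal userspace demonstration of the comparison bug described:

    #include <stdio.h>

    int main(void)
    {
            long committed = -5;            /* counter gone slightly negative */
            unsigned long allowed = 1000;

            /* the broken comparison: committed is converted to unsigned
             * long, so -5 becomes a huge positive value and the request
             * is refused (cast made explicit here for clarity) */
            if ((unsigned long)committed < allowed)
                    puts("enough memory");
            else
                    puts("ENOMEM");         /* this branch is taken */
            return 0;
    }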

    Signed-off-by:
    Signed-off-by:
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Simon Derr
     

22 Jun, 2005

1 commit

  • Ingo recently introduced a great speedup for allocating new mmaps using the
    free_area_cache pointer which boosts the specweb SSL benchmark by 4-5% and
    causes huge performance increases in thread creation.

    The downside of this patch is that it does lead to fragmentation in the
    mmap-ed areas (visible via /proc/self/maps), such that some applications
    that work fine under 2.4 kernels quickly run out of memory on any 2.6
    kernel.

    The problem is twofold:

    1) the free_area_cache is used to continue a search for memory where
    the last search ended. Before the change new areas were always
    searched from the base address on.

    So now new small areas are cluttering holes of all sizes
    throughout the whole mmap-able region, whereas before small requests
    tended to fill holes near the base, leaving holes far from the base
    large and available for larger requests.

    2) the free_area_cache is also set to the location of the last
    munmap-ed area, so in scenarios where we allocate e.g. five regions of
    1K each, then free regions 4, 2 and 3 in this order, the next request for
    1K will be placed in the position of the old region 3, whereas before we
    appended it to the still active region 1, placing it at the location
    of the old region 2. Before we had 1 free region of 2K; now we only
    get two free regions of 1K -> fragmentation.

    The patch addresses these issues by introducing yet another cache
    descriptor, cached_hole_size, which contains the largest known hole size
    below the current free_area_cache. When a new request comes in, its size
    is compared against cached_hole_size; if the request can be filled with a
    hole below free_area_cache, the search is started from the base instead.

    The results look promising: whereas 2.6.12-rc4 fragments quickly and my
    (earlier posted) leakme.c test program terminates after 50000+ iterations
    with 96 distinct and fragmented maps in /proc/self/maps, it performs nicely
    (as expected) with thread creation: Ingo's test_str02 with 20000 threads
    requires 0.7s of system time.

    Taking out Ingo's patch (un-patch available per request) by basically
    deleting all mentions of free_area_cache from the kernel and starting the
    search for new memory always at the respective bases we observe: leakme
    terminates successfully with 11 distinct, hardly fragmented areas in
    /proc/self/maps, but thread creation is grindingly slow: 30+s(!) of system
    time for Ingo's test_str02 with 20000 threads.

    Now - drumroll ;-) - the appended patch works fine with leakme: it ends
    with only 7 distinct areas in /proc/self/maps, and thread creation is also
    sufficiently fast at 0.71s for 20000 threads.
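
    The core of the policy, sketched loosely after arch_get_unmapped_area()
    (not the literal patch):

    /* if the cached largest hole below free_area_cache could fit this
     * request, restart the search from the base */
    if (len <= mm->cached_hole_size) {
            mm->cached_hole_size = 0;
            mm->free_area_cache = TASK_UNMAPPED_BASE;
    }
    addr = mm->free_area_cache;

    for (vma = find_vma(mm, addr); ; vma = vma->vm_next) {
            if (!vma || addr + len <= vma->vm_start) {
                    mm->free_area_cache = addr + len; /* hit: cache the end */
                    return addr;
            }
            /* remember the largest hole we skipped over */
            if (addr + mm->cached_hole_size < vma->vm_start)
                    mm->cached_hole_size = vma->vm_start - addr;
            addr = vma->vm_end;
    }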

    Signed-off-by: Wolfgang Wander
    Credit-to: "Richard Purdie"
    Signed-off-by: Ken Chen
    Acked-by: Ingo Molnar (partly)
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Wolfgang Wander
     

17 May, 2005

1 commit

  • Linus changed the second argument of __vmalloc from int to unsigned int,
    breaking the compilation for CONFIG_MMU=n configurations (since he only
    changed vmalloc.c but not nommu.c).

    Signed-off-by: Adrian Bunk
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Adrian Bunk