26 Sep, 2006

2 commits

  • Remove the atomic counter for slab_reclaim_pages and replace the counter
    and NR_SLAB with two ZVC counters that account for unreclaimable and
    reclaimable slab pages: NR_SLAB_RECLAIMABLE and NR_SLAB_UNRECLAIMABLE.

    Change the check in vmscan.c to refer to NR_SLAB_RECLAIMABLE. The
    intent seems to be to check for slab pages that could be freed.
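
    A minimal sketch of the kind of check this enables, assuming the
    zoned-VM-counter API of this era (the helper itself is illustrative):

        /* Only reclaimable slab pages can be shrunk back; the
         * unreclaimable counter is deliberately left out. */
        static int zone_has_freeable_slab(struct zone *zone)
        {
                return zone_page_state(zone, NR_SLAB_RECLAIMABLE) > 0;
        }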

    Signed-off-by: Christoph Lameter
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Christoph Lameter
     
  • Tracking of dirty pages in shared writeable mmap()s.

    The idea is simple: write protect clean shared writeable pages, catch the
    write-fault, make writeable and set dirty. On page write-back clean all the
    PTE dirty bits and write protect them once again.

    The implementation is a tad harder, mainly because the default
    backing_dev_info capabilities were too loosely maintained. Hence it is not
    enough to test the backing_dev_info for cap_account_dirty.

    The current heuristic is as follows; a VMA is eligible when (a sketch of
    the combined test follows the list):
    - it's shared writeable
    (vm_flags & (VM_WRITE|VM_SHARED)) == (VM_WRITE|VM_SHARED)
    - it is not a 'special' mapping
    (vm_flags & (VM_PFNMAP|VM_INSERTPAGE)) == 0
    - the backing_dev_info is cap_account_dirty
    mapping_cap_account_dirty(vma->vm_file->f_mapping)
    - f_op->mmap() didn't change the default page protection
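
    A sketch of that combined test, assuming the flag and helper names
    quoted above (the function name itself is illustrative):

        static int vma_wants_writenotify_sketch(struct vm_area_struct *vma)
        {
                unsigned long vm_flags = vma->vm_flags;

                if ((vm_flags & (VM_WRITE|VM_SHARED)) != (VM_WRITE|VM_SHARED))
                        return 0;       /* not shared writeable */
                if (vm_flags & (VM_PFNMAP|VM_INSERTPAGE))
                        return 0;       /* 'special' mapping */
                if (!vma->vm_file ||
                    !mapping_cap_account_dirty(vma->vm_file->f_mapping))
                        return 0;       /* bdi doesn't account dirty pages */
                /* and f_op->mmap() must not have changed vm_page_prot: */
                return pgprot_val(vma->vm_page_prot) ==
                       pgprot_val(protection_map[vm_flags &
                                (VM_READ|VM_WRITE|VM_EXEC|VM_SHARED)]);
        }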

    Pages from remap_pfn_range() are explicitly excluded because their COW
    semantics are already horrid enough (see vm_normal_page() in do_wp_page()) and
    because they don't have a backing store anyway.

    mprotect() is taught about the new behaviour as well. However, it
    overrides the last condition.

    Cleaning the pages on write-back is done with page_mkclean(), a new rmap
    call. It can be called on any page, but is currently only implemented for
    mapped pages; if the page is found to belong to a VMA that accounts dirty
    pages, it will also wrprotect the PTE.
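
    A minimal sketch of the write-back-side use, assuming the return
    convention described here (the surrounding caller is illustrative):

        /* before starting write-out: clean and wrprotect the PTEs, and
         * fold any PTE dirtiness back into the page's dirty state */
        if (page_mapped(page) && mapping_cap_account_dirty(mapping)) {
                if (page_mkclean(page))
                        set_page_dirty(page);
        }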

    Finally, in fs/buffer.c:try_to_free_buffers(), remove clear_page_dirty()
    from under ->private_lock. This seems to be safe, since ->private_lock is
    used to serialize access to the buffers, not the page itself. This is
    needed because clear_page_dirty() will call into page_mkclean() and would
    thereby violate the locking order.

    [dhowells@redhat.com: Provide a page_mkclean() implementation for NOMMU]
    Signed-off-by: Peter Zijlstra
    Cc: Hugh Dickins
    Signed-off-by: David Howells
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Peter Zijlstra
     

23 Sep, 2006

1 commit

  • * master.kernel.org:/pub/scm/linux/kernel/git/davej/agpgart:
    [AGPGART] Rework AGPv3 modesetting fallback.
    [AGPGART] Add suspend callback for i965
    [AGPGART] Fix number of aperture sizes in 830 gart structs.
    [AGPGART] Intel 965 Express support.
    [AGPGART] agp.h: constify struct agp_bridge_data::version
    [AGPGART] const'ify VIA AGP PCI table.
    [AGPGART] CONFIG_PM=n slim: drivers/char/agp/intel-agp.c
    [AGPGART] CONFIG_PM=n slim: drivers/char/agp/efficeon-agp.c
    [AGPGART] Const'ify the agpgart driver version.
    [AGPGART] remove private page protection map

    Linus Torvalds
     

08 Sep, 2006

1 commit

  • This prevents cross-region mappings on IA64 and SPARC, which could lead
    to a system crash. They were correctly trapped for normal mmap() calls,
    but not for the kernel-internal calls generated by executable loading.

    This code just moves the architecture-specific cross-region checks into
    an arch-specific "arch_mmap_check()" macro, and defines that for the
    architectures that needed it (ia64, sparc and sparc64).

    Architectures that don't have any special requirements can just ignore
    the new cross-region check, since the mmap() code will just notice on
    its own when the macro isn't defined.
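
    A sketch of the resulting pattern, using the names given above (the
    common-path call site is illustrative):

        /* architectures without special requirements leave the macro
         * undefined and the check compiles away */
        #ifndef arch_mmap_check
        #define arch_mmap_check(addr, len, flags)       (0)
        #endif

        /* early in the common mmap path: */
        error = arch_mmap_check(addr, len, flags);
        if (error)
                return error;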

    Signed-off-by: Pavel Emelianov
    Signed-off-by: Kirill Korotaev
    Acked-by: David Miller
    Signed-off-by: Greg Kroah-Hartman
    [ Cleaned up to not affect architectures that don't need it ]
    Signed-off-by: Linus Torvalds

    Kirill Korotaev
     

27 Jul, 2006

1 commit


01 Jul, 2006

1 commit

  • Currently a single atomic variable is used to establish the size of the page
    cache in the whole machine. The zoned VM counters have the same method of
    implementation as the nr_pagecache code but also allow the determination of
    the pagecache size per zone.

    Remove the special implementation for nr_pagecache and make it a zoned counter
    named NR_FILE_PAGES.

    Updates of the page cache counters are always performed with interrupts off.
    We can therefore use the __ variant here.
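
    A minimal sketch of the accounting this enables, assuming the
    zoned-VM-counter update helpers (the call sites are illustrative):

        /* interrupts are already off here, so the cheaper __ variant
         * (no local_irq save/restore) is safe */
        __inc_zone_page_state(page, NR_FILE_PAGES);     /* page added   */
        __dec_zone_page_state(page, NR_FILE_PAGES);     /* page removed */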

    Signed-off-by: Christoph Lameter
    Cc: Trond Myklebust
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Christoph Lameter
     

23 Jun, 2006

1 commit

  • Add a new VMA operation to notify a filesystem or other driver about the
    MMU generating a fault because userspace attempted to write to a page
    mapped through a read-only PTE.

    This facility permits the filesystem or driver to:

    (*) Implement storage allocation/reservation on attempted write, and so to
    deal with problems such as ENOSPC more gracefully (perhaps by generating
    SIGBUS).

    (*) Delay making the page writable until the contents have been written to a
    backing cache. This is useful for NFS/AFS when using FS-Cache/CacheFS.
    It permits the filesystem to have some guarantee about the state of the
    cache.

    (*) Account for and limit the number of dirty pages. This is one piece of
    the puzzle needed to make shared writable mappings work safely in FUSE.

    Needed by cachefs (Or is it cachefiles? Or fscache?).

    At least four other groups have stated an interest in it or a desire to use
    the functionality it provides: FUSE, OCFS2, NTFS and JFFS2. Also, things like
    EXT3 really ought to use it to deal with the case of shared-writable mmap
    encountering ENOSPC before we permit the page to be dirtied.
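
    A sketch of how a filesystem might hook the new operation, using the
    signature described above; myfs_reserve_block() and the ops table are
    hypothetical:

        static int myfs_page_mkwrite(struct vm_area_struct *vma,
                                     struct page *page)
        {
                /* Reserve backing store now, so the faulting process
                 * sees the failure (e.g. as SIGBUS) instead of hitting
                 * ENOSPC at some later write-back. */
                if (!myfs_reserve_block(vma->vm_file, page))
                        return -ENOMEM;
                return 0;
        }

        static struct vm_operations_struct myfs_vm_ops = {
                .nopage       = filemap_nopage,         /* era-appropriate */
                .page_mkwrite = myfs_page_mkwrite,
        };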

    From: Peter Zijlstra

    get_user_pages(.write=1, .force=1) can generate COW hits on read-only
    shared mappings; this patch traps those as page_mkwrite candidates and
    fails to handle them the old way.

    Signed-off-by: David Howells
    Cc: Miklos Szeredi
    Cc: Joel Becker
    Cc: Mark Fasheh
    Cc: Anton Altaparmakov
    Cc: David Woodhouse
    Cc: Hugh Dickins
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    David Howells
     

11 Apr, 2006

2 commits

  • This patch is an enhancement of the OVERCOMMIT_GUESS algorithm in
    __vm_enough_memory() in mm/mmap.c.

    When the OVERCOMMIT_GUESS algorithm calculates the number of free pages,
    it now subtracts the number of reserved pages from the result of
    nr_free_pages().
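
    A one-line sketch of the adjustment, assuming a global count of
    reserved pages (the variable name is an assumption):

        free = nr_free_pages();
        free -= totalreserve_pages;     /* exclude kernel reserves */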

    Signed-off-by: Hideo Aoki
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Hideo AOKI
     
  • The code compares newbrk with oldbrk, which are page aligned, before
    checking the memory limit set for the data segment. If the memory limit
    is not page aligned, this bypasses the limit test whenever the
    allocation stays within the current page.
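
    A sketch of the reordered check in sys_brk()-style code (illustrative):

        /* check RLIMIT_DATA against the unaligned request first ... */
        rlim = current->signal->rlim[RLIMIT_DATA].rlim_cur;
        if (rlim < RLIM_INFINITY && brk - mm->start_data > rlim)
                goto out;

        /* ... and only then short-circuit same-page growth */
        newbrk = PAGE_ALIGN(brk);
        oldbrk = PAGE_ALIGN(mm->brk);
        if (oldbrk == newbrk)
                goto set_brk;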

    Signed-off-by: Ram Gupta
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Ram Gupta
     

01 Apr, 2006

1 commit


26 Mar, 2006

1 commit


22 Mar, 2006

1 commit

  • Now that it's madvisable, remove two pieces of VM_DONTCOPY bogosity:

    1. There was and is no logical reason why VM_DONTCOPY should be in the
    list of flags which forbid vma merging (and those drivers which set
    it are also setting VM_IO, which itself forbids the merge).

    2. It's hard to understand the purpose of the VM_HUGETLB, VM_DONTCOPY
    block in vm_stat_account: but never mind, it's under CONFIG_HUGETLB,
    which (unlike CONFIG_HUGETLB_PAGE or CONFIG_HUGETLBFS) has never been
    defined.

    Signed-off-by: Hugh Dickins
    Cc: William Lee Irwin III
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Hugh Dickins
     

12 Jan, 2006

1 commit

  • - Move capable() from sched.h to capability.h;

    - Use <linux/capability.h> where capable() is used
    (in include/, block/, ipc/, kernel/, a few drivers/,
    mm/, security/, & sound/;
    many more drivers/ to go)

    Signed-off-by: Randy Dunlap
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Randy.Dunlap
     

17 Dec, 2005

1 commit


23 Nov, 2005

1 commit

  • The PageReserved removal in 2.6.15-rc1 issued a "deprecated" message when you
    tried to mmap or mprotect MAP_PRIVATE PROT_WRITE a VM_RESERVED, and failed
    with -EACCES: because do_wp_page lacks the refinement to COW pages in those
    areas, nor do we expect to find anonymous pages in them; and it seemed just
    bloat to add code for handling such a peculiar case. But immediately it
    caused vbetool and ddcprobe (using lrmi) to fail.

    So revert the "deprecated" messages, letting mmap and mprotect succeed. But
    leave do_wp_page's BUG_ON(vma->vm_flags & VM_RESERVED) in place until we've
    added the code to do it right: so this particular patch is only good if the
    app doesn't really need to write to that private area.

    Signed-off-by: Hugh Dickins
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Hugh Dickins
     

19 Nov, 2005

1 commit


07 Nov, 2005

1 commit


31 Oct, 2005

1 commit

  • This patch is a rewrite of the one submitted on October 1st, using modules
    (http://marc.theaimsgroup.com/?l=linux-kernel&m=112819093522998&w=2).

    This rewrite adds a tristate CONFIG_RCU_TORTURE_TEST, which enables an
    intense torture test of the RCU infrastructure. This is needed due to the
    continued changes to the RCU infrastructure to accommodate dynamic ticks,
    CPU hotplug, realtime, and so on. Most of the code is in a separate file
    that is compiled only if the CONFIG variable is set. Documentation on how
    to run the test and interpret the output is also included.

    This code has been tested on i386 and ppc64, and an earlier version of the
    code has received extensive testing on a number of architectures as part of
    the PREEMPT_RT patchset.

    Signed-off-by: "Paul E. McKenney"
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Paul E. McKenney
     

30 Oct, 2005

9 commits

  • Remove the page_table_lock from around the calls to unmap_vmas, and replace
    the pte_offset_map in zap_pte_range by pte_offset_map_lock: all callers are
    now safe to descend without page_table_lock.

    Don't attempt fancy locking for hugepages, just take page_table_lock in
    unmap_hugepage_range. Which makes zap_hugepage_range, and the hugetlb test in
    zap_page_range, redundant: unmap_vmas calls unmap_hugepage_range anyway. Nor
    does unmap_vmas have much use for its mm arg now.

    The tlb_start_vma and tlb_end_vma in unmap_page_range are now called without
    page_table_lock: if they're implemented at all, they typically come down to
    flush_cache_range (usually done outside page_table_lock) and flush_tlb_range
    (which we already audited for the mprotect case).
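
    A sketch of the pte_offset_map_lock pattern this relies on (the loop
    body is illustrative):

        spinlock_t *ptl;
        pte_t *pte;

        pte = pte_offset_map_lock(mm, pmd, addr, &ptl);
        do {
                /* zap one entry at a time under the pte lock */
        } while (pte++, addr += PAGE_SIZE, addr != end);
        pte_unmap_unlock(pte - 1, ptl);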

    Signed-off-by: Hugh Dickins
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Hugh Dickins
     
  • In most places the descent from pgd to pud to pmd to pte holds mmap_sem
    (exclusively or not), which ensures that free_pgtables cannot be freeing page
    tables from any level at the same time. But truncation and reverse mapping
    descend without mmap_sem.

    No problem: just make sure that a vma is unlinked from its prio_tree (or
    nonlinear list) and from its anon_vma list, after zapping the vma, but before
    freeing its page tables. Then neither vmtruncate nor rmap can reach that vma
    whose page tables are now volatile (nor do they need to reach it, since all
    its page entries have been zapped by this stage).

    The i_mmap_lock and anon_vma->lock already serialize this correctly; but the
    locking hierarchy is such that we cannot take them while holding
    page_table_lock. Well, we're trying to push that down anyway. So in this
    patch, move anon_vma_unlink and unlink_file_vma into free_pgtables, at the
    same time as moving page_table_lock around calls to unmap_vmas.

    tlb_gather_mmu and tlb_finish_mmu then fall outside the page_table_lock, but
    we made them preempt_disable and preempt_enable earlier; and a long source
    audit of all the architectures has shown no problem with removing
    page_table_lock from them. free_pgtables doesn't need page_table_lock for
    itself, nor for what it calls; tlb->mm->nr_ptes is usually protected by
    page_table_lock, but partly by non-exclusive mmap_sem - here it's decremented
    with exclusive mmap_sem, or mm_users 0. update_hiwater_rss and
    vm_unacct_memory don't need page_table_lock either.
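
    A sketch of the resulting ordering inside free_pgtables() (the loop
    shape is illustrative):

        while (vma) {
                anon_vma_unlink(vma);   /* hide the vma from rmap */
                unlink_file_vma(vma);   /* ... and from vmtruncate */
                /* only now free this vma's page tables */
                vma = vma->vm_next;
        }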

    Signed-off-by: Hugh Dickins
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Hugh Dickins
     
  • ia64 has expand_backing_store function for growing its Register Backing Store
    vma upwards. But more complete code for this purpose is found in the
    CONFIG_STACK_GROWSUP part of mm/mmap.c. Uglify its #ifdefs further to provide
    expand_upwards for ia64 as well as expand_stack for parisc.

    The Register Backing Store vma should be marked VM_ACCOUNT. Implement the
    intention of growing it only a page at a time, instead of passing an address
    outside of the vma to handle_mm_fault, with unknown consequences.

    Signed-off-by: Hugh Dickins
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Hugh Dickins
     
  • update_mem_hiwater has attracted various criticisms, in particular from
    those concerned with mm scalability. Originally it was called whenever rss
    or total_vm got raised. Then many of those callsites were replaced by a
    timer tick call from account_system_time. Now Frank van Maarseveen
    reports that to be inadequate. How about this? Works for Frank.

    Replace update_mem_hiwater, a poor combination of two unrelated ops, by macros
    update_hiwater_rss and update_hiwater_vm. Don't attempt to keep
    mm->hiwater_rss up to date at timer tick, nor every time we raise rss (usually
    by 1): those are hot paths. Do the opposite, update only when about to lower
    rss (usually by many), or just before final accounting in do_exit. Handle
    mm->hiwater_vm in the same way, though it's much less of an issue. Demand
    that whoever collects these hiwater statistics do the work of taking the
    maximum with rss or total_vm.

    And there has been no collector of these hiwater statistics in the tree. The
    new convention needs an example, so match Frank's usage by adding a VmPeak
    line above VmSize to /proc/<pid>/status, and also a VmHWM line above VmRSS
    (High-Water-Mark or High-Water-Memory).

    There was a particular anomaly during mremap move, that hiwater_vm might be
    captured too high. A fleeting such anomaly remains, but it's quickly
    corrected now, whereas before it would stick.

    What locking? None: if the app is racy then these statistics will be racy,
    it's not worth any overhead to make them exact. But whenever it suits,
    hiwater_vm is updated under exclusive mmap_sem, and hiwater_rss under
    page_table_lock (for now) or with preemption disabled (later on): without
    going to any trouble, minimize the time between reading current values and
    updating, to minimize those occasions when a racing thread bumps a count up
    and back down in between.
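
    A sketch of the new macros and their intended call point, assuming
    get_mm_rss() as the rss accessor (the definition is illustrative):

        #define update_hiwater_rss(mm)  do {                    \
                unsigned long _rss = get_mm_rss(mm);            \
                if ((mm)->hiwater_rss < _rss)                   \
                        (mm)->hiwater_rss = _rss;               \
        } while (0)

        /* called just before rss is about to drop, e.g. ahead of
         * unmap_vmas(), or just before final accounting in do_exit() */
        update_hiwater_rss(mm);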

    Signed-off-by: Hugh Dickins
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Hugh Dickins
     
  • Remove PageReserved() calls from core code by tightening VM_RESERVED
    handling in mm/ to cover PageReserved functionality.

    PageReserved special casing is removed from get_page and put_page.

    All setting and clearing of PageReserved is retained, and it is now flagged
    in the page_alloc checks to help ensure we don't introduce any refcount
    based freeing of Reserved pages.

    MAP_PRIVATE, PROT_WRITE of VM_RESERVED regions is tentatively being
    deprecated. We never handled it completely correctly anyway, and it can
    be reintroduced in future if required (Hugh has a proof of concept).

    Once PageReserved() calls are removed from kernel/power/swsusp.c, and all
    arch/ and driver code, the Set and Clear calls, and the PG_reserved bit can
    be trivially removed.

    Last real user of PageReserved is swsusp, which uses PageReserved to
    determine whether a struct page points to valid memory or not. This still
    needs to be addressed (a generic page_is_ram() should work).

    A last caveat: the ZERO_PAGE is now refcounted and managed with rmap (and
    thus mapcounted and counted towards shared rss). These writes to the struct
    page could cause excessive cacheline bouncing on big systems. There are a
    number of ways this could be addressed if it is an issue.

    Signed-off-by: Nick Piggin

    Refcount bug fix for filemap_xip.c

    Signed-off-by: Carsten Otte
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Nick Piggin
     
  • exit_mmap resets various mm_struct fields, but the mm is well on its way out,
    and none of those fields matter by this point.

    Signed-off-by: Hugh Dickins
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Hugh Dickins
     
  • Divide remove_vm_struct into two parts: first anon_vma_unlink plus
    unlink_file_vma, to unlink the vma from the list and tree by which rmap or
    vmtruncate might find it; then remove_vma to close, fput and free.

    The intention here is to do the anon_vma_unlink and unlink_file_vma earlier,
    in free_pgtables before freeing any page tables: so we can be sure that any
    page tables traversed by rmap and vmtruncate are stable (and other, ordinary
    cases are stabilized by holding mmap_sem).

    This will be crucial to traversing pgd,pud,pmd without page_table_lock. But
    testing the split-out patch showed that lifting the page_table_lock is
    symbiotically necessary to make this change - the lock ordering is wrong to
    move those unlinks into free_pgtables while it's under ptlock.

    Signed-off-by: Hugh Dickins
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Hugh Dickins
     
  • unmap_vma doesn't amount to much, let's put it inside unmap_vma_list. Except
    it doesn't unmap anything, unmap_region just did the unmapping: rename it to
    remove_vma_list.

    Signed-off-by: Hugh Dickins
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Hugh Dickins
     
  • The original vm_stat_account has fallen into disuse, with only one user, and
    only one user of vm_stat_unaccount. It's easier to keep track if we convert
    them all to __vm_stat_account, then free it from its __shackles.

    Signed-off-by: Hugh Dickins
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Hugh Dickins
     

22 Sep, 2005

1 commit


15 Sep, 2005

1 commit

  • Pavel Emelianov and Kirill Korotaev observe that fs and arch users of
    security_vm_enough_memory tend to forget to vm_unacct_memory when a
    failure occurs further down (typically in setup_arg_pages variants).

    These are all users of insert_vm_struct, and that reservation will only
    be unaccounted on exit if the vma is marked VM_ACCOUNT: which in some
    cases it is (hidden inside VM_STACK_FLAGS) and in some cases it isn't.

    So x86_64 32-bit and ppc64 vDSO ELFs have been leaking memory into
    Committed_AS each time they're run. But don't add VM_ACCOUNT to them,
    it's inappropriate to reserve against the very unlikely case that gdb
    be used to COW a vDSO page - we ought to do something about that in
    do_wp_page, but there are yet other inconsistencies to be resolved.

    The safe and economical way to fix this is to let insert_vm_struct do
    the security_vm_enough_memory check when it finds VM_ACCOUNT is set.

    And the MIPS irix_brk has been calling security_vm_enough_memory before
    calling do_brk which repeats it, doubly accounting and so also leaking.
    Remove that, and all the fs and arch calls to security_vm_enough_memory:
    give it a less misleading name later on.
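
    A sketch of the check moving into insert_vm_struct() (illustrative):

        /* Reserve commit charge here, so every VM_ACCOUNT vma inserted
         * this way is guaranteed to be unaccounted again on exit. */
        if ((vma->vm_flags & VM_ACCOUNT) &&
            security_vm_enough_memory((vma->vm_end - vma->vm_start)
                                        >> PAGE_SHIFT))
                return -ENOMEM;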

    Signed-off-by: Hugh Dickins
    Signed-Off-By: Kirill Korotaev
    Signed-off-by: Linus Torvalds

    Hugh Dickins
     

08 Sep, 2005

2 commits


05 Aug, 2005

1 commit

  • We have found what seems to be a small bug in __vm_enough_memory() when
    sysctl_overcommit_memory is set to OVERCOMMIT_NEVER.

    When this bug occurs the system fails to boot, with /sbin/init whining
    about fork() returning ENOMEM.

    We hunted down the problem to this:

    The deferred update mechanism used in vm_acct_memory(), on an SMP system,
    allows the vm_committed_space counter to have a negative value.

    This should not be a problem since this counter is known to be inaccurate.

    But in __vm_enough_memory() this counter is compared to the `allowed'
    variable, which is an unsigned long. This comparison is broken since it
    will consider the negative values of vm_committed_space to be huge positive
    values, resulting in a memory allocation failure.
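
    A sketch of the failure mode, with illustrative values:

        long committed = -5;            /* transiently negative on SMP */
        unsigned long allowed = 1000;

        /* BROKEN: the signed value is promoted to unsigned for the
         * comparison, so -5 becomes a huge positive number and the
         * allocation is refused despite ample headroom. */
        if ((unsigned long)committed >= allowed)
                return -ENOMEM;

        /* Robust form: keep the comparison in signed arithmetic. */
        if (committed >= (long)allowed)
                return -ENOMEM;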

    Signed-off-by: Simon Derr
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Simon Derr
     

22 Jun, 2005

2 commits

  • The topdown changes in 2.6.12-rc1 can cause large allocations with a
    large stack limit to fail, despite there being space available. The
    expression mmap_base - len is only valid when len <= mmap_base, but
    nothing in the topdown allocator checks this. It's only (now) caught at
    a higher level, which causes the allocation to simply fail. The
    following change restores the fallback to the bottom-up path, which
    allows large allocations with a large stack limit to succeed again.
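
    A sketch of the restored fallback (illustrative):

        /* a request larger than the whole top-down window can never
         * fit below mmap_base; retry bottom-up instead of failing */
        if (len > mm->mmap_base)
                return arch_get_unmapped_area(filp, addr, len, pgoff, flags);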

    Signed-off-by: Chris Wright
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Chris Wright
     
  • Ingo recently introduced a great speedup for allocating new mmaps using the
    free_area_cache pointer which boosts the specweb SSL benchmark by 4-5% and
    causes huge performance increases in thread creation.

    The downside of this patch is that it does lead to fragmentation in the
    mmap-ed areas (visible via /proc/self/maps), such that some applications
    that work fine under 2.4 kernels quickly run out of memory on any 2.6
    kernel.

    The problem is twofold:

    1) the free_area_cache is used to continue a search for memory where
    the last search ended. Before the change new areas were always
    searched from the base address on.

    So now new small areas are cluttering holes of all sizes
    throughout the whole mmap-able region whereas before small holes
    tended to close holes near the base leaving holes far from the base
    large and available for larger requests.

    2) the free_area_cache also is set to the location of the last
    munmap-ed area so in scenarios where we allocate e.g. five regions of
    1K each, then free regions 4 2 3 in this order the next request for 1K
    will be placed in the position of the old region 3, whereas before we
    appended it to the still active region 1, placing it at the location
    of the old region 2. Before we had 1 free region of 2K, now we only
    get two free regions of 1K -> fragmentation.

    The patch addresses these issues by introducing yet another cache
    descriptor, cached_hole_size, that contains the largest known hole size
    below the current free_area_cache. If a new request comes in, the size
    is compared against cached_hole_size and, if the request can be filled
    with a hole below free_area_cache, the search is started from the base
    instead.
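
    A sketch of the resulting search loop in an arch_get_unmapped_area()-
    style allocator (condensed from the description; illustrative):

        if (len <= mm->cached_hole_size) {
                mm->cached_hole_size = 0;
                mm->free_area_cache = TASK_UNMAPPED_BASE; /* restart low */
        }
        addr = mm->free_area_cache;
        for (vma = find_vma(mm, addr); ; vma = vma->vm_next) {
                if (!vma || addr + len <= vma->vm_start) {
                        mm->free_area_cache = addr + len;
                        return addr;
                }
                if (addr + mm->cached_hole_size < vma->vm_start)
                        mm->cached_hole_size = vma->vm_start - addr;
                addr = vma->vm_end;
        }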

    The results look promising. Whereas 2.6.12-rc4 fragments quickly (my
    earlier posted leakme.c test program terminates after 50000+ iterations
    with 96 distinct and fragmented maps in /proc/self/maps), it performs
    nicely with thread creation, as expected: Ingo's test_str02 with 20000
    threads requires 0.7s system time.

    Taking out Ingo's patch (un-patch available per request) by basically
    deleting all mentions of free_area_cache from the kernel and starting the
    search for new memory always at the respective bases, we observe: leakme
    terminates successfully with 11 distinct, hardly fragmented areas in
    /proc/self/maps, but thread creation is grindingly slow: 30+s(!) system
    time for Ingo's test_str02 with 20000 threads.

    Now - drumroll ;-) the appended patch works fine with leakme: it ends with
    only 7 distinct areas in /proc/self/maps and also thread creation seems
    sufficiently fast with 0.71s for 20000 threads.

    Signed-off-by: Wolfgang Wander
    Credit-to: "Richard Purdie"
    Signed-off-by: Ken Chen
    Acked-by: Ingo Molnar (partly)
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Wolfgang Wander
     

20 May, 2005

1 commit


19 May, 2005

1 commit

  • Prevent the topdown allocator from allocating mmap areas all the way
    down to address zero.

    We still allow a MAP_FIXED mapping of page 0 (needed for various things,
    ranging from Wine and DOSEMU to people who want to allow speculative
    loads off a NULL pointer).
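
    A sketch of the guard in the top-down search (illustrative):

        /* never let the unhinted search descend to page 0; MAP_FIXED
         * requests take a different path and may still map it */
        addr = mm->free_area_cache;
        if (addr > len) {
                vma = find_vma(mm, addr - len);
                if (!vma || addr <= vma->vm_start)
                        return addr - len;      /* can never be 0 */
        }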

    Tested by Chris Wright.

    Signed-off-by: Linus Torvalds

    Linus Torvalds
     

01 May, 2005

2 commits

  • Always use page counts when doing RLIMIT_MEMLOCK checking to avoid possible
    overflow.
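
    A sketch of the page-based check (illustrative):

        unsigned long locked = mm->locked_vm + (len >> PAGE_SHIFT);
        unsigned long lock_limit =
                current->signal->rlim[RLIMIT_MEMLOCK].rlim_cur >> PAGE_SHIFT;

        /* page counts cannot overflow the way byte sums can */
        if (locked > lock_limit && !capable(CAP_IPC_LOCK))
                return -EAGAIN;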

    Signed-off-by: Chris Wright
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Chris Wright
     
  • Address bug #4508: there's potential for wraparound in the various places
    where we perform RLIMIT_AS checking.

    (I'm a bit worried about acct_stack_growth(). Are we sure that vma->vm_mm is
    always equal to current->mm? If not, then we're comparing some other
    process's total_vm with the calling process's rlimits).
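
    A sketch of a wraparound-safe RLIMIT_AS check, comparing page counts
    rather than byte sums (the helper name is an assumption):

        static int may_expand_vm(struct mm_struct *mm, unsigned long npages)
        {
                unsigned long cur = mm->total_vm;       /* in pages */
                unsigned long lim;

                lim = current->signal->rlim[RLIMIT_AS].rlim_cur >> PAGE_SHIFT;
                if (cur + npages > lim)
                        return 0;
                return 1;
        }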

    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    akpm@osdl.org
     

20 Apr, 2005

2 commits

  • Once all the MMU architectures define FIRST_USER_ADDRESS, remove the hack
    from mmap.c which derived it from FIRST_USER_PGD_NR.

    Signed-off-by: Hugh Dickins
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Hugh Dickins
     
  • The patches to free_pgtables by vma left problems on any architectures which
    leave some user address page table entries unencapsulated by vma. Andi has
    fixed the 32-bit vDSO on x86_64 to use a vma. Now fix arm (and arm26), whose
    first PAGE_SIZE is reserved (perhaps) for machine vectors.

    Our calls to free_pgtables must not touch that area, and exit_mmap's
    BUG_ON(nr_ptes) must allow that arm's get_pgd_slow may (or may not) have
    allocated an extra page table, which its free_pgd_slow would free later.

    FIRST_USER_PGD_NR has misled me and others: until all the arches define
    FIRST_USER_ADDRESS instead, a hack in mmap.c derives one from t'other.
    This patch fixes the bugs; the remaining patches just clean it up.
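
    A sketch of the interim hack (the exact form in mmap.c is an
    assumption):

        /* derive the lowest user address from the old per-arch constant
         * until every arch defines FIRST_USER_ADDRESS directly */
        #ifndef FIRST_USER_ADDRESS
        #define FIRST_USER_ADDRESS      (FIRST_USER_PGD_NR * PGDIR_SIZE)
        #endif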

    Signed-off-by: Hugh Dickins
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Hugh Dickins