Eric Lee / smarc-fsl-linux-kernel

25 May, 2005

1 commit

cafdd8ba0 [PATCH] try_to_unmap_cluster() passes out-of-bounds pte to pte_unmap() ... Browse Code »

try_to_unmap_cluster() does:
for (pte = pte_offset_map(pmd, address);
address < end; pte++, address += PAGE_SIZE) {
...
}

pte_unmap(pte);

It may take a little staring to notice, but pte can actually fall off the
end of the pte page in this iteration, which makes life difficult for
kmap_atomic() and the users not expecting it to BUG(). Of course, we're
somewhat lucky in that arithmetic elsewhere in the function guarantees that
at least one iteration is made, lest this force larger rearrangements to be
made. This issue and patch also apply to non-mm mainline and with trivial
adjustments, at least two related kernels.

Discovered during internal testing at Oracle.

Signed-off-by: William Irwin
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

William Lee Irwin III
2005-05-25 11:08:13 +0800

22 May, 2005

1 commit

b5c44c214 [PATCH] fix for __generic_file_aio_read() to return 0 on EOF ... Browse Code »

I came across the following problem while running ltp-aiodio testcases from
ltp-full-20050405 on linux-2.6.12-rc3-mm3. I tried running the tests with
EXT3 as well as JFS filesystems.

One or two fsx-linux testcases were hung after some time. These testcases
were hanging at wait_for_all_aios().

Debugging shows that there were some iocbs which were not getting completed
eventhough the last retry for those returned -EIOCBQUEUED. Also all such
pending iocbs represented READ operation.

Further debugging revealed that all such iocbs hit EOF in the DIO layer.
To be more precise, the "pos" from which they were trying to read was
greater than the "size" of the file. So the generic_file_direct_IO
returned 0.

This happens rarely as there is already a check in
__generic_file_aio_read(), for whether "pos" < "size" before calling direct
IO routine.

>size = i_size_read(inode);
>if (pos < size) {
> retval = generic_file_direct_IO(READ, iocb,
> iov, pos, nr_segs);

But for READ, we are taking the inode->i_sem only in the DIO layer. So it
is possible that some other process can change the size of the file before
we take the i_sem. In such a case ( when "pos" > "size"), the
__generic_file_aio_read() would return -EIOCBQUEUED even though there were
no I/O requests submitted by the DIO layer. This would cause the AIO layer
to expect aio_complete() for THE iocb, which doesnot happen. And thus the
test hangs forever, waiting for an I/O completion, where there are no
requests submitted at all.

The following patch makes __generic_file_aio_read() return 0 (instead of
returning -EIOCBQUEUED), on getting 0 from generic_file_direct_IO(), so
that the AIO layer does the aio_complete().

Testing:

I have tested the patch on a SMP machine(with 2 Pentium 4 (HT)) running
linux-2.6.12-rc3-mm3. I ran the ltp-aiodio testcases and none of the
fsx-linux tests hung. Also the aio-stress tests ran without any problem.

Signed-off-by: Suzuki K P
Signed-off-by: Suparna Bhattacharya
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Suparna Bhattacharya
2005-05-22 07:45:24 +0800

21 May, 2005

1 commit

7856dfeb2 [PATCH] x86_64: Fixed guard page handling again in iounmap ... Browse Code »

Caused oopses again. Also fix potential mismatch in checking if
change_page_attr was needed.

To do it without races I needed to change mm/vmalloc.c to export a
__remove_vm_area that does not take vmlist lock.

Noticed by Terence Ripperda and based on a patch of his.

Signed-off-by: Andi Kleen
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Andi Kleen
2005-05-21 06:48:20 +0800

20 May, 2005

1 commit

07ab67c8d Fix get_unmapped_area sanity tests ... Browse Code »

As noted by Chris Wright, we need to do the full range of tests regardless
of whether MAP_FIXED is set or not, so re-organize get_unmapped_area()
slightly to do the sanity checks unconditionally.

Linus Torvalds
2005-05-20 13:43:37 +0800

19 May, 2005

1 commit

49a43876b [PATCH] prevent NULL mmap in topdown model ... Browse Code »

Prevent the topdown allocator from allocating mmap areas all the way
down to address zero.

We still allow a MAP_FIXED mapping of page 0 (needed for various things,
ranging from Wine and DOSEMU to people who want to allow speculative
loads off a NULL pointer).

Tested by Chris Wright.

Signed-off-by: Linus Torvalds

Linus Torvalds
2005-05-19 22:46:36 +0800

17 May, 2005

5 commits

b81074800 [PATCH] do_swap_page() can map random data if swap read fails ... Browse Code »

There is a bug in do_swap_page(): when swap page happens to be unreadable,
page filled with random data is mapped into user address space. The fix is
to check for PageUptodate and send SIGBUS in case of error.

Signed-Off-By: Kirill Korotaev
Signed-Off-By: Alexey Kuznetsov
Acked-by: Hugh Dickins
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Kirill Korotaev
2005-05-17 22:59:20 +0800
ba32311eb [PATCH] swapout oops fix ... Browse Code »

Fix OOPS when swapping on a device that doesn't have an unplug_io_fn defined
(eg, ATA Over Ethernet)

Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

McMullan, Jason
2005-05-17 22:59:18 +0800
7a019225c [PATCH] mm/nommu.c: try to fix __vmalloc ... Browse Code »

Linus changed the second argument of __vmalloc from int to unsigned int
breaking the compilation for CONFIG_MMU=n configurations (since he only
changed vmalloc.c but not nommu.c).

Signed-off-by: Adrian Bunk
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Adrian Bunk
2005-05-17 22:59:17 +0800
717990629 [PATCH] mm acct accounting fix ... Browse Code »

This patch fixes mm->total_vm and mm->locked_vm acctounting in case when
move_page_tables() fails inside move_vma().

Signed-Off-By: Kirill Korotaev
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Kirill Korotaev
2005-05-17 22:59:12 +0800
202d182a9 [PATCH] mm: fix rss counter being incremented when unmapping ... Browse Code »

This patch fixes a bug introduced by the "mm counter operations through
macros" patch, which replaced a decrement operation in with an increment
macro in try_to_unmap_one().

Signed-off-by: Björn Steinbrink
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Bjorn Steinbrink
2005-05-17 22:59:12 +0800

06 May, 2005

1 commit

91bb52416 [PATCH] remove outdated comments from filemap.c ... Browse Code »

Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Christoph Hellwig
2005-05-06 07:36:43 +0800

04 May, 2005

1 commit

7223a93a5 [IA64] Export node_online_map and node_possible_map ... Browse Code »

Export node_online_map and node_possible_map so that kernel modules can use
the nodemask macros, like, for_each_node() and for_each_online_node().

Signed-off-by: Dean Nelson
Signed-off-by: Tony Luck

Dean Nelson
2005-05-04 03:09:32 +0800

01 May, 2005

16 commits

67be2dd1b [PATCH] DocBook: fix some descriptions ... Browse Code »

Some KernelDoc descriptions are updated to match the current code.
No code changes.

Signed-off-by: Martin Waitz
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Martin Waitz
2005-05-01 23:59:26 +0800
4dc3b16ba [PATCH] DocBook: changes and extensions to the kernel documentation ... Browse Code »

I have recompiled Linux kernel 2.6.11.5 documentation for me and our
university students again. The documentation could be extended for more
sources which are equipped by structured comments for recent 2.6 kernels. I
have tried to proceed with that task. I have done that more times from 2.6.0
time and it gets boring to do same changes again and again. Linux kernel
compiles after changes for i386 and ARM targets. I have added references to
some more files into kernel-api book, I have added some section names as well.
So please, check that changes do not break something and that categories are
not too much skewed.

I have changed kernel-doc to accept "fastcall" and "asmlinkage" words reserved
by kernel convention. Most of the other changes are modifications in the
comments to make kernel-doc happy, accept some parameters description and do
not bail out on errors. Changed to @pid in the description, moved some
#ifdef before comments to correct function to comments bindings, etc.

You can see result of the modified documentation build at
http://cmp.felk.cvut.cz/~pisa/linux/lkdb-2.6.11.tar.gz

Some more sources are ready to be included into kernel-doc generated
documentation. Sources has been added into kernel-api for now. Some more
section names added and probably some more chaos introduced as result of quick
cleanup work.

Signed-off-by: Pavel Pisa
Signed-off-by: Martin Waitz
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Pavel Pisa
2005-05-01 23:59:25 +0800
fbd568a3e [PATCH] Change synchronize_kernel to _rcu and _sched ... Browse Code »

This patch changes calls to synchronize_kernel(), deprecated in the earlier
"Deprecate synchronize_kernel, GPL replacement" patch to instead call the new
synchronize_rcu() and synchronize_sched() APIs.

Signed-off-by: Paul E. McKenney
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Paul E. McKenney
2005-05-01 23:59:04 +0800
cd7619d6b [PATCH] Exterminate PAGE_BUG ... Browse Code »

Remove PAGE_BUG - repalce it with BUG and BUG_ON.

Signed-off-by: Matt Mackall
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Matt Mackall
2005-05-01 23:59:01 +0800
d59dd4620 [PATCH] use smp_mb/wmb/rmb where possible ... Browse Code »

Replace a number of memory barriers with smp_ variants. This means we won't
take the unnecessary hit on UP machines.

Signed-off-by: Anton Blanchard
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

akpm@osdl.org
2005-05-01 23:58:47 +0800
97e2bde47 [PATCH] add kmalloc_node, inline cleanup ... Browse Code »

The patch makes the following function calls available to allocate memory
on a specific node without changing the basic operation of the slab
allocator:

kmem_cache_alloc_node(kmem_cache_t *cachep, unsigned int flags, int node);
kmalloc_node(size_t size, unsigned int flags, int node);

in a similar way to the existing node-blind functions:

kmem_cache_alloc(kmem_cache_t *cachep, unsigned int flags);
kmalloc(size, flags);

kmem_cache_alloc_node was changed to pass flags and the node information
through the existing layers of the slab allocator (which lead to some minor
rearrangements). The functions at the lowest layer (kmem_getpages,
cache_grow) are already node aware. Also __alloc_percpu can call
kmalloc_node now.

Performance measurements (using the pageset localization patch) yields:

w/o patches:
Tasks jobs/min jti jobs/min/task real cpu
1 484.27 100 484.2736 12.02 1.97 Wed Mar 30 20:50:43 2005
100 25170.83 91 251.7083 23.12 150.10 Wed Mar 30 20:51:06 2005
200 34601.66 84 173.0083 33.64 294.14 Wed Mar 30 20:51:40 2005
300 37154.47 86 123.8482 46.99 436.56 Wed Mar 30 20:52:28 2005
400 39839.82 80 99.5995 58.43 580.46 Wed Mar 30 20:53:27 2005
500 40036.32 79 80.0726 72.68 728.60 Wed Mar 30 20:54:40 2005
600 44074.21 79 73.4570 79.23 872.10 Wed Mar 30 20:55:59 2005
700 44016.60 78 62.8809 92.56 1015.84 Wed Mar 30 20:57:32 2005
800 40411.05 80 50.5138 115.22 1161.13 Wed Mar 30 20:59:28 2005
900 42298.56 79 46.9984 123.83 1303.42 Wed Mar 30 21:01:33 2005
1000 40955.05 80 40.9551 142.11 1441.92 Wed Mar 30 21:03:55 2005

with pageset localization and slab API patches:
Tasks jobs/min jti jobs/min/task real cpu
1 484.19 100 484.1930 12.02 1.98 Wed Mar 30 21:10:18 2005
100 27428.25 92 274.2825 21.22 149.79 Wed Mar 30 21:10:40 2005
200 37228.94 86 186.1447 31.27 293.49 Wed Mar 30 21:11:12 2005
300 41725.42 85 139.0847 41.84 434.10 Wed Mar 30 21:11:54 2005
400 43032.22 82 107.5805 54.10 582.06 Wed Mar 30 21:12:48 2005
500 42211.23 83 84.4225 68.94 722.61 Wed Mar 30 21:13:58 2005
600 40084.49 82 66.8075 87.12 873.11 Wed Mar 30 21:15:25 2005
700 44169.30 79 63.0990 92.24 1008.77 Wed Mar 30 21:16:58 2005
800 43097.94 79 53.8724 108.03 1155.88 Wed Mar 30 21:18:47 2005
900 41846.75 79 46.4964 125.17 1303.38 Wed Mar 30 21:20:52 2005
1000 40247.85 79 40.2478 144.60 1442.21 Wed Mar 30 21:23:17 2005

Signed-off-by: Christoph Lameter
Signed-off-by: Manfred Spraul
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Manfred Spraul
2005-05-01 23:58:38 +0800
dd1d5afca [PATCH] sync_page() smp_mb() comment ... Browse Code »

The smp_mb() is becaus sync_page() doesn't have PG_locked while it accesses
page_mapping(page). The comments in the patch (the entire patch is the
addition of this comment) try to explain further how and why smp_mb() is
used.

Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

William Lee Irwin III
2005-05-01 23:58:38 +0800
93ea1d0a1 [PATCH] RLIMIT_MEMLOCK checking fix ... Browse Code »

Always use page counts when doing RLIMIT_MEMLOCK checking to avoid possible
overflow.

Signed-off-by: Chris Wright
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Chris Wright
2005-05-01 23:58:38 +0800
edfbe2b00 [PATCH] count bounce buffer pages in vmstat ... Browse Code »

This is a patch for counting the number of pages for bounce buffers. It's
shown in /proc/vmstat.

Currently, the number of bounce pages are not counted anywhere. So, if
there are many bounce pages, it seems that there are leaked pages. And
it's difficult for a user to imagine the usage of bounce pages. So, it's
meaningful to show # of bouce pages.

Signed-off-by: KAMEZAWA Hiroyuki
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

KAMEZAWA Hiroyuki
2005-05-01 23:58:37 +0800
bd53b714d [PATCH] mm: use __GFP_NOMEMALLOC ... Browse Code »

Use the new __GFP_NOMEMALLOC to simplify the previous handling of
PF_MEMALLOC.

Signed-off-by: Nick Piggin
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Nick Piggin
2005-05-01 23:58:37 +0800
20a77776c [PATCH] mempool: simplify alloc ... Browse Code »

Mempool is pretty clever. Looks too clever for its own good :) It
shouldn't really know so much about page reclaim internals.

- don't guess about what effective page reclaim might involve.

- don't randomly flush out all dirty data if some unlikely thing
happens (alloc returns NULL). page reclaim can (sort of :P) handle
it.

I think the main motivation is trying to avoid pool->lock at all costs.
However the first allocation is attempted with __GFP_WAIT cleared, so it
will be 'can_try_harder' if it hits the page allocator. So if allocation
still fails, then we can probably afford to hit the pool->lock - and what's
the alternative? Try page reclaim and hit zone->lru_lock?

A nice upshot is that we don't need to do any fancy memory barriers or do
(intentionally) racy access to pool-> fields outside the lock.

Signed-off-by: Nick Piggin
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Nick Piggin
2005-05-01 23:58:37 +0800
b84a35be0 [PATCH] mempool: NOMEMALLOC and NORETRY ... Browse Code »

Mempools have 2 problems.

The first is that mempool_alloc can possibly get stuck in __alloc_pages
when they should opt to fail, and take an element from their reserved pool.

The second is that it will happily eat emergency PF_MEMALLOC reserves
instead of going to their reserved pools.

Fix the first by passing __GFP_NORETRY in the allocation calls in
mempool_alloc. Fix the second by introducing a __GFP_MEMPOOL flag which
directs the page allocator not to allocate from the reserve pool.

Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Nick Piggin
2005-05-01 23:58:36 +0800
8e30f272a [PATCH] mm: pcp use non powers of 2 for batch size ... Browse Code »

Jack Steiner reported this to have fixed his problem (bad colouring):
"The patches fix both problems that I found - bad
coloring & excessive pages in pagesets."

In most workloads this is not likely to be such a pronounced problem,
however it should help corner cases. And avoiding powers of 2 in these
types of memory operations is always a good idea.

Signed-off-by: Nick Piggin
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Nick Piggin
2005-05-01 23:58:36 +0800
81b4082dc [PATCH] mm: rmap.c cleanup ... Browse Code »

mm/rmap.c:page_referenced_one() and mm/rmap.c:try_to_unmap_one() contain
identical code that

- takes mm->page_table_lock;

- drills through page tables;

- checks that correct pte is reached.

Coalesce this into page_check_address()

Signed-off-by: Nikita Danilov
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Nikita Danilov
2005-05-01 23:58:36 +0800
119f657c7 [PATCH] RLIMIT_AS checking fix ... Browse Code »

Address bug #4508: there's potential for wraparound in the various places
where we perform RLIMIT_AS checking.

(I'm a bit worried about acct_stack_growth(). Are we sure that vma->vm_mm is
always equal to current->mm? If not, then we're comparing some other
process's total_vm with the calling process's rlimits).

Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

akpm@osdl.org
2005-05-01 23:58:35 +0800
f021e9210 [PATCH] generic_file_buffered_write fixes ... Browse Code »

Anton Altaparmakov points out:

- It calls fault_in_pages_readable() which is completely bogus if @nr_segs >
1. It needs to be replaced by a to be written
"fault_in_pages_readable_iovec()".

- It increments @buf even in the iovec case thus @buf can point to random
memory really quickly (in the iovec case) and then it calls
fault_in_pages_readable() on this random memory.

Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

akpm@osdl.org
2005-05-01 23:58:35 +0800

25 Apr, 2005

1 commit

01424961e [PATCH] mempolicy.c GFP fix ... Browse Code »

zonelist_policy() forgot to mask non-zone bits from gfp when comparing
zone number with policy_zone.

ACKed-by: Andi Kleen
Signed-off-by: Al Viro
Signed-off-by: Linus Torvalds

Al Viro
2005-04-25 03:28:34 +0800

20 Apr, 2005

7 commits

561bbe323 [PATCH] freepgt: remove FIRST_USER_ADDRESS hack ... Browse Code »

Once all the MMU architectures define FIRST_USER_ADDRESS, remove hack from
mmap.c which derived it from FIRST_USER_PGD_NR.

Signed-off-by: Hugh Dickins
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Hugh Dickins
2005-04-20 04:29:23 +0800
8462e2017 [PATCH] freepgt: sys_mincore ignore FIRST_USER_PGD_NR ... Browse Code »

Remove use of FIRST_USER_PGD_NR from sys_mincore: it's inconsistent (no other
syscall refers to it), unnecessary (sys_mincore loops over vmas further down)
and incorrect (misses user addresses in ARM's first pgd).

Signed-off-by: Hugh Dickins
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Hugh Dickins
2005-04-20 04:29:20 +0800
e2cdef8c8 [PATCH] freepgt: free_pgtables from FIRST_USER_ADDRESS ... Browse Code »

The patches to free_pgtables by vma left problems on any architectures which
leave some user address page table entries unencapsulated by vma. Andi has
fixed the 32-bit vDSO on x86_64 to use a vma. Now fix arm (and arm26), whose
first PAGE_SIZE is reserved (perhaps) for machine vectors.

Our calls to free_pgtables must not touch that area, and exit_mmap's
BUG_ON(nr_ptes) must allow that arm's get_pgd_slow may (or may not) have
allocated an extra page table, which its free_pgd_slow would free later.

FIRST_USER_PGD_NR has misled me and others: until all the arches define
FIRST_USER_ADDRESS instead, a hack in mmap.c to derive one from t'other. This
patch fixes the bugs, the remaining patches just clean it up.

Signed-off-by: Hugh Dickins
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Hugh Dickins
2005-04-20 04:29:19 +0800
146425a31 [PATCH] freepgt: mpnt to vma cleanup ... Browse Code »

While dabbling here in mmap.c, clean up mysterious "mpnt"s to "vma"s.

Signed-off-by: Hugh Dickins
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Hugh Dickins
2005-04-20 04:29:18 +0800
3bf5ee956 [PATCH] freepgt: hugetlb_free_pgd_range ... Browse Code »

ia64 and ppc64 had hugetlb_free_pgtables functions which were no longer being
called, and it wasn't obvious what to do about them.

The ppc64 case turns out to be easy: the associated tables are noted elsewhere
and freed later, safe to either skip its hugetlb areas or go through the
motions of freeing nothing. Since ia64 does need a special case, restore to
ppc64 the special case of skipping them.

The ia64 hugetlb case has been broken since pgd_addr_end went in, though it
probably appeared to work okay if you just had one such area; in fact it's
been broken much longer if you consider a long munmap spanning from another
region into the hugetlb region.

In the ia64 hugetlb region, more virtual address bits are available than in
the other regions, yet the page tables are structured the same way: the page
at the bottom is larger. Here we need to scale down each addr before passing
it to the standard free_pgd_range. Was about to write a hugely_scaled_down
macro, but found htlbpage_to_page already exists for just this purpose. Fixed
off-by-one in ia64 is_hugepage_only_range.

Uninline free_pgd_range to make it available to ia64. Make sure the
vma-gathering loop in free_pgtables cannot join a hugepage_only_range to any
other (safe to join huges? probably but don't bother).

Signed-off-by: Hugh Dickins
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Hugh Dickins
2005-04-20 04:29:16 +0800
ee39b37b2 [PATCH] freepgt: remove MM_VM_SIZE(mm) ... Browse Code »

There's only one usage of MM_VM_SIZE(mm) left, and it's a troublesome macro
because mm doesn't contain the (32-bit emulation?) info needed. But it too is
only needed because we ignore the end from the vma list.

We could make flush_pgtables return that end, or unmap_vmas. Choose the
latter, since it's a natural fit with unmap_mapping_range_vma needing to know
its restart addr. This does make more than minimal change, but if unmap_vmas
had returned the end before, this is how we'd have done it, rather than
storing the break_addr in zap_details.

unmap_vmas used to return count of vmas scanned, but that's just debug which
hasn't been useful in a while; and if we want the map_count 0 on exit check
back, it can easily come from the final remove_vm_struct loop.

Signed-off-by: Hugh Dickins
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Hugh Dickins
2005-04-20 04:29:15 +0800
e0da382c9 [PATCH] freepgt: free_pgtables use vma list ... Browse Code »

Recent woes with some arches needing their own pgd_addr_end macro; and 4-level
clear_page_range regression since 2.6.10's clear_page_tables; and its
long-standing well-known inefficiency in searching throughout the higher-level
page tables for those few entries to clear and free: all can be blamed on
ignoring the list of vmas when we free page tables.

Replace exit_mmap's clear_page_range of the total user address space by
free_pgtables operating on the mm's vma list; unmap_region use it in the same
way, giving floor and ceiling beyond which it may not free tables. This
brings lmbench fork/exec/sh numbers back to 2.6.10 (unless preempt is enabled,
in which case latency fixes spoil unmap_vmas throughput).

Beware: the do_mmap_pgoff driver failure case must now use unmap_region
instead of zap_page_range, since a page table might have been allocated, and
can only be freed while it is touched by some vma.

Move free_pgtables from mmap.c to memory.c, where its lower levels are adapted
from the clear_page_range levels. (Most of free_pgtables' old code was
actually for a non-existent case, prev not properly set up, dating from before
hch gave us split_vma.) Pass mmu_gather** in the public interfaces, since we
might want to add latency lockdrops later; but no attempt to do so yet, going
by vma should itself reduce latency.

But what if is_hugepage_only_range? Those ia64 and ppc64 cases need careful
examination: put that off until a later patch of the series.

What of x86_64's 32bit vdso page __map_syscall32 maps outside any vma?

And the range to sparc64's flush_tlb_pgtables? It's less clear to me now that
we need to do more than is done here - every PMD_SIZE ever occupied will be
flushed, do we really have to flush every PGDIR_SIZE ever partially occupied?
A shame to complicate it unnecessarily.

Special thanks to David Miller for time spent repairing my ceilings.

Signed-off-by: Hugh Dickins
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Hugh Dickins
2005-04-20 04:29:15 +0800

17 Apr, 2005

4 commits

323aca6c0 [PATCH] vmscan: pageout(): remove unneeded test ... Browse Code »

)

We only call pageout() for dirty pages, so this test is redundant.

Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

akpm@osdl.org
2005-04-17 06:24:06 +0800
79befd0c0 [PATCH] oom-killer disable for iscsi/lvm2/multipath userland critical sections ... Browse Code »

iscsi/lvm2/multipath needs guaranteed protection from the oom-killer, so
make the magical value of -17 in /proc//oom_adj defeat the oom-killer
altogether.

(akpm: we still need to document oom_adj and friends in
Documentation/filesystems/proc.txt!)

Signed-off-by: Andrea Arcangeli
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Andrea Arcangeli
2005-04-17 06:24:05 +0800
d34573426 [PATCH] filemap_getpage can block when MAP_NONBLOCK specified ... Browse Code »

We will return NULL from filemap_getpage when a page does not exist in the
page cache and MAP_NONBLOCK is specified, here:

page = find_get_page(mapping, pgoff);
if (!page) {
if (nonblock)
return NULL;
goto no_cached_page;
}

But we forget to do so when the page in the cache is not uptodate. The
following could result in a blocking call:

/*
* Ok, found a page in the page cache, now we need to check
* that it's up-to-date.
*/
if (!PageUptodate(page))
goto page_not_uptodate;

Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Jeff Moyer
2005-04-17 06:24:05 +0800
1da177e4c Linux-2.6.12-rc2 ... Browse Code »

Initial git repository build. I'm not bothering with the full history,
even though we have it. We can create a separate "historical" git
archive of that later if we want to, and in the meantime it's about
3.2GB when imported into git - space that would just make the early
git days unnecessarily complicated, when we don't have a lot of good
infrastructure for it.

Let it rip!

Linus Torvalds
2005-04-17 06:20:36 +0800