07 Jan, 2006

40 commits

  • This adds the function get_swap_page_of_type(), which lets us specify an
    index into swap_info[] and select the swap_info_struct structure to be
    used for allocating a swap page.

    This function (or another one of similar functionality) will be necessary
    for implementing the image-writing part of swsusp in user space. It can
    also be used to simplify the current in-kernel implementation of the
    image-writing part of swsusp.

    Signed-off-by: Rafael J. Wysocki
    Acked-by: Pavel Machek
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Rafael J. Wysocki
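
    A minimal usage sketch, assuming the signature swp_entry_t
    get_swap_page_of_type(int type); everything around the call is
    illustrative:

        /* Allocate one swap slot from the swap area at swap_info[type]. */
        swp_entry_t entry = get_swap_page_of_type(type);

        if (!entry.val)
                return -ENOSPC;         /* that particular swap area is full */
        /* ... write an image page into the slot, or swap_free(entry)
         * to give the slot back ... */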
     
  • On architectures that implement sparsemem but not discontigmem, we want
    to be able to hide the flatmem option in some cases. On ppc64, for
    example, when we select NUMA we must not select flatmem.

    Signed-off-by: Anton Blanchard
    Signed-off-by: Andy Whitcroft
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Anton Blanchard
     
  • The attached patch makes the SYSV IPC shared memory facilities use the new
    ramfs facilities on a no-MMU kernel.

    The following changes are made:

    (1) There are now shmem_mmap() and shmem_get_unmapped_area() functions to
    allow the IPC SHM facilities to commune with the tiny-shmem and shmem
    code.

    (2) ramfs files now need resizing using do_truncate() rather than by modifying
    the inode size directly (see shmem_file_setup()). This causes ramfs to
    attempt to bind a block of pages of sufficient size to the inode.

    (3) CONFIG_SYSVIPC is no longer contingent on CONFIG_MMU.

    Signed-off-by: David Howells
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    David Howells
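
    A hedged sketch of point (2); the four-argument do_truncate() shape
    matches contemporary 2.6 kernels but is an assumption here, and the
    helper name is made up for illustration:

        /* Size a ramfs-backed SYSV SHM file via the truncate path so that
         * ramfs tries to bind enough pages to the inode up front. */
        static int shm_size_ramfs_file(struct file *file, loff_t size)
        {
                /* before: file->f_dentry->d_inode->i_size = size; */
                return do_truncate(file->f_dentry, size, 0, file);
        }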
     
  • Optimise page_state manipulations by introducing interrupt-unsafe
    accessors to page_state fields. Callers must provide their own locking
    (either by disabling interrupts or by not updating from interrupt
    context).

    Switch over the hot callsites that can easily be moved inside
    interrupts-off sections.

    Signed-off-by: Nick Piggin
    Cc: Hugh Dickins
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Nick Piggin
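
    A hedged sketch of how the interrupt-unsafe accessors are meant to be
    used; the accessor and field names follow contemporary 2.6 kernels but
    treat the snippet as illustrative:

        unsigned long flags;

        local_irq_save(flags);          /* caller provides the exclusion... */
        __mod_page_state(pgfault, 1);   /* ...so the update can skip it */
        local_irq_restore(flags);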
     
  • Give j and r meaningful names.

    Signed-off-by: Christoph Lameter
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Christoph Lameter
     
  • The use of k in the inner loop means that the highest zone nr is always
    used if any zone of a node is populated. This means that the policy zone
    is not correctly determined on arches that do not use HIGHMEM, such as
    ia64.

    Change the loop to decrement k which also simplifies the BUG_ON.

    Signed-off-by: Christoph Lameter
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Christoph Lameter
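
    A hedged sketch of the reworked scan, not the literal patch: walking the
    zone index downwards means the first populated zone found really is the
    highest one, and a single BUG_ON covers the no-populated-zone case:

        int k;

        for (k = MAX_NR_ZONES - 1; k >= 0; k--)
                if (pgdat->node_zones[k].present_pages)
                        break;
        BUG_ON(k < 0);  /* every node must have a populated zone */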
     
  • Currently the function that builds a zonelist for a BIND policy has the
    side effect of setting policy_zone. This seems a bit strange. policy_zone
    does not seem to be initialized anywhere else and is therefore 0. Do we
    police ZONE_DMA if no bind policy has been used yet?

    This patch moves the determination of the zone to apply policies to into
    the page allocator. We determine the zone while building the zonelist for
    nodes.

    Signed-off-by: Christoph Lameter
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Christoph Lameter
     
  • Simplify build_zonelists_node by removing the case statement.

    Signed-off-by: Christoph Lameter
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Christoph Lameter
     
  • There are numerous places where we check whether a zone is populated.

    Provide a helper function to check for populated zones and convert all
    direct checks of zone->present_pages.

    Signed-off-by: Con Kolivas
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Con Kolivas
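
    The helper is essentially a one-liner along these lines (sketch):

        static inline int populated_zone(struct zone *zone)
        {
                return !!zone->present_pages;
        }

    so call sites read "if (populated_zone(zone))" instead of testing
    zone->present_pages directly.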
     
  • Cc: Christoph Lameter
    Cc: Rajesh Shah
    Cc: Li Shaohua
    Cc: Srivatsa Vaddagiri
    Cc: Ashok Raj
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Andrew Morton
     
  • Revert a patch which went into 2.6.8-rc1. The changelog for that patch was:

    The shrink_zone() logic can, under some circumstances, cause far too many
    pages to be reclaimed. Say, we're scanning at high priority and suddenly
    hit a large number of reclaimable pages on the LRU.

    Change things so we bale out when SWAP_CLUSTER_MAX pages have been
    reclaimed.

    Problem is, this change caused significant imbalance in inter-zone scan
    balancing by truncating scans of larger zones.

    Suppose, for example, ZONE_HIGHMEM is 10x the size of ZONE_NORMAL. The zone
    balancing algorithm would require that if we're scanning 100 pages of
    ZONE_HIGHMEM, we should scan 10 pages of ZONE_NORMAL. But this logic will
    cause the scanning of ZONE_HIGHMEM to bale out after only 32 pages are
    reclaimed, effectively causing smaller zones to be scanned relatively
    harder than large ones.

    Now I need to remember what the workload was which caused me to write this
    patch originally, then fix it up in a different way...

    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Andrew Morton
     
  • This atomic operation is superfluous: the pte will be added with the
    referenced bit set, and the page will be referenced through this mapping after
    the page fault handler returns anyway.

    Signed-off-by: Nick Piggin
    Cc: Hugh Dickins
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Nick Piggin
     
  • Optimise rmap functions by minimising atomic operations when we know there
    will be no concurrent modifications.

    Signed-off-by: Nick Piggin
    Cc: Hugh Dickins
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Nick Piggin
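
    A hedged sketch of the idea: a page mapped for the very first time is not
    yet visible to any other CPU, so a plain store can replace an atomic
    increment of the mapcount. The helper names follow this patch series, but
    the body is illustrative:

        void page_add_new_anon_rmap(struct page *page,
                                    struct vm_area_struct *vma,
                                    unsigned long address)
        {
                /* mapcount goes -1 -> 0 without a LOCK'd instruction */
                atomic_set(&page->_mapcount, 0);
                __page_set_anon_rmap(page, vma, address);
        }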
     
  • Cut down size slightly by not passing bad_page the function name (it can
    be determined from dump_stack()), and cut down the number of printks in
    bad_page.

    Also, cut down some branching in the destroy_compound_page path.

    Signed-off-by: Nick Piggin
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Nick Piggin
     
  • Add dma32 to zone statistics. Also attempt to arrange struct page_state a
    bit better (visually).

    Signed-off-by: Nick Piggin
    Cc: Andi Kleen
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Nick Piggin
     
  • Remove the last bits of Martin's ill-fated sys_set_zone_reclaim().

    Cc: Martin Hicks
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Andrew Morton
     
  • As find_lock_page() already checks with TestSetPageLocked() that the
    page is locked, there is no need to call lock_page(), which would try-lock
    the page again (the chances of the page being unlocked in between are
    small). Call __lock_page() directly; this saves one atomic operation.

    Also, mark the truncate-while-slept path as unlikely while we are here.

    (akpm: ug. But this is actually a common path for normal old read()s against
    a page which is under readahead I/O so ho-hum.)

    Signed-off-by: Nikita Danilov
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Nikita Danilov
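
    A hedged sketch of the reworked function (close to, but not necessarily
    identical with, the patched mm/filemap.c):

        struct page *find_lock_page(struct address_space *mapping,
                                    unsigned long offset)
        {
                struct page *page;

        repeat:
                read_lock_irq(&mapping->tree_lock);
                page = radix_tree_lookup(&mapping->page_tree, offset);
                if (page) {
                        page_cache_get(page);
                        if (TestSetPageLocked(page)) {
                                read_unlock_irq(&mapping->tree_lock);
                                /* was lock_page(): no second trylock */
                                __lock_page(page);
                                read_lock_irq(&mapping->tree_lock);

                                /* Was the page truncated while we slept? */
                                if (unlikely(page->mapping != mapping ||
                                             page->index != offset)) {
                                        unlock_page(page);
                                        page_cache_release(page);
                                        read_unlock_irq(&mapping->tree_lock);
                                        goto repeat;
                                }
                        }
                }
                read_unlock_irq(&mapping->tree_lock);
                return page;
        }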
     
  • The attached patch cleans up the way the bootmem allocator frees pages.

    A new function, __free_pages_bootmem(), is provided in mm/page_alloc.c
    and is called from mm/bootmem.c to turn pages over to the main allocator.
    All the bits of code to initialise pages (clearing PG_reserved and setting
    the page count) are moved here. The checks on page validity are removed,
    on the assumption that the struct page arrays will have been prepared
    correctly.

    Signed-off-by: David Howells
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    David Howells
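
    A hedged sketch of the hand-over function; the prefetching and
    higher-order batching of the real code are trimmed:

        void __free_pages_bootmem(struct page *page, unsigned int order)
        {
                unsigned int nr_pages = 1 << order;
                unsigned int loop;

                for (loop = 0; loop < nr_pages; loop++) {
                        __ClearPageReserved(&page[loop]);
                        set_page_count(&page[loop], 0);
                }
                set_page_refs(page, order);
                __free_pages(page, order);
        }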
     
  • This patch cleans up the alloc_bootmem fix for swiotlb. It removes the
    alloc_bootmem_*_limit API and fixes the alloc_bootmem_low_* API to do the
    right thing: allocate from memory in the low 32 bits of the address
    space.

    Signed-off-by: Ravikiran Thirumalai
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Ravikiran G Thirumalai
     
  • Small cleanups that do not change the generated code with the gccs I've
    tested with.

    Signed-off-by: Nick Piggin
    Cc: Hugh Dickins
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Nick Piggin
     
  • read_page_state and __get_page_state only traverse online CPUs, which will
    cause results to fluctuate when CPUs are plugged in or out.

    Signed-off-by: Nick Piggin
    Cc: Hugh Dickins
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Nick Piggin
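
    A hedged sketch of the fix direction: fold the per-cpu counters over
    every possible CPU rather than only the online ones. The helper name is
    made up; the per-cpu variable name follows the 2.6 sources, and in this
    era for_each_cpu() iterated cpu_possible_map (later renamed
    for_each_possible_cpu()):

        /* Sum one page_state counter over all possible CPUs; 'offset' is
         * the word index of the field within struct page_state. */
        static unsigned long sum_page_state(unsigned long offset)
        {
                unsigned long total = 0;
                int cpu;

                for_each_cpu(cpu) {     /* possible cpus: hotplug-stable */
                        unsigned long *in =
                                (unsigned long *)&per_cpu(page_states, cpu);
                        total += in[offset];
                }
                return total;
        }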
     
  • struct per_cpu_pages.low is useless. Remove it.

    Signed-off-by: Nick Piggin
    Cc: Hugh Dickins
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Nick Piggin
     
  • bad_range is supposed to be a temporary check. It would be a pity to throw it
    out. Make it depend on CONFIG_DEBUG_VM instead.

    CONFIG_HOLES_IN_ZONE systems were relying on this to check pfn_valid()
    in the page allocator. Add that check to page_is_buddy() instead.

    Signed-off-by: Nick Piggin
    Cc: Hugh Dickins
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Nick Piggin
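
    A hedged sketch of where the check lands:

        static inline int page_is_buddy(struct page *page, int order)
        {
        #ifdef CONFIG_HOLES_IN_ZONE
                if (!pfn_valid(page_to_pfn(page)))  /* was in bad_range() */
                        return 0;
        #endif
                if (PagePrivate(page) &&
                    page_order(page) == order &&
                    page_count(page) == 0)
                        return 1;
                return 0;
        }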
     
  • Micro optimise some conditionals where we don't need lazy evaluation.

    Signed-off-by: Nick Piggin
    Cc: Hugh Dickins
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Nick Piggin
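
    An illustration of the technique with a made-up helper (not one of the
    actual call sites): when both operands are cheap, side-effect-free tests,
    a bitwise OR saves the extra branch that short-circuiting evaluation
    would emit:

        static inline int page_busy(struct page *page)  /* hypothetical */
        {
                /* was: PageDirty(page) || PageWriteback(page) */
                return PageDirty(page) | PageWriteback(page);
        }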
     
  • Inline set_page_refs.

    Signed-off-by: Nick Piggin
    Cc: Hugh Dickins
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Nick Piggin
     
  • Slightly optimise some page allocation and freeing functions by taking
    advantage of knowing whether or not interrupts are disabled.

    Signed-off-by: Nick Piggin
    Cc: Hugh Dickins
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Nick Piggin
     
  • Minor optimization (though it doesn't help in the PREEMPT case, which is
    severely constrained by the small ZAP_BLOCK_SIZE).
    free_pages_and_swap_cache() works in chunks of 16, calling
    release_pages(), which works in chunks of PAGEVEC_SIZE. But PAGEVEC_SIZE
    was dropped from 16 to 14 in 2.6.10, so we're now doing more
    spin_lock_irq'ing than necessary: use PAGEVEC_SIZE throughout.

    Signed-off-by: Hugh Dickins
    Signed-off-by: Nick Piggin
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Hugh Dickins
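
    A hedged sketch of the fixed loop (very close to the patched
    mm/swap_state.c):

        void free_pages_and_swap_cache(struct page **pages, int nr)
        {
                struct page **pagep = pages;

                lru_add_drain();
                while (nr) {
                        /* was a hard-coded 16 */
                        int todo = min(nr, PAGEVEC_SIZE);
                        int i;

                        for (i = 0; i < todo; i++)
                                free_swap_cache(pagep[i]);
                        release_pages(pagep, todo, 0);
                        pagep += todo;
                        nr -= todo;
                }
        }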
     
  • The NODES_SPAN_OTHER_NODES config option was created so that DISCONTIGMEM
    could handle pSeries NUMA layouts. However, support for DISCONTIGMEM has
    been replaced by SPARSEMEM on powerpc. As a result, this config option
    and its supporting code are no longer needed.

    I have already sent a patch to Paul that removes the option from powerpc
    specific code. This removes the arch independent piece. Doesn't really
    matter which is applied first.

    Signed-off-by: Mike Kravetz
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Mike Kravetz
     
  • The number of parameters for find_or_alloc_huge_page() increases
    significantly after policy support is added to huge pages. Simplify the
    code by folding find_or_alloc_huge_page() into hugetlb_no_page().

    Adam Litke objected to this piece in an earlier patch, but I think this
    is a good simplification. Diffstat shows that we can get rid of almost
    half of the lines of find_or_alloc_huge_page(). If we can find no
    consensus then let's simply drop this patch.

    Signed-off-by: Christoph Lameter
    Cc: Andi Kleen
    Acked-by: William Lee Irwin III
    Cc: Adam Litke
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Christoph Lameter
     
  • mempolicy.c contains a provisional interface for huge page allocation
    based on node numbers. This is in use in SLES9 but was never used (AFAIK)
    in upstream versions of Linux.

    Huge page allocations now use zonelists to figure out where to allocate
    pages. Zonelists let us find the closest huge page, which is what taking
    NUMA distance into consideration for huge page allocations was meant to
    achieve.

    Remove the obsolete functions.

    Signed-off-by: Christoph Lameter
    Cc: Andi Kleen
    Acked-by: William Lee Irwin III
    Cc: Adam Litke
    Acked-by: Paul Jackson
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Christoph Lameter
     
  • The huge_zonelist() function in the memory policy layer provides a list
    of zones ordered by NUMA distance. The hugetlb layer walks that list
    looking for a zone that has available huge pages and is also in the
    nodeset of the current cpuset.

    This patch does not contain the folding of find_or_alloc_huge_page() that was
    controversial in the earlier discussion.

    Signed-off-by: Christoph Lameter
    Cc: Andi Kleen
    Acked-by: William Lee Irwin III
    Cc: Adam Litke
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Christoph Lameter
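
    A hedged sketch of that walk on the hugetlb side; the shapes follow the
    2.6 hugetlb code, lightly trimmed:

        static struct page *dequeue_huge_page(struct vm_area_struct *vma,
                                              unsigned long address)
        {
                struct zonelist *zonelist = huge_zonelist(vma, address);
                struct zone **z;

                for (z = zonelist->zones; *z; z++) {
                        int nid = (*z)->zone_pgdat->node_id;

                        /* nearest cpuset-allowed zone with a free huge page */
                        if (cpuset_zone_allowed(*z, GFP_HIGHUSER) &&
                            !list_empty(&hugepage_freelists[nid])) {
                                struct page *page =
                                        list_entry(hugepage_freelists[nid].next,
                                                   struct page, lru);
                                list_del(&page->lru);
                                free_huge_pages--;
                                return page;
                        }
                }
                return NULL;
        }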
     
  • This was discussed at
    http://marc.theaimsgroup.com/?l=linux-kernel&m=113166526217117&w=2

    This patch changes the dequeueing to select a huge page near the node on
    which the process is executing, instead of always beginning the search
    for free pages at node 0. Huge pages are thus placed near the executing
    processor, improving performance.

    The existing implementation can place the huge pages far away from the
    executing processor, causing significant degradation of performance. The
    search starting from zero also means that the lower zones quickly run out
    of memory. Selecting a huge page near the process distributes the huge
    pages better.

    Signed-off-by: Christoph Lameter
    Cc: William Lee Irwin III
    Cc: Adam Litke
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Christoph Lameter
     
  • Implement copy-on-write support for hugetlb mappings so MAP_PRIVATE can be
    supported. This helps us to safely use hugetlb pages in many more
    applications. The patch makes the following changes. If needed, I also have
    it broken out according to the following paragraphs.

    1. Add a pair of functions to set/clear write access on huge ptes. The
    writable check in make_huge_pte is moved out to the caller for use by COW
    later.

    2. Hugetlb copy-on-write requires special case handling in the following
    situations:

    - copy_hugetlb_page_range() - Copied pages must be write protected so
    a COW fault will be triggered (if necessary) if those pages are written
    to.

    - find_or_alloc_huge_page() - Only MAP_SHARED pages are added to the
    page cache. MAP_PRIVATE pages still need to be locked however.

    3. Provide hugetlb_cow(), called from hugetlb_fault() and
    hugetlb_no_page(), which handles the COW fault by making the actual copy
    (a hedged sketch follows this entry).

    4. Remove the check in hugetlbfs_file_mmap() so that MAP_PRIVATE mmaps
    will be allowed. Make MAP_HUGETLB exempt from the deprecated VM_RESERVED
    mapping check.

    Signed-off-by: David Gibson
    Signed-off-by: Adam Litke
    Cc: William Lee Irwin III
    Cc: "Seth, Rohit"
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    David Gibson
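
    The hedged sketch referenced in point 3 above: allocate a fresh huge
    page, copy, and install a writable pte. Locking and error handling are
    trimmed, and the exact helper arguments are assumptions:

        static int hugetlb_cow(struct mm_struct *mm,
                               struct vm_area_struct *vma,
                               unsigned long address, pte_t *ptep, pte_t pte)
        {
                struct page *old_page = pte_page(pte);
                struct page *new_page = alloc_huge_page(vma, address);

                if (!new_page)
                        return VM_FAULT_OOM;

                copy_huge_page(new_page, old_page, address);
                /* writable pte: the faulting mapping is MAP_PRIVATE */
                set_huge_pte_at(mm, address & HPAGE_MASK, ptep,
                                make_huge_pte(vma, new_page, 1));
                page_cache_release(old_page);
                return VM_FAULT_MINOR;
        }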
     
  • This patch splits the "no_page()" type activity into its own function,
    hugetlb_no_page(). hugetlb_fault() becomes the entry point for hugetlb faults
    and delegates to the appropriate handler depending on the type of fault.
    Right now we still have only hugetlb_no_page() but a later patch introduces a
    COW fault.

    Signed-off-by: David Gibson
    Signed-off-by: Adam Litke
    Cc: William Lee Irwin III
    Cc: "Seth, Rohit"
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Adam Litke
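
    A hedged sketch of the split at this stage (before COW support is
    added):

        int hugetlb_fault(struct mm_struct *mm, struct vm_area_struct *vma,
                          unsigned long address, int write_access)
        {
                pte_t *ptep = huge_pte_alloc(mm, address);

                if (!ptep)
                        return VM_FAULT_OOM;
                /* only the no-page handler exists so far */
                if (pte_none(*ptep))
                        return hugetlb_no_page(mm, vma, address, ptep,
                                               write_access);
                return VM_FAULT_MINOR;  /* raced with another fault */
        }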
     
  • find_lock_huge_page() isn't a great name, since it does extra things not
    analogous to find_lock_page(). Rename it find_or_alloc_huge_page(), which
    is closer to the mark.

    Signed-off-by: David Gibson
    Signed-off-by: Adam Litke
    Cc: William Lee Irwin III
    Cc: "Seth, Rohit"
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Adam Litke
     
  • cleanup

    Signed-off-by: David Gibson
    Signed-off-by: Adam Litke
    Cc: William Lee Irwin III
    Cc: "Seth, Rohit"
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Adam Litke
     
  • Here is the patch to implement madvise(MADV_REMOVE), which frees up a
    given range of pages and its associated backing store. The current
    implementation supports only shmfs/tmpfs; other filesystems return
    -ENOSYS.

    "Some app allocates large tmpfs files, then when some task quits and some
    client disconnect, some memory can be released. However the only way to
    release tmpfs-swap is to MADV_REMOVE". - Andrea Arcangeli

    Databases want to use this feature to drop a section of their bufferpool
    (shared memory segments) - without writing back to disk/swap space.

    This feature is also useful for supporting hot-plug memory on UML.

    Concerns raised by Andrew Morton:

    - "We have no plan for holepunching! If we _do_ have such a plan (or
    might in the future) then what would the API look like? I think
    sys_holepunch(fd, start, len), so we should start out with that."

    - Using madvise is very weird, because people will ask "why do I need to
    mmap my file before I can stick a hole in it?"

    - None of the other madvise operations call into the filesystem in this
    manner. A broad question is: is this capability an MM operation or a
    filesystem operation? truncate, for example, is a filesystem operation
    which sometimes has MM side-effects. madvise is an MM operation and, with
    this patch, it gains FS side-effects, only they're really, really
    significant ones.

    Comments:

    - Andrea suggested the fs operation too, but it's more efficient to have
    it as an mm operation with fs side effects, because callers don't
    immediately know the fd and physical offset of the range. It's possible
    to fix that up in userland and use the fs operation, but it's more
    expensive; the vmas are already in the kernel and we can use them.

    Short term plan & Future Direction:

    - We seem to need this interface only for shmfs/tmpfs files in the short
    term. We have to add hooks into the filesystem for correctness and
    completeness. This is what this patch does.

    - In the future, the plan is to support the fs and mmap APIs as well.
    This also involves implementing (other) filesystem-specific functions.

    - The current patch doesn't support VM_NONLINEAR; this can be addressed
    in the future.

    Signed-off-by: Badari Pulavarty
    Cc: Hugh Dickins
    Cc: Andrea Arcangeli
    Cc: Michael Kerrisk
    Cc: Ulrich Drepper
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Badari Pulavarty
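
    A hedged userspace sketch of the new interface; the file name and sizes
    are made-up example values, and the MADV_REMOVE fallback define covers
    older libc headers:

        #include <stdio.h>
        #include <sys/mman.h>
        #include <fcntl.h>
        #include <unistd.h>

        #ifndef MADV_REMOVE
        #define MADV_REMOVE 9   /* may be missing from older headers */
        #endif

        int main(void)
        {
                size_t len = 16 * 4096;
                int fd = open("/dev/shm/pool", O_RDWR | O_CREAT, 0600);
                char *p;

                ftruncate(fd, len);
                p = mmap(NULL, len, PROT_READ | PROT_WRITE,
                         MAP_SHARED, fd, 0);

                /* ... use the buffer pool, then drop pages 4-7 and their
                 * swap backing without any writeback ... */
                if (madvise(p + 4 * 4096, 4 * 4096, MADV_REMOVE) != 0)
                        perror("madvise(MADV_REMOVE)");
                return 0;
        }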
     
  • This patch creates truncate_inode_pages_range() from
    truncate_inode_pages(); the latter becomes a one-line call to
    truncate_inode_pages_range().

    Reiser4 needs truncate_inode_pages_range() because it tries to keep a
    correspondence between the existence of metadata pointing to data pages
    and the pages to which that metadata points. So, when the metadata for a
    certain part of a file is removed from the filesystem tree, only the
    pages of the corresponding range are truncated.

    (Needed by the madvise(MADV_REMOVE) patch)

    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Hans Reiser
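
    The resulting one-liner looks like this:

        void truncate_inode_pages(struct address_space *mapping, loff_t lstart)
        {
                /* "to the end of the file", i.e. an unbounded range */
                truncate_inode_pages_range(mapping, lstart, (loff_t)-1);
        }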
     
  • __add_section defines an unused pointer to the zone's pgdat. Remove this
    definition. This fixes a compile warning.

    Signed-off-by: Andy Whitcroft
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Andy Whitcroft
     
  • Two changes to the setting of the ALLOC_CPUSET flag in
    mm/page_alloc.c:__alloc_pages()

    - A bug fix: the "ignoring mins" case should not be honoring
    ALLOC_CPUSET. This case, of all cases, is handling a request that will
    free up more memory than is asked for (exiting tasks, e.g.), so it should
    be allowed to escape cpuset constraints when memory is tight.

    - A logic change to make it simpler. Honor cpusets even on GFP_ATOMIC
    (!wait) requests. With this, cpuset confinement applies to all requests
    except ALLOC_NO_WATERMARKS, so that in a subsequent cleanup patch I can
    remove the ALLOC_CPUSET flag entirely. Since I don't know of any real
    reason this logic has to be either way, I am choosing the path of the
    simplest code.

    Signed-off-by: Paul Jackson
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Paul Jackson