Eric Lee / smarc-fsl-linux-kernel

25 Jan, 2008

3 commits

88c3f7a8f Security: remove security_file_mmap hook sparse-warnings (NULL as 0).

Fixing:
CHECK mm/mmap.c
mm/mmap.c:1623:29: warning: Using plain integer as NULL pointer
mm/mmap.c:1623:29: warning: Using plain integer as NULL pointer
mm/mmap.c:1944:29: warning: Using plain integer as NULL pointer

Signed-off-by: Richard Knutsson
Signed-off-by: James Morris

Richard Knutsson
2008-01-25 08:29:48 +0800
9c09a95cf slab: partially revert list3 changes

Partially revert the changes made by 04231b3002ac53f8a64a7bd142fde3fa4b6808c6
to the kmem_list3 management. On a machine with a memoryless node, this
BUG_ON was triggering:

static void *____cache_alloc_node(struct kmem_cache *cachep, gfp_t flags, int nodeid)
{
	struct list_head *entry;
	struct slab *slabp;
	struct kmem_list3 *l3;
	void *obj;
	int x;

	l3 = cachep->nodelists[nodeid];
	BUG_ON(!l3);

Signed-off-by: Mel Gorman
Cc: Pekka Enberg
Acked-by: Christoph Lameter
Cc: "Aneesh Kumar K.V"
Cc: Nishanth Aravamudan
Cc: KAMEZAWA Hiroyuki
Cc: Paul Mackerras
Cc: Benjamin Herrenschmidt
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Mel Gorman
2008-01-25 00:07:27 +0800
c5c99429f fix hugepages leak due to pagetable page sharing

The shared page table code for hugetlb memory on x86 and x86_64
is causing a leak. When a user of hugepages exits using this code
the system leaks some of the hugepages.

-------------------------------------------------------
Part of /proc/meminfo just before database startup:
HugePages_Total: 5500
HugePages_Free: 5500
HugePages_Rsvd: 0
Hugepagesize: 2048 kB

Just before shutdown:
HugePages_Total: 5500
HugePages_Free: 4475
HugePages_Rsvd: 0
Hugepagesize: 2048 kB

After shutdown:
HugePages_Total: 5500
HugePages_Free: 4988
HugePages_Rsvd: 0
Hugepagesize: 2048 kB
----------------------------------------------------------
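For scale, the excerpts above imply a sizeable leak. The small helpers below just redo that arithmetic; they assume that once the database has exited nothing else should hold hugepages, so every page missing from HugePages_Free after shutdown counts as leaked.

```c
#include <assert.h>

/* Pages still allocated after shutdown. Assumes no other user of
 * hugepages remains, so they are all leaked. */
static long leaked_pages(long total, long free_after)
{
	return total - free_after;
}

/* Convert a page count to kB using the reported Hugepagesize. */
static long leaked_kb(long pages, long page_kb)
{
	return pages * page_kb;
}
```

With the numbers above, 5500 - 4988 = 512 pages of 2048 kB each, i.e. 1048576 kB (1 GiB), stay allocated after shutdown.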

The problem occurs during a fork, in copy_hugetlb_page_range(). It
locates the dst_pte using huge_pte_alloc(). Since huge_pte_alloc() calls
huge_pmd_share(), it will share the pmd page if it can, yet the main loop in
copy_hugetlb_page_range() does a get_page() on every hugepage. This is a
violation of the shared hugepmd pagetable protocol and creates additional
references to the hugepages, causing a leak when the unmap of the VMA
occurs. We can skip the entire replication of the ptes when the hugepage
pagetables are shared. The attached patch skips copying the ptes and the
get_page() calls if the hugetlbpage pagetable is shared.
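A user-space model of the fixed copy loop, not kernel code: `struct model_page`, `struct model_table` and `copy_range()` are hypothetical names, and the pointer comparison stands in for the case where huge_pte_alloc() hands back an entry inside the very pmd page the parent already uses.

```c
#include <assert.h>
#include <stddef.h>

#define PTRS_PER_TABLE 4

/* Hypothetical page with a reference count. */
struct model_page {
	int refcount;
};

/* Hypothetical page table: an array of pointers to pages. */
struct model_table {
	struct model_page *pte[PTRS_PER_TABLE];
};

/* When the destination entry is the same memory as the source entry
 * (a shared pagetable page), the child already sees the mapping, so
 * neither the copy nor the extra reference is taken. */
static void copy_range(struct model_table *dst, struct model_table *src)
{
	for (int i = 0; i < PTRS_PER_TABLE; i++) {
		struct model_page **src_pte = &src->pte[i];
		struct model_page **dst_pte = &dst->pte[i];

		if (dst_pte == src_pte)	/* shared table: skip entirely */
			continue;
		if (*src_pte) {
			(*src_pte)->refcount++;	/* models get_page() */
			*dst_pte = *src_pte;
		}
	}
}
```

Without the `dst_pte == src_pte` test, the shared case would bump the reference count with no matching unmap, which is the leak described above.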

[akpm@linux-foundation.org: coding-style cleanups]
Signed-off-by: Larry Woodman
Signed-off-by: Adam Litke
Cc: Badari Pulavarty
Cc: Ken Chen
Cc: David Gibson
Cc: William Lee Irwin III
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Larry Woodman
2008-01-25 00:07:27 +0800

24 Jan, 2008

1 commit

8f7b3d156 Update ctime and mtime for memory-mapped files

Update ctime and mtime for memory-mapped files at a write access on
a present, read-only PTE, as well as at a write on a non-present PTE.

Signed-off-by: Anton Salikhmetov
Signed-off-by: Linus Torvalds

Anton Salikhmetov
2008-01-24 01:58:55 +0800

18 Jan, 2008

2 commits

9723198c2 #ifdef very expensive debug check in page fault path

This patch puts #ifdef CONFIG_DEBUG_VM around a check in vm_normal_page
that verifies that a pfn is valid. This patch increases performance of the
page fault microbenchmark in lmbench by 13% and overall dbench performance
by 7% on s390x. pfn_valid() is an expensive operation on s390 that needs a
high double-digit number of CPU cycles. Nick Piggin suggested that
pfn_valid() involves an array lookup on systems with sparsemem, and
therefore is an expensive operation there too.
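A minimal user-space sketch of the pattern: the pfn validity test only exists in builds that define CONFIG_DEBUG_VM. `model_pfn_valid()`, `model_vm_normal_page()` and the fake `max_pfn` bound are illustrative stand-ins, not the kernel functions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

static unsigned long max_pfn = 1024;	/* hypothetical end of memory */
static unsigned long pfn_valid_calls;	/* how often the check ran */

/* Stand-in for the expensive validity check. */
static bool model_pfn_valid(unsigned long pfn)
{
	pfn_valid_calls++;
	return pfn < max_pfn;
}

/* Returns pfn, or (unsigned long)-1 if the debug check rejects it.
 * Without CONFIG_DEBUG_VM the check is not compiled in at all. */
static unsigned long model_vm_normal_page(unsigned long pfn)
{
#ifdef CONFIG_DEBUG_VM
	if (!model_pfn_valid(pfn)) {
		fprintf(stderr, "bad pfn %lu\n", pfn);
		return (unsigned long)-1;
	}
#endif
	return pfn;
}
```

Built without -DCONFIG_DEBUG_VM, the call and its counter increment disappear from the hot path; building with it restores the sanity check.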

The check looks like a clear debug thing to me, it should never trigger on
regular kernels. And if a pte is created for an invalid pfn, we'll find
out once the memory gets accessed later on anyway. Please consider
inclusion of this patch into mm.

Signed-off-by: Carsten Otte
Acked-by: Nick Piggin
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Carsten Otte
2008-01-18 07:38:59 +0800
1d6f4e60e mm: fix section mismatch warning in page_alloc.c

With CONFIG_HOTPLUG=n and CONFIG_HOTPLUG_CPU=y we saw the
following warning:
WARNING: mm/built-in.o(.text+0x6864): Section mismatch: reference to .init.text: (between 'process_zones' and 'pageset_cpuup_callback')

The culprit was zone_batchsize(), which was annotated __devinit but used
from process_zones(), which is annotated __cpuinit. zone_batchsize() is
also used from another function annotated __meminit, so the only valid option is
to drop the annotation of zone_batchsize() so we know it is always valid to
use it.

Signed-off-by: Sam Ravnborg
Acked-by: Ingo Molnar
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Sam Ravnborg
2008-01-18 07:38:58 +0800

15 Jan, 2008

3 commits

c23f72cae Revert "writeback: introduce writeback_control.more_io to indicate more io"

This reverts commit 2e6883bdf49abd0e7f0d9b6297fc3be7ebb2250b, as
requested by Fengguang Wu. It's not quite fully baked yet, and while
there are patches around to fix the problems it caused, they should get
more testing. Says Fengguang: "I'll resend them both for -mm later on,
in a more complete patchset".

See

http://bugzilla.kernel.org/show_bug.cgi?id=9738

for some of this discussion.

Requested-by: Fengguang Wu
Cc: Andrew Morton
Cc: Peter Zijlstra
Signed-off-by: Linus Torvalds

Linus Torvalds
2008-01-15 13:21:29 +0800
68842c9b9 hugetlbfs: fix quota leak

In the error path of both shared and private hugetlb page allocation,
the file system quota is never undone, leading to an fs quota leak. Fix
them up.
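The shape of the fix is the usual charge-then-undo pattern on an error path. In this sketch `struct model_sb`, `get_quota()`, `put_quota()` and `alloc_page_with_quota()` are hypothetical, and a boolean simulates whether the hugepage pool could supply a page.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical per-mount quota. */
struct model_sb {
	long free_blocks;
};

static int get_quota(struct model_sb *sb)
{
	if (sb->free_blocks <= 0)
		return -ENOSPC;
	sb->free_blocks--;
	return 0;
}

static void put_quota(struct model_sb *sb)
{
	sb->free_blocks++;
}

/* Charges quota first, then tries to get a page. On allocation failure
 * the charge is returned: the step the buggy error path skipped. */
static int alloc_page_with_quota(struct model_sb *sb, bool pool_has_page)
{
	int err = get_quota(sb);

	if (err)
		return err;
	if (!pool_has_page) {
		put_quota(sb);	/* undo the charge */
		return -ENOMEM;
	}
	return 0;
}
```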

[akpm@linux-foundation.org: cleanup, micro-optimise]
Signed-off-by: Ken Chen
Acked-by: Adam Litke
Cc: David Gibson
Cc: William Lee Irwin III
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Ken Chen
2008-01-15 00:52:23 +0800
96990a4ae quicklists: Only consider memory that can be used with GFP_KERNEL

Quicklists calculates the size of the quicklists based on the number of
free pages. This must be the number of free pages that can be allocated
with GFP_KERNEL. node_page_state() includes the pages in ZONE_HIGHMEM and
ZONE_MOVABLE, which may cause the quicklists to become too large and lead to OOM.
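A rough model of the corrected sizing: only the zones a GFP_KERNEL allocation can draw from are summed. The zone enum, the divisor of 16 and the `min_pages` floor are assumptions made for illustration, not values taken from the patch.

```c
#include <assert.h>

/* Hypothetical zone indices. */
enum model_zone { Z_DMA, Z_DMA32, Z_NORMAL, Z_HIGHMEM, Z_MOVABLE, Z_COUNT };

#define FRACTION_OF_NODE_MEM 16UL	/* assumed divisor */

/* Sums only zones usable by GFP_KERNEL; HIGHMEM and MOVABLE pages are
 * left out even though a node-wide total would include them. */
static unsigned long gfp_kernel_free_pages(const unsigned long nr_free[Z_COUNT])
{
	return nr_free[Z_DMA] + nr_free[Z_DMA32] + nr_free[Z_NORMAL];
}

/* Quicklist cap derived from that count, never below min_pages. */
static unsigned long max_quicklist_pages(const unsigned long nr_free[Z_COUNT],
					 unsigned long min_pages)
{
	unsigned long max = gfp_kernel_free_pages(nr_free) / FRACTION_OF_NODE_MEM;

	return max > min_pages ? max : min_pages;
}
```

On a node whose free memory is almost all highmem, the cap stays at the floor instead of growing with pages that GFP_KERNEL cannot use.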

Signed-off-by: Christoph Lameter
Tested-by: Dhaval Giani
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Christoph Lameter
2008-01-15 00:52:22 +0800

09 Jan, 2008

2 commits

467bc461d Fix crash with FLAT_MEMORY and ARCH_PFN_OFFSET != 0

When using FLAT_MEMORY and ARCH_PFN_OFFSET is not 0, the kernel crashes in
memmap_init_zone(). This bug was introduced by commit
c713216deebd95d2b0ab38fef8bb2361c0180c2d.

Signed-off-by: Thomas Bogendoerfer
Acked-by: Mel Gorman
Cc: Bob Picco
Cc: Dave Hansen
Cc: Andy Whitcroft
Cc: Andi Kleen
Cc: Benjamin Herrenschmidt
Cc: Paul Mackerras
Cc: "Keith Mannthey"
Cc: "Luck, Tony"
Cc: KAMEZAWA Hiroyuki
Cc: Yasunori Goto
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Thomas Bogendoerfer
2008-01-09 08:10:36 +0800
c51b1a160 xip: fix get_zeroed_page with __GFP_HIGHMEM

The use of get_zeroed_page() with __GFP_HIGHMEM is invalid. Use
alloc_page() with __GFP_ZERO instead of invalid get_zeroed_page().

(This patch is only compile tested)

Cc: Carsten Otte
Signed-off-by: Akinobu Mita
Acked-by: Hugh Dickins
Acked-by: Carsten Otte
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Akinobu Mita
2008-01-09 08:10:36 +0800

03 Jan, 2008

1 commit

158a96242 Unify /proc/slabinfo configuration

Both SLUB and SLAB really did almost exactly the same thing for
/proc/slabinfo setup, using duplicate code and per-allocator #ifdef's.

This just creates a common CONFIG_SLABINFO that is enabled by both SLUB
and SLAB, and shares all the setup code. Maybe SLOB will want this some
day too.

Reviewed-by: Pekka Enberg
Signed-off-by: Linus Torvalds

Linus Torvalds
2008-01-03 05:04:48 +0800

02 Jan, 2008

1 commit

57ed3eda9 slub: provide /proc/slabinfo

This adds a read-only /proc/slabinfo file on SLUB, that makes slabtop work.

[ mingo@elte.hu: build fix. ]

Cc: Andi Kleen
Cc: Christoph Lameter
Cc: Peter Zijlstra
Signed-off-by: Pekka Enberg
Signed-off-by: Ingo Molnar
Signed-off-by: Linus Torvalds

Pekka J Enberg
2008-01-02 03:32:02 +0800

22 Dec, 2007

1 commit

76be89500 SLUB: Improve hackbench speed

Increase the minimum number of partial slabs to keep around, and put
partial slabs at the end of the partial queue so that they can accumulate
more free objects.
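Head versus tail placement can be pictured with a tiny circular doubly linked list modeled on <linux/list.h> (reimplemented here so it builds in user space). `struct model_slab` and the `tail` parameter of `add_partial()` are illustrative; the sketch shows only the two insertion positions, not SLUB's policy for choosing between them.

```c
#include <assert.h>
#include <stddef.h>

struct list_head {
	struct list_head *next, *prev;
};

static void list_init(struct list_head *h) { h->next = h->prev = h; }

static void __list_add(struct list_head *n, struct list_head *prev,
		       struct list_head *next)
{
	next->prev = n;
	n->next = next;
	n->prev = prev;
	prev->next = n;
}

static void list_add(struct list_head *n, struct list_head *h)
{
	__list_add(n, h, h->next);	/* insert right after the head */
}

static void list_add_tail(struct list_head *n, struct list_head *h)
{
	__list_add(n, h->prev, h);	/* insert right before the head */
}

/* Hypothetical slab descriptor; lru is first so the cast back works. */
struct model_slab {
	struct list_head lru;
	int id;
};

/* Slabs at the head are picked first; slabs at the tail wait longer. */
static void add_partial(struct list_head *partial, struct model_slab *s, int tail)
{
	if (tail)
		list_add_tail(&s->lru, partial);
	else
		list_add(&s->lru, partial);
}
```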

Signed-off-by: Christoph Lameter
Reviewed-by: Pekka Enberg
Acked-by: Ingo Molnar
Signed-off-by: Linus Torvalds

Christoph Lameter
2007-12-22 07:51:07 +0800

20 Dec, 2007

1 commit

3a6927906 Do dirty page accounting when removing a page from the page cache

Krzysztof Oledzki noticed a dirty page accounting leak on some of his
machines, causing the machine to eventually lock up when the kernel
decided that there was too much dirty data, but nobody could actually
write anything out to fix it.

The culprit turns out to be filesystems (cough ext3 with data=journal
cough) that re-dirty the page when the "->invalidatepage()" callback is
called.

Fix it up by doing a final dirty page accounting check when we actually
remove the page from the page cache.
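A toy model of the accounting problem and the final check. All names are hypothetical; `invalidatepage_redirties()` plays the role of a filesystem whose ->invalidatepage() callback marks the page dirty again after the VM has already cancelled the dirty state.

```c
#include <assert.h>
#include <stdbool.h>

static long nr_dirty;	/* global count of dirty cached pages */

struct model_page {
	bool dirty;
	bool in_cache;
};

static void set_page_dirty(struct model_page *p)
{
	if (!p->dirty) {
		p->dirty = true;
		nr_dirty++;
	}
}

/* The VM cancels the dirty state, e.g. during truncation. */
static void cancel_dirty(struct model_page *p)
{
	if (p->dirty) {
		p->dirty = false;
		nr_dirty--;
	}
}

/* A callback that re-dirties the page behind the VM's back. */
static void invalidatepage_redirties(struct model_page *p)
{
	set_page_dirty(p);
}

/* Final accounting check when the page leaves the cache. Only the
 * counter is corrected; the flag is left as it is. */
static void remove_from_cache(struct model_page *p)
{
	if (p->dirty)
		nr_dirty--;
	p->in_cache = false;
}
```

Without the check in `remove_from_cache()`, the sequence dirty, cancel, re-dirty, remove would leave `nr_dirty` at 1 with no page left to write back, which is how the counter drifts upward until writeback throttling stalls.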

This fixes bugzilla entry 9182:

http://bugzilla.kernel.org/show_bug.cgi?id=9182

Tested-by: Ingo Molnar
Tested-by: Krzysztof Oledzki
Cc: Andrew Morton
Cc: Nick Piggin
Cc: Peter Zijlstra
Signed-off-by: Linus Torvalds

Linus Torvalds
2007-12-20 06:05:13 +0800

18 Dec, 2007

7 commits

3811dbf67 SLUB: remove useless masking of GFP_ZERO

Remove a recently added useless masking of GFP_ZERO. GFP_ZERO is already
masked out in new_slab() (See how it calls allocate_slab). No need to do
it twice.

This reverts the SLUB parts of 7fd272550bd43cc1d7289ef0ab2fa50de137e767.

Cc: Matt Mackall
Reviewed-by: Pekka Enberg
Signed-off-by: Christoph Lameter
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Christoph Lameter
2007-12-18 11:28:17 +0800
368d2c635 Revert "hugetlb: Add hugetlb_dynamic_pool sysctl"

This reverts commit 54f9f80d6543fb7b157d3b11e2e7911dc1379790 ("hugetlb:
Add hugetlb_dynamic_pool sysctl")

Given the new sysctl nr_overcommit_hugepages, the boolean dynamic pool
sysctl is not needed, as its semantics can be expressed by 0 in the
overcommit sysctl (no dynamic pool) and non-0 in the overcommit sysctl
(pool enabled).

(Needed in 2.6.24 since it reverts a post-2.6.23 userspace-visible change)

Signed-off-by: Nishanth Aravamudan
Acked-by: Adam Litke
Cc: William Lee Irwin III
Cc: Dave Hansen
Cc: David Gibson
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Nishanth Aravamudan
2007-12-18 11:28:17 +0800
d1c3fb1f8 hugetlb: introduce nr_overcommit_hugepages sysctl

While examining the code to support /proc/sys/vm/hugetlb_dynamic_pool, I
became convinced that having a boolean sysctl was insufficient:

1) To support per-node control of hugepages, I have previously submitted
patches to add a sysfs attribute related to nr_hugepages. However, with
a boolean global value and per-mount quota enforcement constraining the
dynamic pool, adding corresponding control of the dynamic pool on a
per-node basis seems inconsistent to me.

2) Administration of the hugetlb dynamic pool with multiple hugetlbfs
mount points is, arguably, more arduous than it needs to be. Each quota
would need to be set separately, and the sum would need to be monitored.

To ease the administration, and to help make the way for per-node
control of the static & dynamic hugepage pool, I added a separate
sysctl, nr_overcommit_hugepages. This value serves as a high watermark
for the overall hugepage pool, while nr_hugepages serves as a low
watermark. The boolean sysctl can then be removed, as the condition

nr_overcommit_hugepages > 0

indicates the same administrative setting as

hugetlb_dynamic_pool == 1

Quotas still serve as local enforcement of the size of the pool on a
per-mount basis.

A few caveats:

1) There is a race whereby the global surplus huge page counter is
incremented before a hugepage has been allocated. Another process could then
try to grow the pool, and fail to convert a surplus huge page to a normal
huge page and instead allocate a fresh huge page. I believe this is
benign, as no memory is leaked (the actual pages are still tracked
correctly) and the counters won't go out of sync.

2) Shrinking the static pool while a surplus is in effect will allow the
number of surplus huge pages to exceed the overcommit value. As long as
this condition holds, however, no more surplus huge pages will be
allowed on the system until one of the two sysctls is increased
sufficiently, or the surplus huge pages go out of use and are freed.
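The relationship between the two sysctls, including caveat 2, can be sketched as follows. The struct and function names are hypothetical, and the model tracks only the surplus counter; capping the surplus is our reading of the high-watermark description together with caveat 2.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical counters modeled on the description above. */
struct model_pool {
	unsigned long nr_hugepages;		/* static pool (low watermark) */
	unsigned long nr_overcommit_hugepages;	/* allowed surplus above it */
	unsigned long surplus;			/* surplus pages in use */
};

/* The old boolean is recoverable from the new value. */
static bool dynamic_pool_enabled(const struct model_pool *p)
{
	return p->nr_overcommit_hugepages > 0;
}

/* A surplus page is allowed only while the surplus is below the limit.
 * If shrinking the static pool already pushed the surplus past it,
 * this keeps refusing until the surplus drops or a sysctl is raised. */
static bool try_alloc_surplus(struct model_pool *p)
{
	if (p->surplus >= p->nr_overcommit_hugepages)
		return false;
	p->surplus++;
	return true;
}
```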

Successfully tested on x86_64 with the current libhugetlbfs snapshot,
modified to use the new sysctl.

Signed-off-by: Nishanth Aravamudan
Acked-by: Adam Litke
Cc: William Lee Irwin III
Cc: Dave Hansen
Cc: David Gibson
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Nishanth Aravamudan
2007-12-18 11:28:17 +0800
81eabcbe0 mm: fix page allocation for larger I/O segments

In some cases the IO subsystem is able to merge requests if the pages are
adjacent in physical memory. This was achieved in the allocator by having
expand() return pages in physically contiguous order in situations where a
large buddy was split. However, list-based anti-fragmentation changed the
order in which pages were returned, to avoid searching in buffered_rmqueue()
for a page of the appropriate migrate type.

This patch restores the behaviour of rmqueue_bulk(), preserving the physical
order of pages returned by the allocator without incurring increased search
costs for anti-fragmentation.
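One way to keep allocator output in physical order is to insert each page after the previously inserted one rather than always after the list head; always inserting at the head would reverse the order. The sketch shows that technique with a minimal list; `struct model_page` and `fill_in_order()` are hypothetical, and this is not a copy of rmqueue_bulk().

```c
#include <assert.h>
#include <stddef.h>

struct list_head {
	struct list_head *next, *prev;
};

static void list_init(struct list_head *h) { h->next = h->prev = h; }

/* Insert n immediately after pos. */
static void list_add(struct list_head *n, struct list_head *pos)
{
	n->next = pos->next;
	n->prev = pos;
	pos->next->prev = n;
	pos->next = n;
}

/* Hypothetical page; lru is first so the cast back is valid. */
struct model_page {
	struct list_head lru;
	unsigned long pfn;
};

/* Pages arrive in ascending pfn order; moving the insertion point
 * forward after each add keeps that order on the list. */
static void fill_in_order(struct list_head *head, struct model_page *pages, int n)
{
	struct list_head *pos = head;

	for (int i = 0; i < n; i++) {
		list_add(&pages[i].lru, pos);
		pos = &pages[i].lru;
	}
}
```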

Signed-off-by: Mel Gorman
Cc: James Bottomley
Cc: Jens Axboe
Cc: Mark Lord
Signed-off-by: Linus Torvalds

Mel Gorman
2007-12-18 11:28:16 +0800
bbd068259 mm/sparse.c: improve the error handling for sparse_add_one_section()

Improve the error handling for mm/sparse.c::sparse_add_one_section(). Also,
there is no reason to delay checking 'usemap' until 'pgdat_resize_lock' is held.

[geoffrey.levand@am.sony.com: sparse_index_init() returns -EEXIST]
Cc: Christoph Lameter
Acked-by: Dave Hansen
Cc: Rik van Riel
Acked-by: Yasunori Goto
Cc: Andy Whitcroft
Signed-off-by: WANG Cong
Signed-off-by: Geoff Levand
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

WANG Cong
2007-12-18 11:28:16 +0800
af0cd5a7c mm/sparse.c: check the return value of sparse_index_alloc()

Since sparse_index_alloc() can return NULL on memory allocation failure,
we must deal with the failure condition when calling it.
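Combined with the -EEXIST case noted for sparse_index_init() in the previous commit, the error handling amounts to the following. The root table, `model_index_alloc()`, `model_index_init()` and the `fail_alloc` hook are hypothetical stand-ins used to force the NULL path.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

#define NR_ROOTS 4
static void *section_roots[NR_ROOTS];	/* hypothetical root table */
static int fail_alloc;			/* simulate allocation failure */

static void *model_index_alloc(void)
{
	return fail_alloc ? NULL : calloc(1, 64);
}

/* 0 on success, -EEXIST if the root is already populated, -ENOMEM if
 * the allocator returned NULL (instead of dereferencing it). */
static int model_index_init(int root)
{
	void *section;

	if (section_roots[root])
		return -EEXIST;
	section = model_index_alloc();
	if (!section)
		return -ENOMEM;
	section_roots[root] = section;
	return 0;
}
```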

Signed-off-by: WANG Cong
Cc: Christoph Lameter
Cc: Rik van Riel
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

WANG Cong
2007-12-18 11:28:16 +0800
a5ee6daa5 sparsemem: make SPARSEMEM_VMEMMAP selectable

SPARSEMEM_VMEMMAP needs to be a selectable config option to support
building the kernel both with and without sparsemem vmemmap support. This
selection is desirable for platforms which could be configured one way for
platform specific builds and the other for multi-platform builds.

Signed-off-by: Miguel Botón
Signed-off-by: Geoff Levand
Acked-by: Yasunori Goto
Cc: Christoph Lameter
Cc: Andy Whitcroft
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Geoff Levand
2007-12-18 11:28:16 +0800

11 Dec, 2007

1 commit

72fad7139 hugetlb: handle write-protection faults in follow_hugetlb_page

The follow_hugetlb_page() fix I posted (merged as git commit
5b23dbe8173c212d6a326e35347b038705603d39) missed one case. If the pte is
present, but not writable and write access is requested by the caller to
get_user_pages(), the code will do the wrong thing. Rather than calling
hugetlb_fault to make the pte writable, it notes the presence of the pte
and continues.

This simple one-liner makes sure we also fault on the pte for this case.
Please apply.
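The corrected decision, reduced to a predicate over a hypothetical PTE model (`struct model_pte` and `needs_fault()` are illustrative names), might look like this:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical PTE state. */
struct model_pte {
	bool present;
	bool writable;
};

/* True when the walker must call the fault handler. The last clause is
 * the missed case: a present but read-only PTE with write requested. */
static bool needs_fault(const struct model_pte *pte, bool write)
{
	return pte == NULL || !pte->present || (write && !pte->writable);
}
```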

Signed-off-by: Adam Litke
Acked-by: Dave Kleikamp
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Adam Litke
2007-12-11 11:43:55 +0800

10 Dec, 2007

1 commit

7fd272550 Avoid double memclear() in SLOB/SLUB

Both slob and slub react to __GFP_ZERO by clearing the allocation, which
means that passing the GFP_ZERO bit down to the page allocator is just
wasteful and pointless.
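A simplified illustration of why the bit is masked: the slab layer clears the object itself, so passing the zeroing flag down would clear the same memory twice. The flag values and function names are hypothetical; the real __GFP_ZERO is a kernel gfp bit, and a real slab allocator requests whole pages rather than a single object.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define MODEL_GFP_KERNEL 0x01u	/* hypothetical flag values */
#define MODEL_GFP_ZERO   0x02u

static unsigned int page_allocator_flags;	/* what reached the lower layer */

/* Lower layer: clears memory if asked to. */
static void *model_page_alloc(size_t size, unsigned int flags)
{
	void *p = malloc(size);

	page_allocator_flags = flags;
	if (p && (flags & MODEL_GFP_ZERO))
		memset(p, 0, size);
	return p;
}

/* Slab layer: strips the zero bit before calling down, then clears the
 * object once itself. */
static void *model_slab_alloc(size_t size, unsigned int flags)
{
	void *p = model_page_alloc(size, flags & ~MODEL_GFP_ZERO);

	if (p && (flags & MODEL_GFP_ZERO))
		memset(p, 0, size);
	return p;
}
```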

Acked-by: Matt Mackall
Reviewed-by: Pekka Enberg
Signed-off-by: Linus Torvalds

Linus Torvalds
2007-12-10 02:17:52 +0800

06 Dec, 2007

6 commits

ad658cec2 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/selinux-2.6

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/selinux-2.6:
VM/Security: add security hook to do_brk
Security: round mmap hint address above mmap_min_addr
security: protect from stack expantion into low vm addresses
Security: allow capable check to permit mmap or low vm space
SELinux: detect dead booleans
SELinux: do not clear f_op when removing entries

Linus Torvalds
2007-12-06 01:26:52 +0800
ecaf18c15 VM/Security: add security hook to do_brk

Given a specifically crafted binary do_brk() can be used to get low pages
available in userspace virtual memory and can thus be used to circumvent
the mmap_min_addr low memory protection. Add security checks in do_brk().
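Conceptually, the new check rejects break addresses below the protected floor. This is a hedged sketch: the 64 KiB floor, the `privileged` flag (standing in for the capability override listed in the merge summary above) and the -EPERM return are assumptions, not the exact hook semantics.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Assumed value; on Linux the limit is set via /proc/sys/vm/mmap_min_addr. */
static unsigned long mmap_min_addr = 65536;

/* Refuse to extend the heap into the protected low range unless the
 * caller is privileged. */
static int model_brk_check(unsigned long addr, bool privileged)
{
	if (addr < mmap_min_addr && !privileged)
		return -EPERM;
	return 0;
}
```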

Signed-off-by: Eric Paris
Acked-by: Alan Cox
Cc: Stephen Smalley
Cc: James Morris
Cc: Chris Wright
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Eric Paris
2007-12-06 01:21:21 +0800
294a80a8e SLUB's ksize() fails for size > 2048

I can't pass memory allocated by kmalloc() to ksize() if it is allocated by
the SLUB allocator and the size is larger than (I guess) PAGE_SIZE / 2.

The bug in ksize() seems to be that it does not check whether the allocation
was made by SLUB or by the page allocator.
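A sketch of the corrected lookup under a simplified page model: `struct model_page` and `model_ksize()` are hypothetical, and a 4 KiB page size is assumed. The point is that a kmalloc() object served by the page allocator has no slab metadata, so its usable size comes from the allocation order.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MODEL_PAGE_SIZE 4096UL	/* assumed page size */

/* Hypothetical page descriptor. */
struct model_page {
	bool is_slab;		/* owned by the slab allocator? */
	size_t object_size;	/* valid when is_slab */
	unsigned int order;	/* valid when !is_slab */
};

/* Check who owns the page before answering. */
static size_t model_ksize(const struct model_page *page)
{
	if (!page->is_slab)
		return MODEL_PAGE_SIZE << page->order;
	return page->object_size;
}
```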

Reviewed-by: Pekka Enberg
Tested-by: Tetsuo Handa
Cc: Christoph Lameter, Matt Mackall
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Vegard Nossum
2007-12-06 01:21:20 +0800
369b8f5a7 mm: fix XIP file writes

Writing to XIP files at a non-page-aligned offset results in data corruption
because the writes were always sent to the start of the page.
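The fix boils down to adding the in-page offset of the file position. `xip_write_into_page()` is a hypothetical helper that assumes 4 KiB pages and a write that does not cross a page boundary.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MODEL_PAGE_SIZE 4096UL	/* assumed page size */

/* Copy len bytes written at file position pos into the page backing it.
 * The bug copied to kaddr + 0; the fix adds the offset within the page. */
static void xip_write_into_page(char *kaddr, unsigned long pos,
				const char *buf, size_t len)
{
	unsigned long offset = pos & (MODEL_PAGE_SIZE - 1);

	memcpy(kaddr + offset, buf, len);
}
```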

Signed-off-by: Nick Piggin
Cc: Christian Borntraeger
Acked-by: Carsten Otte
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Nick Piggin
2007-12-06 01:21:20 +0800
f8fcc9331 Add EXPORT_SYMBOL(ksize);

mm/slub.c exports ksize(), but mm/slob.c and mm/slab.c don't.

It's used by binfmt_flat, which can be built as a module.

Signed-off-by: Tetsuo Handa
Cc: Christoph Lameter
Cc: Matt Mackall
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Tetsuo Handa
2007-12-06 01:21:18 +0800
4b01a0b16 mm/backing-dev.c: fix percpu_counter_destroy call bug in bdi_init

This call should use the array index j, not i. But with this approach, a
single int i is enough; int j is not needed.
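The unwind pattern described, in a self-contained form: the counter helpers, `model_bdi_init()` and the `fail_at` hook are hypothetical. The assert() in `counter_destroy()` would fire with the buggy loop, which destroyed entry i repeatedly instead of entries 0..i-1.

```c
#include <assert.h>

#define NR_STAT_ITEMS 3

static int live[NR_STAT_ITEMS];	/* which counters are initialized */
static int fail_at = -1;	/* index whose init fails; -1 for none */

static int counter_init(int i)
{
	if (i == fail_at)
		return -1;
	live[i] = 1;
	return 0;
}

static void counter_destroy(int i)
{
	assert(live[i]);	/* destroying an uninitialized counter is the bug */
	live[i] = 0;
}

/* On failure, walk back over exactly the counters already set up,
 * reusing i; no second index is needed. */
static int model_bdi_init(void)
{
	int i, err = 0;

	for (i = 0; i < NR_STAT_ITEMS; i++) {
		err = counter_init(i);
		if (err)
			goto out_destroy;
	}
	return 0;

out_destroy:
	while (i--)
		counter_destroy(i);
	return err;
}
```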

Signed-off-by: Denis Cheng
Cc: Peter Zijlstra
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Denis Cheng
2007-12-06 01:21:18 +0800

05 Dec, 2007

3 commits

5a211a5de VM/Security: add security hook to do_brk

Given a specifically crafted binary do_brk() can be used to get low
pages available in userspace virtual memory and can thus be used to
circumvent the mmap_min_addr low memory protection. Add security checks
in do_brk().

Signed-off-by: Eric Paris
Acked-by: Alan Cox
Signed-off-by: James Morris

Eric Paris
2007-12-05 21:25:30 +0800
7cd94146c Security: round mmap hint address above mmap_min_addr

If mmap_min_addr is set and a process attempts to mmap (not fixed) with a
non-null hint address less than mmap_min_addr the mapping will fail the
security checks. Since this is just a hint address this patch will round
such a hint address above mmap_min_addr.

gcj was found to try to be very frugal with vm usage and give hint addresses
in the 8k-32k range. Without this patch all such programs failed and with
the patch they happily get a higher address.

This patch is wrapped in CONFIG_SECURITY since mmap_min_addr doesn't exist
without it and there would be no security check possible no matter what. So
we should not bother compiling in this rounding if it is just a waste of
time.
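A sketch of the rounding rule as described, with CONFIG_SECURITY defined so the rounding is compiled in. The page-size macros, the 64 KiB `mmap_min_addr` value and the name `round_hint_sketch()` are assumptions for illustration.

```c
#include <assert.h>

#define CONFIG_SECURITY 1	/* pretend the option is enabled */

#define MODEL_PAGE_SIZE 4096UL
#define MODEL_PAGE_MASK (~(MODEL_PAGE_SIZE - 1))
#define MODEL_PAGE_ALIGN(x) (((x) + MODEL_PAGE_SIZE - 1) & MODEL_PAGE_MASK)

static unsigned long mmap_min_addr = 65536;	/* assumed setting */

/* A non-zero hint below mmap_min_addr is moved up to the first allowed
 * page; a zero hint ("no preference") is left alone. */
static unsigned long round_hint_sketch(unsigned long hint)
{
#ifdef CONFIG_SECURITY
	hint &= MODEL_PAGE_MASK;
	if (hint != 0 && hint < mmap_min_addr)
		return MODEL_PAGE_ALIGN(mmap_min_addr);
#endif
	return hint;
}
```

With this rule, a gcj-style hint of 8 KiB becomes 64 KiB instead of making the mapping fail the security check.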

Signed-off-by: Eric Paris
Signed-off-by: James Morris

Eric Paris
2007-12-05 21:25:10 +0800
8869477a4 security: protect from stack expantion into low vm addresses

Add security checks to make sure we are not attempting to expand the
stack into memory protected by mmap_min_addr.

Signed-off-by: Eric Paris
Signed-off-by: James Morris

Eric Paris
2007-12-05 21:24:48 +0800

01 Dec, 2007

1 commit

80cbd911c Fix kmem_cache_free performance regression in slab

The database performance group have found that half the cycles spent
in kmem_cache_free are spent in this one call to BUG_ON. Moving it
into the CONFIG_SLAB_DEBUG-only function cache_free_debugcheck() is a
performance win of almost 0.5% on their particular benchmark.

The call was added as part of commit ddc2e812d592457747c4367fb73edcaa8e1e49ff
with the comment that "overhead should be minimal". It may have been
minimal at the time, but it isn't now.

[ Quoth Pekka Enberg: "I don't think the BUG_ON per se caused the
performance regression but rather the virt_to_head_page() changes to
virt_to_cache() that were added later." ]

Signed-off-by: Matthew Wilcox
Acked-by: Pekka J Enberg
Signed-off-by: Linus Torvalds

Matthew Wilcox
2007-12-01 00:08:05 +0800

30 Nov, 2007

2 commits

e0dc3a53d memory hotplug fix: fix section mismatch in vmammap_allock_block()

Fixes section mismatch below.

WARNING: vmlinux.o(.text+0x946b5): Section mismatch: reference to .init.text:__alloc_bootmem_node (between 'vmemmap_alloc_block' and 'vmemmap_pgd_populate')

Signed-off-by: KAMEZAWA Hiroyuki
Cc: Kamalesh Babulal
Cc: Andy Whitcroft
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

KAMEZAWA Hiroyuki
2007-11-30 01:24:54 +0800
ba72cb8cb Fix boot problem with iSeries lacking hugepage support

Ordinarily the size of a pageblock is determined at compile-time based on the
hugepage size. On PPC64, the hugepage size is determined at runtime based on
what is supported by the machine. With legacy machines such as iSeries that
do not support hugepages, HPAGE_SHIFT is 0. This results in pageblock_order
being set to -PAGE_SHIFT and a crash results shortly afterwards.

This patch adds a function to select a sensible value for pageblock order by
default when HUGETLB_PAGE_SIZE_VARIABLE is set. It checks that HPAGE_SHIFT
is a sensible value before using the hugepage size; if it is not,
MAX_ORDER-1 is used.
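The selection logic can be sketched as a small function. The constants are assumptions (4 KiB base pages, MAX_ORDER of 11), `pageblock_order_sketch()` is an illustrative name, and `hpage_shift` is passed in to mimic a value only known at runtime.

```c
#include <assert.h>

#define MODEL_PAGE_SHIFT 12	/* assumed 4 KiB base pages */
#define MODEL_MAX_ORDER  11	/* assumed MAX_ORDER */

/* Use the hugepage order only when the hugepage shift is sane; a
 * machine without hugepages reports 0 and falls back to MAX_ORDER-1. */
static int pageblock_order_sketch(int hpage_shift)
{
	if (hpage_shift > MODEL_PAGE_SHIFT)
		return hpage_shift - MODEL_PAGE_SHIFT;
	return MODEL_MAX_ORDER - 1;
}
```

For hpage_shift = 0 the unguarded subtraction would give 0 - 12 = -12, i.e. -PAGE_SHIFT, which is the bogus pageblock order described above.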

This is a fix for 2.6.24.

Credit goes to Stephen Rothwell for identifying the bug and testing candidate
patches. Additional credit goes to Andy Whitcroft for spotting a problem
with respect to IA-64 before releasing. Additional credit to David Gibson
for testing with the libhugetlbfs test suite.

Signed-off-by: Mel Gorman
Tested-by: Stephen Rothwell
Cc: Benjamin Herrenschmidt
Acked-by: Paul Mackerras
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Mel Gorman
2007-11-30 01:24:51 +0800

29 Nov, 2007

2 commits

09f345da7 prep_zero_page: remove bogus BUG_ON

2.6.11 gave __GFP_ZERO's prep_zero_page a bogus "highmem may have to wait"
assertion. Presumably added under the misconception that clear_highpage
uses nonatomic kmap; but then and now it uses kmap_atomic, so no problem.

Signed-off-by: Hugh Dickins
Signed-off-by: Linus Torvalds

Hugh Dickins
2007-11-29 03:04:28 +0800
e84e2e132 tmpfs: restore missing clear_highpage

tmpfs was misconverted to __GFP_ZERO in 2.6.11. There's an unusual case in
which shmem_getpage receives the page from its caller instead of allocating.
We must cover this case by clear_highpage before SetPageUptodate, as before.

Signed-off-by: Hugh Dickins
Signed-off-by: Linus Torvalds

Hugh Dickins
2007-11-29 03:04:28 +0800

20 Nov, 2007

1 commit

ce7e9fae8 [S390] Optimize storage key handling for anonymous pages

page_mkclean used to call page_clear_dirty for every given page. This
differs from all other architectures, where the dirty bit in the
PTEs is only reset if page_mapping() returns a non-NULL pointer.
We can move the page_test_dirty/page_clear_dirty sequence into the
2nd if to avoid unnecessary iske/sske sequences, which are expensive.

This change also helps kvm for s390 as the host must transfer the
dirty bit into the guest status bits. By moving the page_clear_dirty
operation into the 2nd if, the vm will only call page_clear_dirty
for pages where it walks the mapping anyway. There it calls
ptep_clear_flush for writable ptes, so we can transfer the dirty bit
to the guest.

Signed-off-by: Christian Borntraeger
Signed-off-by: Martin Schwidefsky

Christian Borntraeger
2007-11-20 18:13:46 +0800

16 Nov, 2007

1 commit

8c0863403 dirty page balancing: Get rid of broken unmapped_ratio logic

This code harks back to the days when we didn't count dirty mapped
pages, which led us to try to balance the number of dirty unmapped pages
by how much unmapped memory there was in the system.

That makes no sense any more, since now the dirty counts include the
mapped pages. Not to mention that the math doesn't work with HIGHMEM
machines anyway, and causes the unmapped_ratio to potentially turn
negative (which we do catch thanks to clamping it at a minimum value,
but I mention that as an indication of how broken the code is).

The code also was written at a time when the default dirty ratio was
much larger, and the unmapped_ratio logic effectively capped that large
dirty ratio a bit. Again, we've since lowered the dirty ratio rather
aggressively, further lessening the point of that code.

Acked-by: Peter Zijlstra
Signed-off-by: Linus Torvalds

Linus Torvalds
2007-11-16 08:41:52 +0800