23 Mar, 2011

1 commit

  • When the user writes a negative value into /proc/sys/vm/nr_hugepages, the
    kernel allocates as many hugepages as possible and then updates
    /proc/meminfo to reflect this.

    This patch changes the behavior so that negative input leaves the
    nr_hugepages value unchanged (see the sketch after this entry).

    Signed-off-by: Petr Holasek
    Signed-off-by: Anton Arapov
    Reviewed-by: Naoya Horiguchi
    Acked-by: David Rientjes
    Acked-by: Mel Gorman
    Acked-by: Eric B Munson
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Petr Holasek
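
    A note on the mechanism, with a minimal userspace demo (illustrative
    only, not part of the patch): the handler parses the input into an
    unsigned long, so a negative string such as "-1" wraps around to
    ULONG_MAX, which becomes the pool size the kernel then tries to reach.

        /* demo: parsing "-1" as an unsigned long wraps to ULONG_MAX */
        #include <stdio.h>
        #include <stdlib.h>

        int main(void)
        {
                unsigned long count = strtoul("-1", NULL, 10);

                /* prints 18446744073709551615 on 64-bit - the "as many
                 * hugepages as possible" target described above */
                printf("requested pool size: %lu\n", count);
                return 0;
        }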
     

14 Jan, 2011

5 commits

  • When parsing changes to the huge page pool sizes made from userspace via
    the sysfs interface, bogus input values are covered up by
    nr_hugepages_store_common() and nr_overcommit_hugepages_store() returning
    0 when strict_strtoul() returns an error. Because a store that consumes 0
    bytes makes userspace retry the write, this can cause an infinite loop in
    the nr_hugepages_store code. This patch changes the return value for
    these functions to -EINVAL when strict_strtoul() returns an error (see
    the sketch after this entry).

    Signed-off-by: Eric B Munson
    Reported-by: CAI Qian
    Cc: Andrea Arcangeli
    Cc: Eric B Munson
    Cc: Michal Hocko
    Cc: Nishanth Aravamudan
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Eric B Munson
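
    A minimal sketch of the fix described above; the function name matches
    mm/hugetlb.c of that era, but the body is condensed and illustrative:

        static ssize_t nr_hugepages_store_common(/* ... */
                                                 const char *buf, size_t len)
        {
                unsigned long count;
                int err;

                err = strict_strtoul(buf, 10, &count);
                if (err)
                        return -EINVAL; /* was "return 0": a store consuming
                                         * 0 bytes makes userspace retry */

                /* ... adjust the pool toward 'count' pages ... */
                return len;             /* consume the whole write on success */
        }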
     
  • Huge pages with order >= MAX_ORDER must be allocated at boot via the
    kernel command line; they cannot be allocated or freed once the kernel is
    up and running. Currently we allow values to be written to the sysfs and
    sysctl files controlling pool size for these huge page sizes. This patch
    makes the store functions for nr_hugepages and nr_overcommit_hugepages
    return -EINVAL when the pool for a page size >= MAX_ORDER is changed (see
    the sketch after this entry).

    [akpm@linux-foundation.org: avoid multiple return paths in nr_hugepages_store_common()]
    [caiqian@redhat.com: add checking in hugetlb_overcommit_handler()]
    Signed-off-by: Eric B Munson
    Reported-by: CAI Qian
    Cc: Andrea Arcangeli
    Cc: Michal Hocko
    Cc: Nishanth Aravamudan
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Eric B Munson
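
    A sketch of the guard this adds, assuming 'h' is the struct hstate being
    resized; illustrative rather than the literal diff:

        /* gigantic pages are set up at boot and cannot be resized later */
        if (h->order >= MAX_ORDER)
                return -EINVAL;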
     
  • proc_doulongvec_minmax() may fail if the given buffer doesn't represent a
    valid number. If we provide something invalid, the resulting value
    (nr_overcommit_huge_pages in this case) is left set to a random value
    from the stack.

    The issue was introduced by a3d0c6aa when the default handler was
    replaced by a helper function that does not check the return value.
    A sketch of the fix follows this entry.

    Reproducer:
    echo "" > /proc/sys/vm/nr_overcommit_hugepages

    [akpm@linux-foundation.org: correctly propagate proc_doulongvec_minmax return code]
    Signed-off-by: Michal Hocko
    Cc: CAI Qian
    Cc: Nishanth Aravamudan
    Cc: Andrea Arcangeli
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Michal Hocko
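
    A sketch of the corrected flow in the handler, using the helper's real
    signature of that era but an illustrative body:

        ret = proc_doulongvec_minmax(table, write, buffer, length, ppos);
        if (ret)
                goto out;   /* propagate the error instead of committing the
                             * uninitialized stack value the table points at */
        /* ... only now is the parsed value committed ... */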
     
  • The NODEMASK_ALLOC macro may dynamically allocate memory for its second
    argument ('nodes_allowed' in this context).

    In nr_hugepages_store_common() we may abort early if strict_strtoul()
    fails, but in that case we do not free the memory already allocated to
    'nodes_allowed', causing a memory leak.

    This patch closes the leak by freeing the memory in the error path (see
    the sketch after this entry).

    [akpm@linux-foundation.org: use NODEMASK_FREE, per Minchan Kim]
    Signed-off-by: Jesper Juhl
    Cc: Minchan Kim
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Jesper Juhl
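
    A sketch of the closed error path; the macro names are those used by
    mm/hugetlb.c of that era, the body is illustrative:

        NODEMASK_ALLOC(nodemask_t, nodes_allowed, GFP_KERNEL | __GFP_NORETRY);

        err = strict_strtoul(buf, 10, &count);
        if (err) {
                NODEMASK_FREE(nodes_allowed);   /* previously leaked */
                return -EINVAL;
        }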
     
  • Move the copy/clear_huge_page functions to common code to share between
    hugetlb.c and huge_memory.c.

    Signed-off-by: Andrea Arcangeli
    Acked-by: Rik van Riel
    Acked-by: Mel Gorman
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Andrea Arcangeli
     

03 Dec, 2010

1 commit

  • Have hugetlb_fault() call unlock_page(page) only if it had previously
    called lock_page(page).

    Running the libhugetlbfs test suite with CONFIG_DEBUG_VM=y tripped the
    VM_BUG_ON(!PageLocked(page)) in unlock_page(), which had been called by
    hugetlb_fault() when page == pagecache_page. This patch remedies the
    problem (see the sketch after this entry).

    Signed-off-by: Dean Nelson
    Cc:
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Dean Nelson
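
    A sketch of the fix: hugetlb_fault() locks 'page' only when it differs
    from pagecache_page, so the unlock must be made symmetric. Illustrative:

        if (page != pagecache_page)
                unlock_page(page);      /* only unlock what we locked */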
     

08 Oct, 2010

9 commits

  • This fixes a problem introduced with the hugetlb hwpoison handling.

    The user space SIGBUS signalling wants to know the size of the hugepage
    that caused a HWPOISON fault.

    Unfortunately the architecture page fault handlers do not have easy
    access to the struct page.

    Pass the information out in the fault error code instead.

    I added a separate VM_FAULT_HWPOISON_LARGE bit for this case and encoded
    the hpage index in some free upper bits of the fault code. The small
    page hwpoison case stays with the VM_FAULT_HWPOISON name to minimize
    changes.

    Also add code to hugetlb.h to convert that index into a page shift.

    It will be used in a further patch (see the sketch after this entry).

    Cc: Naoya Horiguchi
    Cc: fengguang.wu@intel.com
    Signed-off-by: Andi Kleen

    Andi Kleen
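
    A hedged sketch of the encoding; the macro names mirror the VM_FAULT_*
    definitions added around this time, but treat the exact values as
    illustrative:

        #define VM_FAULT_HWPOISON_LARGE 0x0020  /* hwpoison on a huge page */
        #define VM_FAULT_SET_HINDEX(x)  ((x) << 12)          /* pack hstate index */
        #define VM_FAULT_GET_HINDEX(x)  (((x) >> 12) & 0xf)  /* unpack it */

        /* hugetlb.h side: convert the index back into a page shift */
        int shift = hstate_index_to_shift(VM_FAULT_GET_HINDEX(fault));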
     
  • Fixes a warning reported by Stephen Rothwell:

    mm/hugetlb.c:2950: warning: 'is_hugepage_on_freelist' defined but not used

    for the !CONFIG_MEMORY_FAILURE case.

    Signed-off-by: Andi Kleen

    Andi Kleen
     
  • Currently error recovery for a free hugepage works only for the
    MF_COUNT_INCREASED case. This patch enables the !MF_COUNT_INCREASED case.

    Free hugepages can be handled directly by alloc_huge_page() and
    dequeue_hwpoisoned_huge_page(), and both of them are protected
    by hugetlb_lock, so there is no race between them.

    Note that this patch defines the refcount of a HWPoisoned hugepage
    dequeued from the freelist to be 1, deviating from the present 0, so that
    we can avoid a race between unpoison and memory failure on a free
    hugepage. This is reasonable because, unlike free buddy pages, a free
    hugepage is governed by hugetlbfs even after error handling finishes.
    It also makes the unpoison code added in a later patch cleaner.

    Signed-off-by: Naoya Horiguchi
    Signed-off-by: Jun'ichi Nomura
    Acked-by: Mel Gorman
    Signed-off-by: Andi Kleen

    Naoya Horiguchi
     
  • Currently alloc_huge_page() raises the page refcount outside
    hugetlb_lock, which causes a race when dequeue_hwpoisoned_huge_page()
    runs concurrently with alloc_huge_page(). To avoid the race, this patch
    moves set_page_refcounted() inside hugetlb_lock (see the sketch after
    this entry).

    Signed-off-by: Naoya Horiguchi
    Signed-off-by: Wu Fengguang
    Acked-by: Mel Gorman
    Reviewed-by: Christoph Lameter
    Signed-off-by: Andi Kleen

    Naoya Horiguchi
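
    A sketch of the reordering, assuming the dequeue path of
    alloc_huge_page() from that era; illustrative, not the literal diff:

        spin_lock(&hugetlb_lock);
        page = dequeue_huge_page_vma(h, vma, addr, avoid_reserve);
        if (page)
                set_page_refcounted(page);  /* now atomic with respect to
                                             * dequeue_hwpoisoned_huge_page() */
        spin_unlock(&hugetlb_lock);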
     
  • This check is necessary to avoid a race between dequeue and allocation,
    which can cause a free hugepage to be dequeued twice and make the kernel
    unstable.

    Signed-off-by: Naoya Horiguchi
    Signed-off-by: Wu Fengguang
    Acked-by: Mel Gorman
    Reviewed-by: Christoph Lameter
    Signed-off-by: Andi Kleen

    Naoya Horiguchi
     
  • This patch extends the page migration code to support hugepage migration.
    One of the potential users of this feature is soft offlining, which
    is triggered by corrected memory errors (added by the next patch).

    Todo:
    - there are other users of page migration, such as memory policy,
    memory hotplug and memory compaction.
    They are not ready for hugepage support for now.

    ChangeLog since v4:
    - define migrate_huge_pages()
    - remove changes on isolation/putback_lru_page()

    ChangeLog since v2:
    - refactor isolate/putback_lru_page() to handle hugepage
    - add comment about race on unmap_and_move_huge_page()

    ChangeLog since v1:
    - divide migration code path for hugepage
    - define routine checking migration swap entry for hugetlb
    - replace "goto" with "if/else" in remove_migration_pte()

    Signed-off-by: Naoya Horiguchi
    Signed-off-by: Jun'ichi Nomura
    Acked-by: Mel Gorman
    Signed-off-by: Andi Kleen

    Naoya Horiguchi
     
  • This patch modifies hugepage copy functions to have only destination
    and source hugepages as arguments for later use.
    The old ones are renamed from copy_{gigantic,huge}_page() to
    copy_user_{gigantic,huge}_page().
    This naming convention is consistent with that between copy_highpage()
    and copy_user_highpage().

    ChangeLog since v4:
    - add blank line between local declaration and code
    - remove unnecessary might_sleep()

    ChangeLog since v2:
    - change copy_huge_page() from macro to inline dummy function
    to avoid compile warning when !CONFIG_HUGETLB_PAGE.

    Signed-off-by: Naoya Horiguchi
    Acked-by: Mel Gorman
    Reviewed-by: Christoph Lameter
    Signed-off-by: Andi Kleen

    Naoya Horiguchi
     
  • We can't use the existing hugepage allocation functions to allocate a
    hugepage for page migration, because page migration can happen
    asynchronously from the running processes, and page migration users
    should call the allocation function with physical addresses (not virtual
    addresses) as arguments.

    ChangeLog since v3:
    - unify alloc_buddy_huge_page() and alloc_buddy_huge_page_node()

    ChangeLog since v2:
    - remove unnecessary get/put_mems_allowed() (thanks to David Rientjes)

    ChangeLog since v1:
    - add comment on top of alloc_huge_page_no_vma()

    Signed-off-by: Naoya Horiguchi
    Acked-by: Mel Gorman
    Signed-off-by: Jun'ichi Nomura
    Reviewed-by: Christoph Lameter
    Signed-off-by: Andi Kleen

    Naoya Horiguchi
     
  • Since the PageHWPoison() check is for avoiding a hwpoisoned page
    remaining in the pagecache mapped to a process, it should be done in the
    "found in pagecache" branch, not in the common path.
    Otherwise, metadata corruption occurs if a memory failure happens between
    alloc_huge_page() and lock_page(), because the page fault fails with
    metadata changes (such as refcount, mapcount, etc.) left behind.

    This patch moves the check to the "found in pagecache" branch and fixes
    the problem.

    ChangeLog since v2:
    - remove retry check in "new allocation" path.
    - make description more detailed
    - change patch name from "HWPOISON, hugetlb: move PG_HWPoison bit check"

    Signed-off-by: Naoya Horiguchi
    Signed-off-by: Jun'ichi Nomura
    Acked-by: Mel Gorman
    Reviewed-by: Wu Fengguang
    Reviewed-by: Christoph Lameter
    Signed-off-by: Andi Kleen

    Naoya Horiguchi
     

13 Aug, 2010

1 commit

  • * 'hwpoison' of git://git.kernel.org/pub/scm/linux/kernel/git/ak/linux-mce-2.6:
    hugetlb: add missing unlock in avoidcopy path in hugetlb_cow()
    hwpoison: rename CONFIG
    HWPOISON, hugetlb: support hwpoison injection for hugepage
    HWPOISON, hugetlb: detect hwpoison in hugetlb code
    HWPOISON, hugetlb: isolate corrupted hugepage
    HWPOISON, hugetlb: maintain mce_bad_pages in handling hugepage error
    HWPOISON, hugetlb: set/clear PG_hwpoison bits on hugepage
    HWPOISON, hugetlb: enable error handling path for hugepage
    hugetlb, rmap: add reverse mapping for hugepage
    hugetlb: move definition of is_vm_hugetlb_page() to hugepage_inline.h

    Fix up trivial conflicts in mm/memory-failure.c

    Linus Torvalds
     

11 Aug, 2010

5 commits

  • This patch fixes a possible deadlock in hugepage lock_page()
    by adding a missing unlock_page().

    libhugetlbfs test will hit this bug when the next patch in this
    patchset ("hugetlb, HWPOISON: move PG_HWPoison bit check") is applied.

    Signed-off-by: Naoya Horiguchi
    Signed-off-by: Jun'ichi Nomura
    Acked-by: Fengguang Wu
    Signed-off-by: Andi Kleen

    Naoya Horiguchi
     
  • This patch enables hwpoison injection through the debug/hwpoison
    interfaces, with which we can test memory error handling for free or
    reserved hugepages (which cannot be tested by the madvise() injector).

    [AK: Export PageHuge too for the injection module]
    Signed-off-by: Naoya Horiguchi
    Cc: Andrew Morton
    Acked-by: Fengguang Wu
    Signed-off-by: Andi Kleen

    Naoya Horiguchi
     
  • This patch enables blocking access to a hwpoisoned hugepage, and
    also enables blocking its unmapping.

    Dependency:
    "HWPOISON, hugetlb: enable error handling path for hugepage"

    Signed-off-by: Naoya Horiguchi
    Cc: Andrew Morton
    Acked-by: Fengguang Wu
    Acked-by: Mel Gorman
    Signed-off-by: Andi Kleen

    Naoya Horiguchi
     
  • If the error hugepage is not in use, we can fully recover from the error
    by dequeuing it from the freelist, so return RECOVERY.
    Otherwise, whether or not we can recover depends on user processes,
    so return DELAYED.

    Dependency:
    "HWPOISON, hugetlb: enable error handling path for hugepage"

    Signed-off-by: Naoya Horiguchi
    Cc: Andrew Morton
    Acked-by: Fengguang Wu
    Signed-off-by: Andi Kleen

    Naoya Horiguchi
     
  • This patch adds reverse mapping feature for hugepage by introducing
    mapcount for shared/private-mapped hugepage and anon_vma for
    private-mapped hugepage.

    While hugepage is not currently swappable, reverse mapping can be useful
    for memory error handler.

    Without this patch, the memory error handler cannot identify processes
    using the bad hugepage nor unmap it from them. That is:
    - for a shared hugepage:
    we can collect processes using the hugepage through the pagecache,
    but cannot unmap the hugepage because of the lack of mapcount.
    - for a privately mapped hugepage:
    we can neither collect processes nor unmap the hugepage.
    This patch solves these problems.

    This patch includes the bug fix given by commit 23be7468e8, so it
    reverts that commit.

    Dependency:
    "hugetlb: move definition of is_vm_hugetlb_page() to hugepage_inline.h"

    ChangeLog since May 24.
    - create hugetlb_inline.h and move is_vm_hugetlb_page() into it.
    - move functions setting up anon_vma for hugepage into mm/rmap.c.

    ChangeLog since May 13.
    - rebased to 2.6.34
    - fix logic error (in case that private mapping and shared mapping coexist)
    - move is_vm_hugetlb_page() into include/linux/mm.h to use this function
    from linear_page_index()
    - define and use linear_hugepage_index() instead of compound_order()
    - use page_move_anon_rmap() in hugetlb_cow()
    - copy exclusive switch of __set_page_anon_rmap() into hugepage counterpart.
    - revert commit 23be7468 completely

    Signed-off-by: Naoya Horiguchi
    Cc: Andi Kleen
    Cc: Andrew Morton
    Cc: Mel Gorman
    Cc: Andrea Arcangeli
    Cc: Larry Woodman
    Cc: Lee Schermerhorn
    Acked-by: Fengguang Wu
    Acked-by: Mel Gorman
    Signed-off-by: Andi Kleen

    Naoya Horiguchi
     

10 Aug, 2010

1 commit

  • When a copy-on-write occurs, we take one of two paths in handle_mm_fault:
    through handle_pte_fault for normal pages, or through hugetlb_fault for
    huge pages.

    In the normal page case, we eventually get to do_wp_page and call mmu
    notifiers via ptep_clear_flush_notify. There is no callout to the mmu
    notifiers in the huge page case. This patch fixes that.

    Signed-off-by: Doug Doan
    Acked-by: Mel Gorman
    Cc: Andrea Arcangeli
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Doug Doan
     

25 May, 2010

1 commit

  • Before applying this patch, cpuset updates task->mems_allowed and
    mempolicy by setting all new bits in the nodemask first, and clearing all
    old unallowed bits later. But along the way, the allocator may find that
    there is no node from which to allocate memory.

    The reason is that, when cpuset rebinds the task's mempolicy, it clears
    the nodes which the allocator can allocate pages on, for example
    (mpol: mempolicy):

    task1                    task1's mpol    task2
    alloc page               1
    alloc on node0? NO       1
                             1               change mems from 1 to 0
                             1               rebind task1's mpol
                             0-1             set new bits
                             0               clear disallowed bits
    alloc on node1? NO       0
    ...
    can't alloc page
    goto oom

    This patch fixes this problem by expanding the nodes range first (set
    newly allowed bits) and shrinking it lazily (clear newly disallowed
    bits). We use a variable to tell the write-side task that a read-side
    task is reading the nodemask, and the write-side task clears newly
    disallowed nodes only after the read-side task ends its current memory
    allocation (see the sketch after this entry).

    [akpm@linux-foundation.org: fix spello]
    Signed-off-by: Miao Xie
    Cc: David Rientjes
    Cc: Nick Piggin
    Cc: Paul Menage
    Cc: Lee Schermerhorn
    Cc: Hugh Dickins
    Cc: Ravikiran Thirumalai
    Cc: KOSAKI Motohiro
    Cc: Christoph Lameter
    Cc: Andi Kleen
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Miao Xie
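
    A hedged sketch of the protocol; get_mems_allowed()/put_mems_allowed()
    are the helpers this patch introduces, and the bodies here are
    illustrative:

        /* read side (the allocator): mark the nodemask as in use */
        get_mems_allowed();
        page = __alloc_pages_nodemask(gfp, order, zonelist, nodemask);
        put_mems_allowed();

        /* write side (the cpuset update): grow first, shrink lazily */
        nodes_or(tsk->mems_allowed, tsk->mems_allowed, *newmems); /* set new bits */
        /* ... wait until the task is outside get/put_mems_allowed() ... */
        tsk->mems_allowed = *newmems;   /* clear newly disallowed bits */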
     

12 May, 2010

1 commit

  • Ordinarily, an application using hugetlbfs will create mappings with
    reserves. For shared mappings, these pages are reserved before mmap()
    returns success, and for private mappings, the caller process is
    guaranteed the pages; a child process that cannot get the pages gets
    killed with SIGBUS.

    An application that uses MAP_NORESERVE gets no reservations and mmap()
    will always succeed at the risk the page will not be available at fault
    time. This might be used for example on very large sparse mappings where
    the developer is confident the necessary huge pages exist to satisfy all
    faults even though the whole mapping cannot be backed by huge pages.
    Unfortunately, if an allocation does fail, VM_FAULT_OOM is returned to the
    fault handler which proceeds to trigger the OOM-killer. This is
    unhelpful.

    Even without hugetlbfs mounted, a user using mmap() can trivially trigger
    the OOM-killer because VM_FAULT_OOM is returned (an example program can
    be provided if desired - it's a whopping 24 lines long). It could be
    considered a DoS available to an unprivileged user.

    This patch alters hugetlbfs to kill a process that uses MAP_NORESERVE
    where huge pages were not available, with SIGBUS instead of triggering
    the OOM killer (see the sketch after this entry).

    This change affects hugetlb_cow() as well. I feel there is a failure case
    in there, but I didn't create one. It would need a fairly specific target
    in terms of the faulting application and the hugepage pool size. The
    hugetlb_no_page() path is much easier to hit, but both might as well be
    closed.

    Signed-off-by: Mel Gorman
    Cc: Lee Schermerhorn
    Cc: David Rientjes
    Cc: Andi Kleen
    Cc:
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Mel Gorman
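
    An illustrative userspace sketch of the MAP_NORESERVE case described
    above; the hugetlbfs mount point and file name are assumptions. After
    this patch, a fault that cannot be satisfied delivers SIGBUS to the
    process instead of waking the OOM killer:

        #include <fcntl.h>
        #include <sys/mman.h>
        #include <unistd.h>

        int main(void)
        {
                size_t len = 2UL * 1024 * 1024; /* one 2MB huge page */
                int fd = open("/mnt/huge/demo", O_CREAT | O_RDWR, 0644);
                char *p;

                if (fd < 0)
                        return 1;

                /* no reservation: mmap() succeeds even with an empty pool */
                p = mmap(NULL, len, PROT_READ | PROT_WRITE,
                         MAP_SHARED | MAP_NORESERVE, fd, 0);
                if (p == MAP_FAILED)
                        return 1;

                p[0] = 1;       /* SIGBUS here if no huge page is available */

                munmap(p, len);
                close(fd);
                return 0;
        }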
     

25 Apr, 2010

1 commit

  • If a futex key happens to be located within a huge page mapped
    MAP_PRIVATE, get_futex_key() can go into an infinite loop waiting for a
    page->mapping that will never exist.

    See https://bugzilla.redhat.com/show_bug.cgi?id=552257 for more details
    about the problem.

    This patch makes page->mapping a poisoned value that includes
    PAGE_MAPPING_ANON for huge pages mapped MAP_PRIVATE. This is enough for
    futex to continue, but because of PAGE_MAPPING_ANON the poisoned value is
    not dereferenced or otherwise used by futex. No other part of the VM
    should be dereferencing the page->mapping of a hugetlbfs page, as its
    page cache is not on the LRU.

    This patch fixes the problem with the test case described in the bugzilla.

    [akpm@linux-foundation.org: mel cant spel]
    Signed-off-by: Mel Gorman
    Acked-by: Peter Zijlstra
    Acked-by: Darren Hart
    Cc:
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Mel Gorman
     

30 Mar, 2010

1 commit

  • include cleanup: Update gfp.h and slab.h includes to prepare for breaking
    implicit slab.h inclusion from percpu.h

    percpu.h is included by sched.h and module.h and thus ends up being
    included when building most .c files. percpu.h includes slab.h which
    in turn includes gfp.h making everything defined by the two files
    universally available and complicating inclusion dependencies.

    The percpu.h -> slab.h dependency is about to be removed. Prepare for
    this change by updating users of gfp and slab facilities to include
    those headers directly instead of assuming availability. As this
    conversion needs to touch a large number of source files, the following
    script is used as the basis of conversion.

    http://userweb.kernel.org/~tj/misc/slabh-sweep.py

    The script does the following:

    * Scan files for gfp and slab usages and update includes such that
    only the necessary includes are there, i.e. if only gfp is used,
    gfp.h; if slab is used, slab.h.

    * When the script inserts a new include, it looks at the include
    blocks and tries to put the new include such that its order conforms
    to its surroundings. It's put in the include block which contains
    core kernel includes, in the same order that the rest are ordered -
    alphabetical, Christmas tree, rev-Xmas-tree, or at the end if there
    doesn't seem to be any matching order.

    * If the script can't find a place to put a new include (mostly
    because the file doesn't have a fitting include block), it prints out
    an error message indicating which .h file needs to be added to the
    file.

    The conversion was done in the following steps.

    1. The initial automatic conversion of all .c files updated slightly
    over 4000 files, deleting around 700 includes and adding ~480 gfp.h
    and ~3000 slab.h inclusions. The script emitted errors for ~400
    files.

    2. Each error was manually checked. Some didn't need the inclusion,
    some needed manual addition while adding it to implementation .h or
    embedding .c file was more appropriate for others. This step added
    inclusions to around 150 files.

    3. The script was run again and the output was compared to the edits
    from #2 to make sure no file was left behind.

    4. Several build tests were done and a couple of problems were fixed.
    e.g. lib/decompress_*.c used malloc/free() wrappers around slab
    APIs requiring slab.h to be added manually.

    5. The script was run on all .h files but without automatically
    editing them as sprinkling gfp.h and slab.h inclusions around .h
    files could easily lead to inclusion dependency hell. Most gfp.h
    inclusion directives were ignored as stuff from gfp.h was usually
    widely available and often used in preprocessor macros. Each
    slab.h inclusion directive was examined and added manually as
    necessary.

    6. percpu.h was updated not to include slab.h.

    7. Build tests were done on the following configurations and failures
    were fixed. CONFIG_GCOV_KERNEL was turned off for all tests (as my
    distributed build env didn't work with gcov compiles) and a few
    more options had to be turned off depending on archs to make things
    build (like ipr on powerpc/64 which failed due to missing writeq).

    * x86 and x86_64 UP and SMP allmodconfig and a custom test config.
    * powerpc and powerpc64 SMP allmodconfig
    * sparc and sparc64 SMP allmodconfig
    * ia64 SMP allmodconfig
    * s390 SMP allmodconfig
    * alpha SMP allmodconfig
    * um on x86_64 SMP allmodconfig

    8. percpu.h modifications were reverted so that it could be applied as
    a separate patch and serve as bisection point.

    Given the fact that I had only a couple of failures from the tests in
    step 7, I'm fairly confident about the coverage of this conversion
    patch. If there is a breakage, it's likely to be something in one of
    the arch headers, which should be easily discoverable on most builds of
    the specific arch.

    Signed-off-by: Tejun Heo <tj@kernel.org>
    Guess-its-ok-by: Christoph Lameter <cl@linux-foundation.org>
    Cc: Ingo Molnar <mingo@redhat.com>
    Cc: Lee Schermerhorn <Lee.Schermerhorn@hp.com>

    Tejun Heo
     

02 Mar, 2010

1 commit

  • * 'for-linus' of master.kernel.org:/home/rmk/linux-2.6-arm: (100 commits)
    ARM: Eliminate decompressor -Dstatic= PIC hack
    ARM: 5958/1: ARM: U300: fix inverted clk round rate
    ARM: 5956/1: misplaced parentheses
    ARM: 5955/1: ep93xx: move timer defines into core.c and document
    ARM: 5954/1: ep93xx: move gpio interrupt support to gpio.c
    ARM: 5953/1: ep93xx: fix broken build of clock.c
    ARM: 5952/1: ARM: MM: Add ARM_L1_CACHE_SHIFT_6 for handle inside each ARCH Kconfig
    ARM: 5949/1: NUC900 add gpio virtual memory map
    ARM: 5948/1: Enable timer0 to time4 clock support for nuc910
    ARM: 5940/2: ARM: MMCI: remove custom DBG macro and printk
    ARM: make_coherent(): fix problems with highpte, part 2
    MM: Pass a PTE pointer to update_mmu_cache() rather than the PTE itself
    ARM: 5945/1: ep93xx: include correct irq.h in core.c
    ARM: 5933/1: amba-pl011: support hardware flow control
    ARM: 5930/1: Add PKMAP area description to memory.txt.
    ARM: 5929/1: Add checks to detect overlap of memory regions.
    ARM: 5928/1: Change type of VMALLOC_END to unsigned long.
    ARM: 5927/1: Make delimiters of DMA area globally visibly.
    ARM: 5926/1: Add "Virtual kernel memory..." printout.
    ARM: 5920/1: OMAP4: Enable L2 Cache
    ...

    Fix up trivial conflict in arch/arm/mach-mx25/clock.c

    Linus Torvalds
     

21 Feb, 2010

1 commit

  • On VIVT ARM, when we have multiple shared mappings of the same file
    in the same MM, we need to ensure that we have coherency across all
    copies. We do this via make_coherent() by making the pages
    uncacheable.

    This used to work fine, until we allowed highmem with highpte - we
    now have a page table which is mapped as required, and is not available
    for modification via update_mmu_cache().

    Ralf Baechle suggested getting rid of the PTE value passed to
    update_mmu_cache():

    On MIPS update_mmu_cache() calls __update_tlb() which walks pagetables
    to construct a pointer to the pte again. Passing a pte_t * is much
    more elegant. Maybe we might even replace the pte argument with the
    pte_t?

    Ben Herrenschmidt would also like the pte pointer for PowerPC:

    Passing the ptep in there is exactly what I want. I want that
    -instead- of the PTE value, because I have issue on some ppc cases,
    for I$/D$ coherency, where set_pte_at() may decide to mask out the
    _PAGE_EXEC.

    So, pass the mapped page table pointer into update_mmu_cache(), and
    remove the PTE value, updating all implementations and call sites to
    suit (see the sketch after this entry).

    Includes a fix from Stephen Rothwell:

    sparc: fix fallout from update_mmu_cache API change

    Signed-off-by: Stephen Rothwell

    Acked-by: Benjamin Herrenschmidt
    Signed-off-by: Russell King

    Russell King
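
    A sketch of the interface change described above; these are the old and
    new prototypes as the description gives them, shown side by side for
    clarity:

        /* before: implementations received the PTE value, which is
         * unusable when the page table is mapped on demand (highpte) */
        void update_mmu_cache(struct vm_area_struct *vma,
                              unsigned long addr, pte_t pte);

        /* after: pass a pointer to the mapped PTE so each architecture
         * can read - or re-walk to - the entry itself */
        void update_mmu_cache(struct vm_area_struct *vma,
                              unsigned long addr, pte_t *ptep);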
     

03 Feb, 2010

1 commit

  • hugetlb_sysfs_add_hstate is called by hugetlb_register_node directly
    during init and also indirectly via sysfs after init.

    This patch removes the __init tag from hugetlb_sysfs_add_hstate.

    Signed-off-by: Jeff Mahoney
    Cc: Lee Schermerhorn
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Jeff Mahoney
     

16 Dec, 2009

6 commits

  • If a user asks for a hugepage pool resize but specifies a large number,
    the machine can begin thrashing. In response, they might hit ctrl-c, but
    signals are ignored and the pool resize continues until it fails an
    allocation. This can take a considerable amount of time, so this patch
    aborts a pool resize if a signal is pending (see the sketch after this
    entry).

    Suggested by Dave Hansen.

    Signed-off-by: Mel Gorman
    Cc: Dave Hansen
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Mel Gorman
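
    A sketch of the check inside the set_max_huge_pages() grow loop;
    the placement is illustrative, not the literal diff:

        while (count > persistent_huge_pages(h)) {
                if (signal_pending(current))
                        goto out;       /* user hit ctrl-c: stop growing */
                /* ... allocate one more huge page ... */
        }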
     
  • When the owner of a mapping fails COW because a child process is holding
    a reference, the child processes' VMAs are walked and the page is
    unmapped. The i_mmap_lock is taken for the unmapping of the page but not
    for the walking of the prio_tree. In theory, that tree could be changing
    if the lock is not held. This patch takes the i_mmap_lock properly for
    the duration of the prio_tree walk (see the sketch after this entry).

    [hugh.dickins@tiscali.co.uk: Spotted the problem in the first place]
    Signed-off-by: Mel Gorman
    Acked-by: Hugh Dickins
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Mel Gorman
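
    A sketch of the fixed walk, assuming the prio_tree-era
    unmap_ref_private(); illustrative:

        spin_lock(&mapping->i_mmap_lock);
        vma_prio_tree_foreach(iter_vma, &iter, &mapping->i_mmap,
                              pgoff, pgoff) {
                if (iter_vma != vma)
                        unmap_hugepage_range(iter_vma, address,
                                             address + huge_page_size(h),
                                             page);
        }
        spin_unlock(&mapping->i_mmap_lock);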
     
  • hugetlb_fault() takes the mm->page_table_lock spinlock then calls
    hugetlb_cow(). If the alloc_huge_page() in hugetlb_cow() fails due to an
    insufficient huge page pool it calls unmap_ref_private() with the
    mm->page_table_lock held. unmap_ref_private() then calls
    unmap_hugepage_range() which tries to acquire the mm->page_table_lock.

    [] print_circular_bug_tail+0x80/0x9f
    [] ? check_noncircular+0xb0/0xe8
    [] __lock_acquire+0x956/0xc0e
    [] lock_acquire+0xee/0x12e
    [] ? unmap_hugepage_range+0x3e/0x84
    [] ? unmap_hugepage_range+0x3e/0x84
    [] _spin_lock+0x40/0x89
    [] ? unmap_hugepage_range+0x3e/0x84
    [] ? alloc_huge_page+0x218/0x318
    [] unmap_hugepage_range+0x3e/0x84
    [] hugetlb_cow+0x1e2/0x3f4
    [] ? hugetlb_fault+0x453/0x4f6
    [] hugetlb_fault+0x480/0x4f6
    [] follow_hugetlb_page+0x116/0x2d9
    [] ? _spin_unlock_irq+0x3a/0x5c
    [] __get_user_pages+0x2a3/0x427
    [] get_user_pages+0x3e/0x54
    [] get_user_pages_fast+0x170/0x1b5
    [] dio_get_page+0x64/0x14a
    [] __blockdev_direct_IO+0x4b7/0xb31
    [] blkdev_direct_IO+0x58/0x6e
    [] ? blkdev_get_blocks+0x0/0xb8
    [] generic_file_aio_read+0xdd/0x528
    [] ? avc_has_perm+0x66/0x8c
    [] do_sync_read+0xf5/0x146
    [] ? autoremove_wake_function+0x0/0x5a
    [] ? security_file_permission+0x24/0x3a
    [] vfs_read+0xb5/0x126
    [] ? fget_light+0x5e/0xf8
    [] sys_read+0x54/0x8c
    [] system_call_fastpath+0x16/0x1b

    This can be fixed by dropping the mm->page_table_lock around the call to
    unmap_ref_private(); it is dropped right below in the normal path anyway.
    However, earlier in that function, it's also possible to call into the
    page allocator with the same spinlock held.

    What this patch does is drop the spinlock before the page allocator is
    potentially entered. The check for page allocation failure can be made
    without the page_table_lock, as can the copy of the huge page. Even if
    the PTE changed while the spinlock was dropped, the consequence is only
    that a huge page is copied unnecessarily. This resolves both the double
    taking of the lock and sleeping with the spinlock held (see the sketch
    after this entry).

    [mel@csn.ul.ie: Cover also the case where process can sleep with spinlock]
    Signed-off-by: Larry Woodman
    Signed-off-by: Mel Gorman
    Acked-by: Adam Litke
    Cc: Andy Whitcroft
    Cc: Lee Schermerhorn
    Cc: Hugh Dickins
    Cc: David Gibson
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Larry Woodman
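
    A sketch of the reordering in hugetlb_cow(); illustrative. The lock is
    dropped before anything that can sleep or re-take it:

        page_cache_get(old_page);
        spin_unlock(&mm->page_table_lock);  /* drop before allocating */

        new_page = alloc_huge_page(vma, address, outside_reserve);
        /* ... handle failure (possibly via unmap_ref_private()) and
         * copy the page without the spinlock held ... */

        spin_lock(&mm->page_table_lock);    /* re-take before updating PTEs */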
     
  • Objects passed to NODEMASK_ALLOC() are relatively small in size and are
    backed by slab caches that are not of large order, traditionally never
    greater than PAGE_ALLOC_COSTLY_ORDER.

    Thus, using GFP_KERNEL for these allocations on large machines when
    CONFIG_NODES_SHIFT > 8 will cause the page allocator to loop endlessly in
    the allocation attempt, each time invoking both direct reclaim or the oom
    killer.

    This is of particular interest when using NODEMASK_ALLOC() from a
    mempolicy context (either directly in mm/mempolicy.c or the mempolicy
    constrained hugetlb allocations) since the oom killer always kills current
    when allocations are constrained by mempolicies. So for all present use
    cases in the kernel, current would end up being oom killed when direct
    reclaim fails. That would allow the NODEMASK_ALLOC() to succeed but
    current would have sacrificed itself upon returning.

    This patch adds gfp flags to NODEMASK_ALLOC() to pass to kmalloc() on
    CONFIG_NODES_SHIFT > 8; this parameter is a nop on other configurations.
    All current use cases, either directly from hugetlb code or indirectly
    via NODEMASK_SCRATCH(), use __GFP_NORETRY to avoid direct reclaim and the
    oom killer when the slab allocator needs to allocate additional pages.

    The side-effect of this change is that all current use cases of either
    NODEMASK_ALLOC() or NODEMASK_SCRATCH() need appropriate -ENOMEM handling
    when the allocation fails (it never fails for CONFIG_NODES_SHIFT <= 8,
    where the nodemask is placed on the stack). See the sketch after this
    entry.
    Acked-by: KAMEZAWA Hiroyuki
    Cc: Lee Schermerhorn
    Cc: Mel Gorman
    Cc: Randy Dunlap
    Cc: Nishanth Aravamudan
    Cc: Andi Kleen
    Cc: David Rientjes
    Cc: Adam Litke
    Cc: Andy Whitcroft
    Cc: Eric Whitney
    Cc: Christoph Lameter
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    David Rientjes
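
    A sketch of the resulting call pattern, per the description above; the
    gfp argument is the new part, and the -ENOMEM handling is now mandatory:

        NODEMASK_ALLOC(nodemask_t, nodes_allowed, GFP_KERNEL | __GFP_NORETRY);
        if (!nodes_allowed)
                return -ENOMEM; /* only possible when CONFIG_NODES_SHIFT > 8 */

        /* ... use *nodes_allowed ... */
        NODEMASK_FREE(nodes_allowed);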
     
  • Register per node hstate sysfs attributes only for nodes with memory.
    Globally replace "all online nodes" with "all nodes with memory" in
    mm/hugetlb.c. Suggested by David Rientjes.

    A subsequent patch will handle adding/removing of per node hstate sysfs
    attributes when nodes transition to/from memoryless state via memory
    hotplug.

    NOTE: this patch has not been tested with memoryless nodes.

    Signed-off-by: Lee Schermerhorn
    Reviewed-by: Andi Kleen
    Cc: KAMEZAWA Hiroyuki
    Cc: Mel Gorman
    Cc: Randy Dunlap
    Cc: Nishanth Aravamudan
    Acked-by: David Rientjes
    Cc: Adam Litke
    Cc: Andy Whitcroft
    Cc: Eric Whitney
    Cc: Christoph Lameter
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Lee Schermerhorn
     
  • Add the per huge page size control/query attributes to the per node
    sysdevs:

    /sys/devices/system/node/node<nid>/hugepages/hugepages-<size>/
        nr_hugepages       - r/w
        free_huge_pages    - r/o
        surplus_huge_pages - r/o

    The patch attempts to re-use/share as much of the existing global hstate
    attribute initialization and handling, and the "nodes_allowed" constraint
    processing as possible.

    Calling set_max_huge_pages() with no node indicates a change to global
    hstate parameters. In this case, any non-default task mempolicy will be
    used to generate the nodes_allowed mask. A valid node id indicates an
    update to that node's hstate parameters, and the count argument specifies
    the target count for the specified node. From this info, we compute the
    target global count for the hstate and construct a nodes_allowed node
    mask containing only the specified node.

    Setting the node specific nr_hugepages via the per node attribute
    effectively ignores any task mempolicy or cpuset constraints.

    With this patch:

    (me):ls /sys/devices/system/node/node0/hugepages/hugepages-2048kB
    ./ ../ free_hugepages nr_hugepages surplus_hugepages

    Starting from:
    Node 0 HugePages_Total: 0
    Node 0 HugePages_Free: 0
    Node 0 HugePages_Surp: 0
    Node 1 HugePages_Total: 0
    Node 1 HugePages_Free: 0
    Node 1 HugePages_Surp: 0
    Node 2 HugePages_Total: 0
    Node 2 HugePages_Free: 0
    Node 2 HugePages_Surp: 0
    Node 3 HugePages_Total: 0
    Node 3 HugePages_Free: 0
    Node 3 HugePages_Surp: 0
    vm.nr_hugepages = 0

    Allocate 16 persistent huge pages on node 2:
    (me):echo 16 >/sys/devices/system/node/node2/hugepages/hugepages-2048kB/nr_hugepages

    [Note that this is equivalent to:
    numactl -m 2 hugeadmin --pool-pages-min 2M:+16
    ]

    Yields:
    Node 0 HugePages_Total: 0
    Node 0 HugePages_Free: 0
    Node 0 HugePages_Surp: 0
    Node 1 HugePages_Total: 0
    Node 1 HugePages_Free: 0
    Node 1 HugePages_Surp: 0
    Node 2 HugePages_Total: 16
    Node 2 HugePages_Free: 16
    Node 2 HugePages_Surp: 0
    Node 3 HugePages_Total: 0
    Node 3 HugePages_Free: 0
    Node 3 HugePages_Surp: 0
    vm.nr_hugepages = 16

    Global controls work as expected--reduce pool to 8 persistent huge pages:
    (me):echo 8 >/sys/kernel/mm/hugepages/hugepages-2048kB/nr_hugepages

    Node 0 HugePages_Total: 0
    Node 0 HugePages_Free: 0
    Node 0 HugePages_Surp: 0
    Node 1 HugePages_Total: 0
    Node 1 HugePages_Free: 0
    Node 1 HugePages_Surp: 0
    Node 2 HugePages_Total: 8
    Node 2 HugePages_Free: 8
    Node 2 HugePages_Surp: 0
    Node 3 HugePages_Total: 0
    Node 3 HugePages_Free: 0
    Node 3 HugePages_Surp: 0

    Signed-off-by: Lee Schermerhorn
    Acked-by: Mel Gorman
    Reviewed-by: Andi Kleen
    Cc: KAMEZAWA Hiroyuki
    Cc: Randy Dunlap
    Cc: Nishanth Aravamudan
    Cc: David Rientjes
    Cc: Adam Litke
    Cc: Andy Whitcroft
    Cc: Eric Whitney
    Cc: Christoph Lameter
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Lee Schermerhorn