11 Jan, 2012
5 commits
-
If we have to hand back the newly allocated huge page to the page allocator,
for any reason, the changed counter should be recovered. This affects only s390 at present.
Signed-off-by: Hillf Danton
Reviewed-by: Michal Hocko
Acked-by: KAMEZAWA Hiroyuki
Cc: Martin Schwidefsky
Cc: Heiko Carstens
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
-
The computation for pgoff is incorrect, at least with
(vma->vm_pgoff >> PAGE_SHIFT)
involved. It is fixed with the available helper where HPAGE_SIZE is
concerned in the page cache lookup.
[akpm@linux-foundation.org: use vma_hugecache_offset() directly, per Michal]
Signed-off-by: Hillf Danton
Cc: Mel Gorman
Cc: Michal Hocko
Reviewed-by: KAMEZAWA Hiroyuki
Cc: Andrea Arcangeli
Cc: David Rientjes
Reviewed-by: Michal Hocko
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
-
handle_mm_fault() passes the 'faulted' address to hugetlb_fault(). This
address is not aligned to a hugepage boundary.
Most of the functions for hugetlb pages are aware of that and calculate the
alignment themselves. However, some functions such as
copy_user_huge_page() and clear_huge_page() don't handle alignment by
themselves.
This patch makes hugetlb_fault() fix the alignment and pass an aligned
address (the address of the faulted hugepage) to those functions.
[akpm@linux-foundation.org: use &=]
Signed-off-by: KAMEZAWA Hiroyuki
Cc: KOSAKI Motohiro
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
-
Let's make it clear that we cannot race with other fault handlers due to
the hugetlb (global) mutex. Also make it clear that we want to keep the pte_same
checks anyway, to make a later transition away from the global mutex easier.
Signed-off-by: Michal Hocko
Cc: Hillf Danton
Cc: Andrea Arcangeli
Cc: Johannes Weiner
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
-
Currently we are not rechecking pte_same in hugetlb_cow after we take the ptl
lock again in the page allocation failure code path; we simply retry.
This is not an issue at the moment because the hugetlb fault path is
protected by hugetlb_instantiation_mutex, so we cannot race.
The original page is locked, so we cannot race even with page
migration.
Let's add the pte_same check anyway, as we want to be consistent with the
other check later in this function and be safe if we ever remove the
mutex.
[mhocko@suse.cz: reworded the changelog]
Signed-off-by: Hillf Danton
Signed-off-by: Michal Hocko
Cc: Andrea Arcangeli
Cc: Johannes Weiner
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
07 Jan, 2012
1 commit
-
This resolves the conflict in the arch/arm/mach-s3c64xx/s3c6400.c file,
and it fixes the build error in the arch/x86/kernel/microcode_core.c
file that the merge did not catch.
The microcode_core.c patch was provided by Stephen Rothwell,
who was invaluable in the merge issues involved
with the large sysdev removal process in the driver-core tree.
Signed-off-by: Greg Kroah-Hartman
30 Dec, 2011
1 commit
-
If a huge page is enqueued under the protection of hugetlb_lock, then the
operation is atomic and safe.
Signed-off-by: Hillf Danton
Reviewed-by: Michal Hocko
Acked-by: KAMEZAWA Hiroyuki
Cc: [2.6.37+]
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
22 Dec, 2011
1 commit
-
This moves the 'memory sysdev_class' over to a regular 'memory' subsystem
and converts the devices to regular devices. The sysdev drivers are
now implemented as subsystem interfaces.
After all sysdev classes are ported to regular driver core entities, the
sysdev implementation will be entirely removed from the kernel.
Signed-off-by: Kay Sievers
Signed-off-by: Greg Kroah-Hartman
09 Dec, 2011
1 commit
-
Commit 70b50f94f1644 ("mm: thp: tail page refcounting fix") keeps all
page_tail->_count at zero at all times. But the current kernel does not
set page_tail->_count to zero if a 1GB page is utilized. So when an
IOMMU 1GB page is used by KVM, it will result in a kernel oops because a
tail page's _count does not equal zero.
kernel BUG at include/linux/mm.h:386!
invalid opcode: 0000 [#1] SMP
Call Trace:
gup_pud_range+0xb8/0x19d
get_user_pages_fast+0xcb/0x192
? trace_hardirqs_off+0xd/0xf
hva_to_pfn+0x119/0x2f2
gfn_to_pfn_memslot+0x2c/0x2e
kvm_iommu_map_pages+0xfd/0x1c1
kvm_iommu_map_memslots+0x7c/0xbd
kvm_iommu_map_guest+0xaa/0xbf
kvm_vm_ioctl_assigned_device+0x2ef/0xa47
kvm_vm_ioctl+0x36c/0x3a2
do_vfs_ioctl+0x49e/0x4e4
sys_ioctl+0x5a/0x7c
system_call_fastpath+0x16/0x1b
RIP gup_huge_pud+0xf2/0x159
Signed-off-by: Youquan Song
Reviewed-by: Andrea Arcangeli
Cc:
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
16 Nov, 2011
1 commit
-
If we fail to prepare an anon_vma, the {new, old}_page should be released,
or they will leak.
Signed-off-by: Hillf Danton
Reviewed-by: Andrea Arcangeli
Cc: Hugh Dickins
Cc: Johannes Weiner
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
26 Jul, 2011
2 commits
-
Fix coding style issues flagged by checkpatch.pl
Signed-off-by: Chris Forbes
Acked-by: Eric B Munson
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
-
This is needed on HIGHMEM systems: we don't always have a virtual
address, so store the physical address and map it in as needed.
[akpm@linux-foundation.org: cleanup]
Signed-off-by: Becky Bruce
Cc: Benjamin Herrenschmidt
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
16 Jun, 2011
1 commit
-
When 1GB hugepages are allocated on a system, free(1) reports less
available memory than what is really installed in the box. Also, if the
total size of hugepages allocated on a system is over half of the total
memory size, CommitLimit becomes a negative number.
The problem is that gigantic hugepages (order > MAX_ORDER) can only be
allocated at boot with bootmem, so their frames are not accounted to
'totalram_pages'. However, they are accounted to hugetlb_total_pages().
What turns CommitLimit into a negative number is this calculation,
in fs/proc/meminfo.c:
allowed = ((totalram_pages - hugetlb_total_pages())
* sysctl_overcommit_ratio / 100) + total_swap_pages;
A similar calculation occurs in __vm_enough_memory() in mm/mmap.c.
Also, every VM statistic which depends on 'totalram_pages' will render
confusing values, as if the system were 'missing' some part of its memory.
Impact of this bug: when gigantic hugepages are allocated and
sysctl_overcommit_memory == OVERCOMMIT_NEVER, __vm_enough_memory() goes through
the mentioned 'allowed' calculation and might end up mistakenly returning
-ENOMEM, thus forcing the system to start reclaiming pages earlier than
usual, which could have a detrimental impact on overall
system performance, depending on the workload.
Besides the aforementioned scenario, I can only think of this causing
annoyances with memory reports from /proc/meminfo and free(1).
[akpm@linux-foundation.org: standardize comment layout]
Reported-by: Russ Anderson
Signed-off-by: Rafael Aquini
Acked-by: Russ Anderson
Cc: Andrea Arcangeli
Cc: Christoph Lameter
Cc:
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
06 Jun, 2011
1 commit
-
Al Viro observes that in the hugetlb case, handle_mm_fault() may return
a value of the kind ENOSPC when its caller is expecting a value of the
kind VM_FAULT_SIGBUS: fix alloc_huge_page()'s failure returns.
Signed-off-by: Hugh Dickins
Acked-by: Al Viro
Cc: stable@kernel.org
Signed-off-by: Linus Torvalds
27 May, 2011
1 commit
-
The type of vma->vm_flags is 'unsigned long', neither 'int' nor
'unsigned int'. This patch fixes such misuse.
Signed-off-by: KOSAKI Motohiro
[ Changed to use a typedef - we'll extend it to cover more cases
later, since there has been discussion about making it a 64-bit
type.. - Linus ]
Signed-off-by: Linus Torvalds
25 May, 2011
1 commit
-
Straightforward conversion of i_mmap_lock to a mutex.
Signed-off-by: Peter Zijlstra
Acked-by: Hugh Dickins
Cc: Benjamin Herrenschmidt
Cc: David Miller
Cc: Martin Schwidefsky
Cc: Russell King
Cc: Paul Mundt
Cc: Jeff Dike
Cc: Richard Weinberger
Cc: Tony Luck
Cc: KAMEZAWA Hiroyuki
Cc: Mel Gorman
Cc: KOSAKI Motohiro
Cc: Nick Piggin
Cc: Namhyung Kim
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
26 Apr, 2011
1 commit
-
Fast-forwarded to current state of Linus' tree as there are patches to be
applied for files that didn't exist on the old branch.
10 Apr, 2011
1 commit
-
Signed-off-by: Justin P. Mattock
Signed-off-by: Jiri Kosina
31 Mar, 2011
1 commit
-
Fixes generated by 'codespell' and manually reviewed.
Signed-off-by: Lucas De Marchi
23 Mar, 2011
1 commit
-
When the user inserts a negative value into /proc/sys/vm/nr_hugepages, the
kernel will allocate as many hugepages as possible and then update
/proc/meminfo to reflect this.
This changes the behavior so that negative input leaves the
nr_hugepages value unchanged.
Signed-off-by: Petr Holasek
Signed-off-by: Anton Arapov
Reviewed-by: Naoya Horiguchi
Acked-by: David Rientjes
Acked-by: Mel Gorman
Acked-by: Eric B Munson
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
14 Jan, 2011
5 commits
-
When parsing changes to the huge page pool sizes made from userspace via
the sysfs interface, bogus input values are being covered up by
nr_hugepages_store_common and nr_overcommit_hugepages_store returning 0
when strict_strtoul returns an error. This can cause an infinite loop in
the nr_hugepages_store code. This patch changes the return value of
these functions to -EINVAL when strict_strtoul returns an error.
Signed-off-by: Eric B Munson
Reported-by: CAI Qian
Cc: Andrea Arcangeli
Cc: Eric B Munson
Cc: Michal Hocko
Cc: Nishanth Aravamudan
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
-
Huge pages with order >= MAX_ORDER must be allocated at boot via the
kernel command line; they cannot be allocated or freed once the kernel is
up and running. Currently we allow values to be written to the sysfs and
sysctl files controlling pool size for these huge page sizes. This patch
makes the store functions for nr_hugepages and nr_overcommit_hugepages
return -EINVAL when the pool for a page size >= MAX_ORDER is changed.
[akpm@linux-foundation.org: avoid multiple return paths in nr_hugepages_store_common()]
[caiqian@redhat.com: add checking in hugetlb_overcommit_handler()]
Signed-off-by: Eric B Munson
Reported-by: CAI Qian
Cc: Andrea Arcangeli
Cc: Michal Hocko
Cc: Nishanth Aravamudan
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
-
proc_doulongvec_minmax may fail if the given buffer doesn't represent a
valid number. If we provide something invalid, we will initialize the
resulting value (nr_overcommit_huge_pages in this case) to a random value
from the stack.
The issue was introduced by a3d0c6aa, when the default handler was
replaced by a helper function in which we do not check the return value.
Reproducer:
echo "" > /proc/sys/vm/nr_overcommit_hugepages
[akpm@linux-foundation.org: correctly propagate proc_doulongvec_minmax return code]
Signed-off-by: Michal Hocko
Cc: CAI Qian
Cc: Nishanth Aravamudan
Cc: Andrea Arcangeli
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
-
The NODEMASK_ALLOC macro may dynamically allocate memory for its second
argument ('nodes_allowed' in this context).
In nr_hugepages_store_common() we may abort early if strict_strtoul()
fails, but in that case we do not free the memory already allocated to
'nodes_allowed', causing a memory leak.
This patch closes the leak by freeing the memory in the error path.
[akpm@linux-foundation.org: use NODEMASK_FREE, per Minchan Kim]
Signed-off-by: Jesper Juhl
Cc: Minchan Kim
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
-
Move the copy/clear_huge_page functions to common code to share them between
hugetlb.c and huge_memory.c.
Signed-off-by: Andrea Arcangeli
Acked-by: Rik van Riel
Acked-by: Mel Gorman
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
03 Dec, 2010
1 commit
-
Have hugetlb_fault() call unlock_page(page) only if it had previously
called lock_page(page).
Setting CONFIG_DEBUG_VM=y and then running the libhugetlbfs test suite
resulted in tripping the VM_BUG_ON(!PageLocked(page)) in
unlock_page(), called by hugetlb_fault() when page ==
pagecache_page. This patch remedies the problem.
Signed-off-by: Dean Nelson
Cc:
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
27 Oct, 2010
1 commit
-
Add the missing spin_lock() of the page_table_lock before an error return in
hugetlb_cow(). Callers of hugetlb_cow() expect it to be held upon return.
Signed-off-by: Dean Nelson
Cc: Mel Gorman
Cc:
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
08 Oct, 2010
9 commits
-
This fixes a problem introduced with the hugetlb hwpoison handling:
the user space SIGBUS signalling wants to know the size of the hugepage
that caused a HWPOISON fault.
Unfortunately the architecture page fault handlers do not have easy
access to the struct page, so pass the information out in the fault
error code instead.
I added a separate VM_FAULT_HWPOISON_LARGE bit for this case and encode
the hpage index in some free upper bits of the fault code. The small
page hwpoison case keeps the VM_FAULT_HWPOISON name to minimize
changes.
Also add code to hugetlb.h to convert that index into a page shift; this
will be used in a further patch.
Cc: Naoya Horiguchi
Cc: fengguang.wu@intel.com
Signed-off-by: Andi Kleen
-
Fixes warning reported by Stephen Rothwell
mm/hugetlb.c:2950: warning: 'is_hugepage_on_freelist' defined but not used
for the !CONFIG_MEMORY_FAILURE case.
Signed-off-by: Andi Kleen
-
Currently error recovery for a free hugepage works only for MF_COUNT_INCREASED.
This patch enables the !MF_COUNT_INCREASED case.
Free hugepages can be handled directly by alloc_huge_page() and
dequeue_hwpoisoned_huge_page(), and both of them are protected
by hugetlb_lock, so there is no race between them.
Note that this patch defines the refcount of a HWPoisoned hugepage
dequeued from the freelist as 1, deviating from the present 0, so we
can avoid a race between unpoison and memory failure on a free hugepage.
This is reasonable because, unlike free buddy pages, a free hugepage
is governed by hugetlbfs even after error handling finishes.
It also makes the unpoison code added in a later patch cleaner.
Signed-off-by: Naoya Horiguchi
Signed-off-by: Jun'ichi Nomura
Acked-by: Mel Gorman
Signed-off-by: Andi Kleen
-
Currently alloc_huge_page() raises the page refcount outside hugetlb_lock,
but this causes a race when dequeue_hwpoisoned_huge_page() runs concurrently
with alloc_huge_page().
To avoid it, this patch moves set_page_refcounted() inside hugetlb_lock.
Signed-off-by: Naoya Horiguchi
Signed-off-by: Wu Fengguang
Acked-by: Mel Gorman
Reviewed-by: Christoph Lameter
Signed-off-by: Andi Kleen
-
This check is necessary to avoid a race between dequeue and allocation,
which can cause a free hugepage to be dequeued twice and make the kernel unstable.
Signed-off-by: Naoya Horiguchi
Signed-off-by: Wu Fengguang
Acked-by: Mel Gorman
Reviewed-by: Christoph Lameter
Signed-off-by: Andi Kleen
-
This patch extends the page migration code to support hugepage migration.
One of the potential users of this feature is soft offlining, which
is triggered by corrected memory errors (added by the next patch.)
Todo:
- there are other users of page migration such as memory policy,
memory hotplug and memory compaction.
They are not ready for hugepage support for now.
ChangeLog since v4:
- define migrate_huge_pages()
- remove changes on isolation/putback_lru_page()
ChangeLog since v2:
- refactor isolate/putback_lru_page() to handle hugepage
- add comment about race on unmap_and_move_huge_page()
ChangeLog since v1:
- divide migration code path for hugepage
- define routine checking migration swap entry for hugetlb
- replace "goto" with "if/else" in remove_migration_pte()
Signed-off-by: Naoya Horiguchi
Signed-off-by: Jun'ichi Nomura
Acked-by: Mel Gorman
Signed-off-by: Andi Kleen
-
This patch modifies the hugepage copy functions to take only destination
and source hugepages as arguments, for later use.
The old ones are renamed from copy_{gigantic,huge}_page() to
copy_user_{gigantic,huge}_page().
This naming convention is consistent with that between copy_highpage()
and copy_user_highpage().
ChangeLog since v4:
- add blank line between local declaration and code
- remove unnecessary might_sleep()
ChangeLog since v2:
- change copy_huge_page() from macro to inline dummy function
to avoid compile warning when !CONFIG_HUGETLB_PAGE.
Signed-off-by: Naoya Horiguchi
Acked-by: Mel Gorman
Reviewed-by: Christoph Lameter
Signed-off-by: Andi Kleen
-
We can't use the existing hugepage allocation functions to allocate a hugepage
for page migration, because page migration can happen asynchronously with
the running processes, and page migration users should call the allocation
function with physical addresses (not virtual addresses) as arguments.
ChangeLog since v3:
- unify alloc_buddy_huge_page() and alloc_buddy_huge_page_node()
ChangeLog since v2:
- remove unnecessary get/put_mems_allowed() (thanks to David Rientjes)
ChangeLog since v1:
- add comment on top of alloc_huge_page_no_vma()
Signed-off-by: Naoya Horiguchi
Acked-by: Mel Gorman
Signed-off-by: Jun'ichi Nomura
Reviewed-by: Christoph Lameter
Signed-off-by: Andi Kleen
-
Since the PageHWPoison() check is for avoiding a hwpoisoned page remaining
in the pagecache mapped into the process, it should be done in the "found
in pagecache" branch, not in the common path.
Otherwise, metadata corruption occurs if memory failure happens between
alloc_huge_page() and lock_page(), because the page fault fails with metadata
changes left behind (such as refcount, mapcount, etc.)
This patch moves the check to the "found in pagecache" branch and fixes the problem.
ChangeLog since v2:
- remove retry check in "new allocation" path.
- make description more detailed
- change patch name from "HWPOISON, hugetlb: move PG_HWPoison bit check"
Signed-off-by: Naoya Horiguchi
Signed-off-by: Jun'ichi Nomura
Acked-by: Mel Gorman
Reviewed-by: Wu Fengguang
Reviewed-by: Christoph Lameter
Signed-off-by: Andi Kleen
24 Sep, 2010
2 commits
-
The "if (!trylock_page)" block in the avoidcopy path of hugetlb_cow()
looks confusing and is buggy. Originally this trylock_page() was
intended to make sure that old_page is locked even when old_page !=
pagecache_page, because then only pagecache_page is locked.
This patch fixes it by moving page locking into hugetlb_fault().
Signed-off-by: Naoya Horiguchi
Signed-off-by: Naoya Horiguchi
Acked-by: Rik van Riel
Signed-off-by: Linus Torvalds
-
Obviously, setting the anon_vma for a COWed hugepage should be done
by hugepage_add_new_anon_rmap() to scan vmas faster.
This patch fixes it.
Signed-off-by: Naoya Horiguchi
Acked-by: Andrea Arcangeli
Reviewed-by: Rik van Riel
Signed-off-by: Linus Torvalds
13 Aug, 2010
1 commit
-
* 'hwpoison' of git://git.kernel.org/pub/scm/linux/kernel/git/ak/linux-mce-2.6:
hugetlb: add missing unlock in avoidcopy path in hugetlb_cow()
hwpoison: rename CONFIG
HWPOISON, hugetlb: support hwpoison injection for hugepage
HWPOISON, hugetlb: detect hwpoison in hugetlb code
HWPOISON, hugetlb: isolate corrupted hugepage
HWPOISON, hugetlb: maintain mce_bad_pages in handling hugepage error
HWPOISON, hugetlb: set/clear PG_hwpoison bits on hugepage
HWPOISON, hugetlb: enable error handling path for hugepage
hugetlb, rmap: add reverse mapping for hugepage
hugetlb: move definition of is_vm_hugetlb_page() to hugepage_inline.h
Fix up trivial conflicts in mm/memory-failure.c
11 Aug, 2010
1 commit
-
This patch fixes a possible deadlock in hugepage lock_page()
by adding a missing unlock_page().
The libhugetlbfs test suite will hit this bug when the next patch in this
patchset ("hugetlb, HWPOISON: move PG_HWPoison bit check") is applied.
Signed-off-by: Naoya Horiguchi
Signed-off-by: Jun'ichi Nomura
Acked-by: Fengguang Wu
Signed-off-by: Andi Kleen