11 Jan, 2012

5 commits

  • If we have to hand the newly allocated huge page back to the page
    allocator, for any reason, the counter that was changed for the
    allocation should be restored.

    This affects only s390 at present.

    Signed-off-by: Hillf Danton
    Reviewed-by: Michal Hocko
    Acked-by: KAMEZAWA Hiroyuki
    Cc: Martin Schwidefsky
    Cc: Heiko Carstens
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Hillf Danton
     
  • The computation for pgoff is incorrect, at least with

    (vma->vm_pgoff >> PAGE_SHIFT)

    involved. Fix it by using the existing helper for page cache lookups
    at HPAGE_SIZE granularity.

    [akpm@linux-foundation.org: use vma_hugecache_offset() directly, per Michal]
    Signed-off-by: Hillf Danton
    Cc: Mel Gorman
    Cc: Michal Hocko
    Reviewed-by: KAMEZAWA Hiroyuki
    Cc: Andrea Arcangeli
    Cc: David Rientjes
    Reviewed-by: Michal Hocko
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Hillf Danton
     
  • handle_mm_fault() passes 'faulted' address to hugetlb_fault(). This
    address is not aligned to a hugepage boundary.

    Most of the functions for hugetlb pages are aware of that and
    calculate the alignment themselves. However, some functions, such as
    copy_user_huge_page() and clear_huge_page(), don't handle the
    alignment themselves.

    This patch makes hugetlb_fault() fix the alignment and pass an
    aligned address (the address of the faulted hugepage) to those
    functions.

    [akpm@linux-foundation.org: use &=]
    Signed-off-by: KAMEZAWA Hiroyuki
    Cc: KOSAKI Motohiro
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    KAMEZAWA Hiroyuki
     
  • Let's make it clear that we cannot race with other fault handlers
    due to the hugetlb (global) mutex. Also make it clear that we want
    to keep the pte_same checks anyway, to make a transition away from
    the global mutex easier.

    Signed-off-by: Michal Hocko
    Cc: Hillf Danton
    Cc: Andrea Arcangeli
    Cc: Johannes Weiner
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Michal Hocko
     
  • Currently we do not recheck pte_same in hugetlb_cow after we take
    the ptl lock again in the page allocation failure code path; we
    simply retry. This is not an issue at the moment because the hugetlb
    fault path is protected by hugetlb_instantiation_mutex, so we cannot
    race.

    The original page is locked and so we cannot race even with the page
    migration.

    Let's add the pte_same check anyway as we want to be consistent with the
    other check later in this function and be safe if we ever remove the
    mutex.

    [mhocko@suse.cz: reworded the changelog]
    Signed-off-by: Hillf Danton
    Signed-off-by: Michal Hocko
    Cc: Andrea Arcangeli
    Cc: Johannes Weiner
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Hillf Danton
     

07 Jan, 2012

1 commit

  • This resolves the conflict in the arch/arm/mach-s3c64xx/s3c6400.c file,
    and it fixes the build error in the arch/x86/kernel/microcode_core.c
    file, that the merge did not catch.

    The microcode_core.c patch was provided by Stephen Rothwell
    who was invaluable in the merge issues involved
    with the large sysdev removal process in the driver-core tree.

    Signed-off-by: Greg Kroah-Hartman

    Greg Kroah-Hartman
     

30 Dec, 2011

1 commit


22 Dec, 2011

1 commit

  • This moves the 'memory sysdev_class' over to a regular 'memory' subsystem
    and converts the devices to regular devices. The sysdev drivers are
    implemented as subsystem interfaces now.

    After all sysdev classes are ported to regular driver core entities, the
    sysdev implementation will be entirely removed from the kernel.

    Signed-off-by: Kay Sievers
    Signed-off-by: Greg Kroah-Hartman

    Kay Sievers
     

09 Dec, 2011

1 commit

  • Commit 70b50f94f1644 ("mm: thp: tail page refcounting fix") keeps all
    page_tail->_count zero at all times. But the current kernel does not
    set page_tail->_count to zero if a 1GB page is utilized. So when an
    IOMMU 1GB page is used by KVM, it will result in a kernel oops because a
    tail page's _count does not equal zero.

    kernel BUG at include/linux/mm.h:386!
    invalid opcode: 0000 [#1] SMP
    Call Trace:
    gup_pud_range+0xb8/0x19d
    get_user_pages_fast+0xcb/0x192
    ? trace_hardirqs_off+0xd/0xf
    hva_to_pfn+0x119/0x2f2
    gfn_to_pfn_memslot+0x2c/0x2e
    kvm_iommu_map_pages+0xfd/0x1c1
    kvm_iommu_map_memslots+0x7c/0xbd
    kvm_iommu_map_guest+0xaa/0xbf
    kvm_vm_ioctl_assigned_device+0x2ef/0xa47
    kvm_vm_ioctl+0x36c/0x3a2
    do_vfs_ioctl+0x49e/0x4e4
    sys_ioctl+0x5a/0x7c
    system_call_fastpath+0x16/0x1b
    RIP gup_huge_pud+0xf2/0x159

    Signed-off-by: Youquan Song
    Reviewed-by: Andrea Arcangeli
    Cc:
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Youquan Song
     

16 Nov, 2011

1 commit


26 Jul, 2011

2 commits


16 Jun, 2011

1 commit

  • When 1GB hugepages are allocated on a system, free(1) reports less
    available memory than is actually installed in the box. Also, if the
    total size of hugepages allocated on a system is over half of the
    total memory size, CommitLimit becomes a negative number.

    The problem is that gigantic hugepages (order >= MAX_ORDER) can only
    be allocated at boot with bootmem, thus their frames are not
    accounted to 'totalram_pages'. However, they are accounted to
    hugetlb_total_pages().

    What happens to turn CommitLimit into a negative number is this
    calculation, in fs/proc/meminfo.c:

    allowed = ((totalram_pages - hugetlb_total_pages())
    * sysctl_overcommit_ratio / 100) + total_swap_pages;

    A similar calculation occurs in __vm_enough_memory() in mm/mmap.c.

    Also, every vm statistic which depends on 'totalram_pages' will render
    confusing values, as if system were 'missing' some part of its memory.

    Impact of this bug:

    When gigantic hugepages are allocated and sysctl_overcommit_memory ==
    OVERCOMMIT_NEVER, __vm_enough_memory() goes through the mentioned
    'allowed' calculation and might end up mistakenly returning -ENOMEM,
    thus forcing the system to start reclaiming pages earlier than it
    usually would, and this could have a detrimental impact on overall
    system performance, depending on the workload.

    Besides the aforementioned scenario, I can only think of this causing
    annoyances with memory reports from /proc/meminfo and free(1).

    [akpm@linux-foundation.org: standardize comment layout]
    Reported-by: Russ Anderson
    Signed-off-by: Rafael Aquini
    Acked-by: Russ Anderson
    Cc: Andrea Arcangeli
    Cc: Christoph Lameter
    Cc:
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Rafael Aquini
     

06 Jun, 2011

1 commit

  • Al Viro observes that in the hugetlb case, handle_mm_fault() may return
    a value of the kind ENOSPC when its caller is expecting a value of the
    kind VM_FAULT_SIGBUS: fix alloc_huge_page()'s failure returns.

    Signed-off-by: Hugh Dickins
    Acked-by: Al Viro
    Cc: stable@kernel.org
    Signed-off-by: Linus Torvalds

    Hugh Dickins
     

27 May, 2011

1 commit

  • The type of vma->vm_flags is 'unsigned long', not 'int' or
    'unsigned int'. This patch fixes such misuse.

    Signed-off-by: KOSAKI Motohiro
    [ Changed to use a typedef - we'll extend it to cover more cases
    later, since there has been discussion about making it a 64-bit
    type.. - Linus ]
    Signed-off-by: Linus Torvalds

    KOSAKI Motohiro
     

25 May, 2011

1 commit

  • Straightforward conversion of i_mmap_lock to a mutex.

    Signed-off-by: Peter Zijlstra
    Acked-by: Hugh Dickins
    Cc: Benjamin Herrenschmidt
    Cc: David Miller
    Cc: Martin Schwidefsky
    Cc: Russell King
    Cc: Paul Mundt
    Cc: Jeff Dike
    Cc: Richard Weinberger
    Cc: Tony Luck
    Cc: KAMEZAWA Hiroyuki
    Cc: Mel Gorman
    Cc: KOSAKI Motohiro
    Cc: Nick Piggin
    Cc: Namhyung Kim
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Peter Zijlstra
     

26 Apr, 2011

1 commit


10 Apr, 2011

1 commit


31 Mar, 2011

1 commit


23 Mar, 2011

1 commit

  • When the user inserts a negative value into /proc/sys/vm/nr_hugepages it
    will cause the kernel to allocate as many hugepages as possible and to
    then update /proc/meminfo to reflect this.

    This changes the behavior so that a negative input will leave the
    nr_hugepages value unchanged.

    Signed-off-by: Petr Holasek
    Signed-off-by: Anton Arapov
    Reviewed-by: Naoya Horiguchi
    Acked-by: David Rientjes
    Acked-by: Mel Gorman
    Acked-by: Eric B Munson
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Petr Holasek
     

14 Jan, 2011

5 commits

  • When parsing changes to the huge page pool sizes made from userspace via
    the sysfs interface, bogus input values are being covered up by
    nr_hugepages_store_common and nr_overcommit_hugepages_store returning 0
    when strict_strtoul returns an error. This can cause an infinite loop in
    the nr_hugepages_store code. This patch changes the return value for
    these functions to -EINVAL when strict_strtoul returns an error.

    Signed-off-by: Eric B Munson
    Reported-by: CAI Qian
    Cc: Andrea Arcangeli
    Cc: Eric B Munson
    Cc: Michal Hocko
    Cc: Nishanth Aravamudan
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Eric B Munson
     
  • Huge pages with order >= MAX_ORDER must be allocated at boot via the
    kernel command line; they cannot be allocated or freed once the
    kernel is up and running. Currently we allow values to be written to
    the sysfs and sysctl files controlling pool size for these huge page
    sizes. This patch makes the store functions for nr_hugepages and
    nr_overcommit_hugepages return -EINVAL when the pool for a page size
    >= MAX_ORDER is changed.

    [akpm@linux-foundation.org: avoid multiple return paths in nr_hugepages_store_common()]
    [caiqian@redhat.com: add checking in hugetlb_overcommit_handler()]
    Signed-off-by: Eric B Munson
    Reported-by: CAI Qian
    Cc: Andrea Arcangeli
    Cc: Michal Hocko
    Cc: Nishanth Aravamudan
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Eric B Munson
     
  • proc_doulongvec_minmax may fail if the given buffer doesn't
    represent a valid number. If we provide something invalid, we will
    initialize the resulting value (nr_overcommit_huge_pages in this
    case) to a random value from the stack.

    The issue was introduced by commit a3d0c6aa, when the default
    handler was replaced by the helper function, where we do not check
    the return value.

    Reproducer:
    echo "" > /proc/sys/vm/nr_overcommit_hugepages

    [akpm@linux-foundation.org: correctly propagate proc_doulongvec_minmax return code]
    Signed-off-by: Michal Hocko
    Cc: CAI Qian
    Cc: Nishanth Aravamudan
    Cc: Andrea Arcangeli
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Michal Hocko
     
  • The NODEMASK_ALLOC macro may dynamically allocate memory for its second
    argument ('nodes_allowed' in this context).

    In nr_hugepages_store_common() we may abort early if strict_strtoul()
    fails, but in that case we do not free the memory already allocated to
    'nodes_allowed', causing a memory leak.

    This patch closes the leak by freeing the memory in the error path.

    [akpm@linux-foundation.org: use NODEMASK_FREE, per Minchan Kim]
    Signed-off-by: Jesper Juhl
    Cc: Minchan Kim
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Jesper Juhl
     
  • Move the copy/clear_huge_page functions to common code to share between
    hugetlb.c and huge_memory.c.

    Signed-off-by: Andrea Arcangeli
    Acked-by: Rik van Riel
    Acked-by: Mel Gorman
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Andrea Arcangeli
     

03 Dec, 2010

1 commit

  • Have hugetlb_fault() call unlock_page(page) only if it had previously
    called lock_page(page).

    Setting CONFIG_DEBUG_VM=y and then running the libhugetlbfs test suite,
    resulted in the tripping of VM_BUG_ON(!PageLocked(page)) in
    unlock_page() having been called by hugetlb_fault() when page ==
    pagecache_page. This patch remedied the problem.

    Signed-off-by: Dean Nelson
    Cc:
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Dean Nelson
     

27 Oct, 2010

1 commit


08 Oct, 2010

9 commits

  • This fixes a problem introduced with the hugetlb hwpoison handling.

    The user space SIGBUS signalling wants to know the size of the hugepage
    that caused a HWPOISON fault.

    Unfortunately the architecture page fault handlers do not have easy
    access to the struct page.

    Pass the information out in the fault error code instead.

    I added a separate VM_FAULT_HWPOISON_LARGE bit for this case and
    encode the hpage index in some free upper bits of the fault code.
    The small page hwpoison case keeps the VM_FAULT_HWPOISON name to
    minimize changes.

    Also add code to hugetlb.h to convert that index into a page shift.

    Will be used in a further patch.

    Cc: Naoya Horiguchi
    Cc: fengguang.wu@intel.com
    Signed-off-by: Andi Kleen

    Andi Kleen
     
  • Fixes a warning reported by Stephen Rothwell:

    mm/hugetlb.c:2950: warning: 'is_hugepage_on_freelist' defined but not used

    for the !CONFIG_MEMORY_FAILURE case.

    Signed-off-by: Andi Kleen

    Andi Kleen
     
  • Currently, error recovery for a free hugepage works only for
    MF_COUNT_INCREASED. This patch enables the !MF_COUNT_INCREASED case.

    Free hugepages can be handled directly by alloc_huge_page() and
    dequeue_hwpoisoned_huge_page(), and both of them are protected
    by hugetlb_lock, so there is no race between them.

    Note that this patch defines the refcount of an HWPoisoned hugepage
    dequeued from the freelist to be 1, deviating from the present 0,
    so that we can avoid a race between unpoison and memory failure on
    a free hugepage. This is reasonable because, unlike free buddy
    pages, a free hugepage is still governed by hugetlbfs even after
    error handling finishes. It also makes the unpoison code added in a
    later patch cleaner.

    Signed-off-by: Naoya Horiguchi
    Signed-off-by: Jun'ichi Nomura
    Acked-by: Mel Gorman
    Signed-off-by: Andi Kleen

    Naoya Horiguchi
     
  • Currently alloc_huge_page() raises the page refcount outside
    hugetlb_lock, but this causes a race when
    dequeue_hwpoison_huge_page() runs concurrently with
    alloc_huge_page().
    To avoid it, this patch moves set_page_refcounted() inside
    hugetlb_lock.

    Signed-off-by: Naoya Horiguchi
    Signed-off-by: Wu Fengguang
    Acked-by: Mel Gorman
    Reviewed-by: Christoph Lameter
    Signed-off-by: Andi Kleen

    Naoya Horiguchi
     
  • This check is necessary to avoid a race between dequeue and
    allocation, which can cause a free hugepage to be dequeued twice
    and make the kernel unstable.

    Signed-off-by: Naoya Horiguchi
    Signed-off-by: Wu Fengguang
    Acked-by: Mel Gorman
    Reviewed-by: Christoph Lameter
    Signed-off-by: Andi Kleen

    Naoya Horiguchi
     
  • This patch extends the page migration code to support hugepage
    migration. One of the potential users of this feature is soft
    offlining, which is triggered by corrected memory errors (added by
    the next patch).

    Todo:
    - there are other users of page migration, such as memory policy,
    memory hotplug and memory compaction.
    They are not ready for hugepage support for now.

    ChangeLog since v4:
    - define migrate_huge_pages()
    - remove changes on isolation/putback_lru_page()

    ChangeLog since v2:
    - refactor isolate/putback_lru_page() to handle hugepage
    - add comment about race on unmap_and_move_huge_page()

    ChangeLog since v1:
    - divide migration code path for hugepage
    - define routine checking migration swap entry for hugetlb
    - replace "goto" with "if/else" in remove_migration_pte()

    Signed-off-by: Naoya Horiguchi
    Signed-off-by: Jun'ichi Nomura
    Acked-by: Mel Gorman
    Signed-off-by: Andi Kleen

    Naoya Horiguchi
     
  • This patch modifies hugepage copy functions to have only destination
    and source hugepages as arguments for later use.
    The old ones are renamed from copy_{gigantic,huge}_page() to
    copy_user_{gigantic,huge}_page().
    This naming convention is consistent with that between copy_highpage()
    and copy_user_highpage().

    ChangeLog since v4:
    - add blank line between local declaration and code
    - remove unnecessary might_sleep()

    ChangeLog since v2:
    - change copy_huge_page() from macro to inline dummy function
    to avoid compile warning when !CONFIG_HUGETLB_PAGE.

    Signed-off-by: Naoya Horiguchi
    Acked-by: Mel Gorman
    Reviewed-by: Christoph Lameter
    Signed-off-by: Andi Kleen

    Naoya Horiguchi
     
  • We can't use existing hugepage allocation functions to allocate hugepage
    for page migration, because page migration can happen asynchronously with
    the running processes and page migration users should call the allocation
    function with physical addresses (not virtual addresses) as arguments.

    ChangeLog since v3:
    - unify alloc_buddy_huge_page() and alloc_buddy_huge_page_node()

    ChangeLog since v2:
    - remove unnecessary get/put_mems_allowed() (thanks to David Rientjes)

    ChangeLog since v1:
    - add comment on top of alloc_huge_page_no_vma()

    Signed-off-by: Naoya Horiguchi
    Acked-by: Mel Gorman
    Signed-off-by: Jun'ichi Nomura
    Reviewed-by: Christoph Lameter
    Signed-off-by: Andi Kleen

    Naoya Horiguchi
     
  • Since the PageHWPoison() check is for avoiding a hwpoisoned page
    that remained in the page cache being mapped into the process, it
    should be done in the "found in pagecache" branch, not in the
    common path. Otherwise, metadata corruption occurs if memory
    failure happens between alloc_huge_page() and lock_page(), because
    the page fault fails with metadata changes left behind (such as
    refcount, mapcount, etc.)

    This patch moves the check to the "found in pagecache" branch and
    fixes the problem.

    ChangeLog since v2:
    - remove retry check in "new allocation" path.
    - make description more detailed
    - change patch name from "HWPOISON, hugetlb: move PG_HWPoison bit check"

    Signed-off-by: Naoya Horiguchi
    Signed-off-by: Jun'ichi Nomura
    Acked-by: Mel Gorman
    Reviewed-by: Wu Fengguang
    Reviewed-by: Christoph Lameter
    Signed-off-by: Andi Kleen

    Naoya Horiguchi
     

24 Sep, 2010

2 commits


13 Aug, 2010

1 commit

  • * 'hwpoison' of git://git.kernel.org/pub/scm/linux/kernel/git/ak/linux-mce-2.6:
    hugetlb: add missing unlock in avoidcopy path in hugetlb_cow()
    hwpoison: rename CONFIG
    HWPOISON, hugetlb: support hwpoison injection for hugepage
    HWPOISON, hugetlb: detect hwpoison in hugetlb code
    HWPOISON, hugetlb: isolate corrupted hugepage
    HWPOISON, hugetlb: maintain mce_bad_pages in handling hugepage error
    HWPOISON, hugetlb: set/clear PG_hwpoison bits on hugepage
    HWPOISON, hugetlb: enable error handling path for hugepage
    hugetlb, rmap: add reverse mapping for hugepage
    hugetlb: move definition of is_vm_hugetlb_page() to hugepage_inline.h

    Fix up trivial conflicts in mm/memory-failure.c

    Linus Torvalds
     

11 Aug, 2010

1 commit