30 May, 2012

3 commits

  • hugetlb_reserve_pages() can be used for either normal file-backed
    hugetlbfs mappings, or MAP_HUGETLB. In the MAP_HUGETLB, semi-anonymous
    mode, there is no VMA around. The new call to resv_map_put() assumed
    that there was, and resulted in a NULL pointer dereference:

    BUG: unable to handle kernel NULL pointer dereference at 0000000000000030
    IP: vma_resv_map+0x9/0x30
    PGD 141453067 PUD 1421e1067 PMD 0
    Oops: 0000 [#1] PREEMPT SMP
    ...
    Pid: 14006, comm: trinity-child6 Not tainted 3.4.0+ #36
    RIP: vma_resv_map+0x9/0x30
    ...
    Process trinity-child6 (pid: 14006, threadinfo ffff8801414e0000, task ffff8801414f26b0)
    Call Trace:
    resv_map_put+0xe/0x40
    hugetlb_reserve_pages+0xa6/0x1d0
    hugetlb_file_setup+0x102/0x2c0
    newseg+0x115/0x360
    ipcget+0x1ce/0x310
    sys_shmget+0x5a/0x60
    system_call_fastpath+0x16/0x1b

    This was reported by Dave Jones, but was reproducible with the
    libhugetlbfs test cases, so shame on me for not running them in the
    first place.

    With this, the oops is gone, and the output of libhugetlbfs's
    run_tests.py is identical to plain 3.4 again.

    [ Marked for stable, since this was introduced by commit c50ac050811d
    ("hugetlb: fix resv_map leak in error path") which was also marked for
    stable ]
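
    A minimal userspace model of the guard the fix adds, with stub types
    standing in for the kernel structures (the shape of the error path
    follows the commit text; everything else here is illustrative):

    #include <stdio.h>
    #include <stdlib.h>

    /* Illustrative stand-ins for the kernel structures involved. */
    struct resv_map { int refs; };
    struct vm_area_struct { struct resv_map *resv; };

    static void resv_map_put(struct vm_area_struct *vma)
    {
        /* Dereferences vma, so it must never see vma == NULL. */
        if (--vma->resv->refs == 0)
            free(vma->resv);
    }

    static void reserve_error_path(struct vm_area_struct *vma)
    {
        /* The fix: drop the reservation map only when a VMA exists,
         * i.e. skip it in the MAP_HUGETLB/shmget() semi-anonymous case. */
        if (vma)
            resv_map_put(vma);
    }

    int main(void)
    {
        reserve_error_path(NULL);   /* semi-anonymous mode: now a no-op */
        printf("no NULL dereference\n");
        return 0;
    }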

    Reported-by: Dave Jones
    Cc: Mel Gorman
    Cc: KOSAKI Motohiro
    Cc: Christoph Lameter
    Cc: Andrea Arcangeli
    Cc: Andrew Morton
    Cc: [2.6.32+]
    Signed-off-by: Linus Torvalds

    Dave Hansen
     
  • When called for anonymous (non-shared) mappings, hugetlb_reserve_pages()
    does a resv_map_alloc(). It depends on code in hugetlbfs's
    vm_ops->close() to release that allocation.

    However, in the mmap() failure path, we do a plain unmap_region() without
    the remove_vma() which actually calls vm_ops->close().

    This is a decent fix. This leak could get reintroduced if new code (say,
    after hugetlb_reserve_pages() in hugetlbfs_file_mmap()) decides to return
    an error. But I think it would have to unroll the reservation anyway.

    Christoph's test case:

    http://marc.info/?l=linux-mm&m=133728900729735

    This patch applies to 3.4 and later. A version for earlier kernels is at
    https://lkml.org/lkml/2012/5/22/418.
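
    A sketch of the fixed shape, modeled in userspace with a plain
    refcount (resv_map_alloc/resv_map_release are named after the kernel
    functions; the bodies and the failure stand-in are illustrative):

    #include <stdlib.h>

    struct resv_map { long refs; };

    static struct resv_map *resv_map_alloc(void)
    {
        struct resv_map *map = calloc(1, sizeof(*map));
        if (map)
            map->refs = 1;
        return map;
    }

    static void resv_map_release(struct resv_map *map)
    {
        if (--map->refs == 0)
            free(map);
    }

    static int reserve_pages(long from, long to)
    {
        struct resv_map *map = resv_map_alloc();
        if (!map)
            return -1;              /* -ENOMEM in the kernel */

        if (to <= from)             /* stand-in for a later failure */
            goto out_err;

        return 0;                   /* released later by vm_ops->close() */

    out_err:
        resv_map_release(map);      /* the leak this patch plugs */
        return -1;
    }

    int main(void) { return reserve_pages(4, 2) == -1 ? 0 : 1; }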

    Signed-off-by: Dave Hansen
    Acked-by: Mel Gorman
    Acked-by: KOSAKI Motohiro
    Reported-by: Christoph Lameter
    Tested-by: Christoph Lameter
    Cc: Andrea Arcangeli
    Cc: [2.6.32+]
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Dave Hansen
     
  • The arguments f & t and fields from & to of struct file_region are
    defined as long. So use long instead of int to type the temp vars.
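
    A compilable illustration of the truncation the wider type avoids
    (struct file_region is reduced to the two fields named above):

    #include <stdio.h>

    struct file_region { long from, to; };

    int main(void)
    {
        struct file_region rg = { .from = 0, .to = 3000000000L };

        int  chg_int  = rg.to - rg.from;  /* truncated: does not fit int */
        long chg_long = rg.to - rg.from;  /* matches the field types     */

        printf("int: %d  long: %ld\n", chg_int, chg_long);
        return 0;
    }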

    Signed-off-by: Wang Sheng-Hui
    Acked-by: David Rientjes
    Acked-by: Hillf Danton
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Wang Sheng-Hui
     

26 May, 2012

1 commit

  • The tile support for multiple-size huge pages requires tagging
    the hugetlb PTE with a "super" bit for PTEs that are multiples of
    the basic size of a pagetable span. To set that bit properly
    we need to tweak the PTE in make_huge_pte() based on the vma.

    This change provides the API for a subsequent tile-specific
    change to use.
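
    A simplified, compilable sketch of such a hook, with the PTE reduced
    to an unsigned long; the "super" bit and the size threshold are made
    up, and the real signature carries more arguments:

    #include <stdio.h>

    typedef unsigned long pte_t;        /* illustrative stand-in */
    struct vm_area_struct { unsigned long vm_start, vm_end; };

    #define PTE_SUPER (1UL << 1)        /* hypothetical "super" bit */

    /* Generic code calls this from make_huge_pte(); an arch (tile,
     * per the commit) overrides it to tag PTEs whose huge page spans
     * multiple page-table entries. */
    static pte_t arch_make_huge_pte(pte_t entry, struct vm_area_struct *vma)
    {
        if (vma->vm_end - vma->vm_start >= (16UL << 20))
            entry |= PTE_SUPER;         /* pretend 16MB+ means "super" */
        return entry;
    }

    int main(void)
    {
        struct vm_area_struct vma = { 0, 64UL << 20 };
        printf("pte = %#lx\n", arch_make_huge_pte(0x1, &vma));
        return 0;
    }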

    Reviewed-by: Hillf Danton
    Signed-off-by: Chris Metcalf

    Chris Metcalf
     

11 May, 2012

1 commit

  • Commit 66aebce747eaf ("hugetlb: fix race condition in hugetlb_fault()")
    added code to avoid a race condition by elevating the page refcount in
    hugetlb_fault() while calling hugetlb_cow().

    However, one code path in hugetlb_cow() includes an assertion that the
    page count is 1, whereas it may now also have the value 2 in this path.

    The consensus is that this BUG_ON has served its purpose, so rather than
    extending it to cover both cases, we just remove it.

    Signed-off-by: Chris Metcalf
    Acked-by: Mel Gorman
    Acked-by: Hillf Danton
    Acked-by: Hugh Dickins
    Cc: Michal Hocko
    Cc: KAMEZAWA Hiroyuki
    Cc: [3.0.29+, 3.2.16+, 3.3.3+]
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Chris Metcalf
     

26 Apr, 2012

1 commit

  • Fix a gcc warning (and bug?) introduced in cc9a6c877 ("cpuset: mm: reduce
    large amounts of memory barrier related damage v3")

    Local variable "page" can be uninitialized if the nodemask from vma policy
    does not intersects with nodemask from cpuset. Even if it doesn't happens
    it is better to initialize this variable explicitly than to introduce
    a kernel oops in a weird corner case.

    mm/hugetlb.c: In function `alloc_huge_page':
    mm/hugetlb.c:1135:5: warning: `page' may be used uninitialized in this function
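
    The fix itself is a one-line initialization; a compilable model of
    the code path, with a boolean standing in for the cpuset/mempolicy
    intersection check:

    #include <stdio.h>

    struct page { int id; };

    static struct page *dequeue_page(int masks_intersect)
    {
        struct page *page = NULL;   /* the fix: initialize explicitly */

        if (masks_intersect) {
            static struct page p = { 42 };
            page = &p;              /* normal allocation path */
        }
        /* without the initializer, this returns an uninitialized
         * pointer whenever the nodemasks do not intersect */
        return page;
    }

    int main(void)
    {
        printf("%p\n", (void *)dequeue_page(0));
        return 0;
    }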

    Signed-off-by: Konstantin Khlebnikov
    Acked-by: Mel Gorman
    Acked-by: David Rientjes
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Konstantin Khlebnikov
     

13 Apr, 2012

1 commit

  • The race is as follows:

    Suppose a multi-threaded task forks a new process (on cpu A), thus
    bumping up the ref count on all the pages. While the fork is occurring
    (and thus we have marked all the PTEs as read-only), another thread in
    the original process (on cpu B) tries to write to a huge page, taking an
    access violation from the write-protect and calling hugetlb_cow(). Now,
    suppose the fork() fails. It will undo the COW and decrement the ref
    count on the pages, so the ref count on the huge page drops back to 1.
    Meanwhile hugetlb_cow() also decrements the ref count by one on the
    original page, since the original address space doesn't need it any
    more, having copied a new page to replace the original page. This
    leaves the ref count at zero, and when we call unlock_page(), we panic.

    fork on CPU A                             fault on CPU B
    =============                             ==============
    ...
    down_write(&parent->mmap_sem);
    down_write_nested(&child->mmap_sem);
    ...
    while duplicating vmas
      if error
        break;
    ...
    up_write(&child->mmap_sem);
    up_write(&parent->mmap_sem);              ...
                                              down_read(&parent->mmap_sem);
                                              ...
                                              lock_page(page);
                                              handle COW
                                              page_mapcount(old_page) == 2
                                              alloc and prepare new_page
    ...
    handle error
    page_remove_rmap(page);
    put_page(page);
    ...
                                              fold new_page into pte
                                              page_remove_rmap(page);
                                              put_page(page);
                                              ...
                                              oops ==> unlock_page(page);
                                              up_read(&parent->mmap_sem);

    The solution is to take an extra reference to the page while we are
    holding the lock on it.
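
    A userspace model of why the extra reference is sufficient: the
    fork-undo path and hugetlb_cow() each drop one reference, so the
    extra pin keeps the count above zero until unlock_page() has run
    (the counts here are illustrative):

    #include <assert.h>
    #include <stdio.h>

    static int refcount = 2;    /* parent mapping + partially-forked child */

    static void put_page(void) { assert(--refcount >= 0); }

    int main(void)
    {
        refcount++;             /* the fix: take an extra reference    */
        /* ... lock_page() ... */
        put_page();             /* hugetlb_cow() drops the old page    */
        put_page();             /* the failed fork() undoes its copy   */
        assert(refcount > 0);   /* so unlock_page() is still safe      */
        refcount--;             /* drop the extra reference            */
        printf("refcount=%d\n", refcount);
        return 0;
    }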

    Signed-off-by: Chris Metcalf
    Cc: Hillf Danton
    Cc: Michal Hocko
    Cc: KAMEZAWA Hiroyuki
    Cc: Hugh Dickins
    Cc:
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Chris Metcalf
     

24 Mar, 2012

1 commit


22 Mar, 2012

4 commits

  • hugetlbfs_{get,put}_quota() are badly named. They don't interact with the
    general quota handling code, and they don't much resemble its behaviour.
    Rather than being about maintaining limits on on-disk block usage by
    particular users, they are instead about maintaining limits on in-memory
    page usage (including anonymous MAP_PRIVATE copied-on-write pages)
    associated with a particular hugetlbfs filesystem instance.

    Worse, they work by having callbacks to the hugetlbfs filesystem code from
    the low-level page handling code, in particular from free_huge_page().
    This is a layering violation in itself, but more importantly, if the
    kernel does a get_user_pages() on hugepages (which can happen from KVM
    amongst others), then the free_huge_page() can be delayed until after the
    associated inode has already been freed. If an unmount occurs at the
    wrong time, even the hugetlbfs superblock where the "quota" limits are
    stored may have been freed.

    Andrew Barry proposed a patch to fix this by having hugepages store
    pointers directly to the superblock, bumping its reference count as
    appropriate to avoid it being freed, instead of storing a pointer to
    their address_space and reaching the superblock from there.
    Andrew Morton rejected that version, however, on the grounds that it made
    the existing layering violation worse.

    This is a reworked version of Andrew's patch, which removes the extra, and
    some of the existing, layering violation. It works by introducing the
    concept of a hugepage "subpool" at the lower hugepage mm layer - that is a
    finite logical pool of hugepages to allocate from. hugetlbfs now creates
    a subpool for each filesystem instance with a page limit set, and a
    pointer to the subpool gets added to each allocated hugepage, instead of
    the address_space pointer used now. The subpool has its own lifetime and
    is only freed once all pages in it _and_ all other references to it (i.e.
    superblocks) are gone.

    Subpools are optional - a NULL subpool pointer is taken by the code to
    mean that no subpool limits are in effect.

    Previous discussion of this bug can be found in "Fix refcounting in
    hugetlbfs quota handling". See: https://lkml.org/lkml/2011/8/11/28 or
    http://marc.info/?l=linux-mm&m=126928970510627&w=1

    v2: Fixed a bug spotted by Hillf Danton, and removed the extra parameter to
    alloc_huge_page() - since it already takes the vma, it is not necessary.
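
    A userspace model of the subpool lifetime rules described above: a
    NULL pool means no limits, and the pool is freed only when both the
    superblock reference and every page pointing at it are gone (field
    and function names here are illustrative):

    #include <stdio.h>
    #include <stdlib.h>

    struct hugepage_subpool {
        long refs;          /* superblock + pages still pointing here  */
        long max_hpages;    /* page limit for this filesystem instance */
        long used_hpages;
    };

    static struct hugepage_subpool *subpool_new(long max)
    {
        struct hugepage_subpool *spool = calloc(1, sizeof(*spool));
        if (spool) { spool->refs = 1; spool->max_hpages = max; }
        return spool;
    }

    static int subpool_get_pages(struct hugepage_subpool *spool, long n)
    {
        if (!spool)
            return 0;       /* NULL subpool: no limits in effect */
        if (spool->used_hpages + n > spool->max_hpages)
            return -1;      /* pool exhausted */
        spool->used_hpages += n;
        spool->refs += n;   /* each page keeps the pool alive */
        return 0;
    }

    static void subpool_put(struct hugepage_subpool *spool)
    {
        if (spool && --spool->refs == 0)
            free(spool);    /* may outlive the inode and superblock */
    }

    int main(void)
    {
        struct hugepage_subpool *spool = subpool_new(2);
        printf("%d %d\n", subpool_get_pages(spool, 2),
                          subpool_get_pages(spool, 1));
        subpool_put(spool);             /* superblock goes away...   */
        subpool_put(spool);             /* ...then the pages trickle */
        subpool_put(spool);             /* back; last put frees it   */
        return 0;
    }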

    Signed-off-by: Andrew Barry
    Signed-off-by: David Gibson
    Cc: Hugh Dickins
    Cc: Mel Gorman
    Cc: Minchan Kim
    Cc: Hillf Danton
    Cc: Paul Mackerras
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    David Gibson
     
  • Commit c0ff7453bb5c ("cpuset,mm: fix no node to alloc memory when
    changing cpuset's mems") wins a super prize for the largest number of
    memory barriers entered into fast paths for one commit.

    [get|put]_mems_allowed is incredibly heavy with pairs of full memory
    barriers inserted into a number of hot paths. This was detected while
    investigating a large page allocator slowdown introduced some time
    after 2.6.32. The largest portion of this overhead was shown by
    oprofile to be at an mfence introduced by this commit into the page
    allocator hot path.

    For extra style points, the commit introduced the use of yield() in an
    implementation of what looks like a spinning mutex.

    This patch replaces the full memory barriers on both read and write
    sides with a sequence counter with just read barriers on the fast path
    side. This is much cheaper on some architectures, including x86. The
    main bulk of the patch is the retry logic if the nodemask changes in a
    manner that can cause a false failure.

    While updating the nodemask, a check is made to see if a false failure
    is a risk. If it is, the sequence number gets bumped and parallel
    allocators will briefly stall while the nodemask update takes place.
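
    A compilable userspace model of that read-side pattern, using C11
    atomics in place of the kernel's sequence counter (the function
    names follow the [get|put]_mems_allowed naming above; the bodies
    are illustrative, not the patch's implementation):

    #include <stdatomic.h>
    #include <stdio.h>

    static atomic_uint mems_seq;        /* bumped around nodemask updates */
    static int nodemask = 0x3;          /* stand-in for mems_allowed      */

    static unsigned get_mems_allowed(void)     /* read-side begin */
    {
        return atomic_load_explicit(&mems_seq, memory_order_acquire);
    }

    static int put_mems_allowed(unsigned seq)  /* read-side retry check */
    {
        atomic_thread_fence(memory_order_acquire);
        return atomic_load_explicit(&mems_seq, memory_order_relaxed) == seq;
    }

    int main(void)
    {
        unsigned seq;
        int mask;
        do {
            seq = get_mems_allowed();
            mask = nodemask;               /* the allocation reads this */
        } while (!put_mems_allowed(seq));  /* retry if a writer raced   */
        printf("allocated from mask %#x\n", mask);
        return 0;
    }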

    In a page fault test microbenchmark, oprofile samples from
    __alloc_pages_nodemask went from 4.53% of all samples to 1.15%. The
    actual results were

                              3.3.0-rc3             3.3.0-rc3
                              rc3-vanilla           nobarrier-v2r1
    Clients 1 UserTime        0.07 (  0.00%)        0.08 (-14.19%)
    Clients 2 UserTime        0.07 (  0.00%)        0.07 (  2.72%)
    Clients 4 UserTime        0.08 (  0.00%)        0.07 (  3.29%)
    Clients 1 SysTime         0.70 (  0.00%)        0.65 (  6.65%)
    Clients 2 SysTime         0.85 (  0.00%)        0.82 (  3.65%)
    Clients 4 SysTime         1.41 (  0.00%)        1.41 (  0.32%)
    Clients 1 WallTime        0.77 (  0.00%)        0.74 (  4.19%)
    Clients 2 WallTime        0.47 (  0.00%)        0.45 (  3.73%)
    Clients 4 WallTime        0.38 (  0.00%)        0.37 (  1.58%)
    Clients 1 Flt/sec/cpu     497620.28 (  0.00%)   520294.53 (  4.56%)
    Clients 2 Flt/sec/cpu     414639.05 (  0.00%)   429882.01 (  3.68%)
    Clients 4 Flt/sec/cpu     257959.16 (  0.00%)   258761.48 (  0.31%)
    Clients 1 Flt/sec         495161.39 (  0.00%)   517292.87 (  4.47%)
    Clients 2 Flt/sec         820325.95 (  0.00%)   850289.77 (  3.65%)
    Clients 4 Flt/sec        1020068.93 (  0.00%)  1022674.06 (  0.26%)

    MMTests Statistics: duration
    Sys Time Running Test (seconds)            135.68      132.17
    User+Sys Time Running Test (seconds)       164.2       160.13
    Total Elapsed Time (seconds)               123.46      120.87

    The overall improvement is small but the System CPU time is much
    improved and roughly in correlation to what oprofile reported (these
    performance figures are without profiling so skew is expected). The
    actual number of page faults is noticeably improved.

    For benchmarks like kernel builds, the overall benefit is marginal but
    the system CPU time is slightly reduced.

    To test the actual bug the commit fixed I opened two terminals. The
    first ran within a cpuset and continually ran a small program that
    faulted 100M of anonymous data. In a second window, the nodemask of the
    cpuset was continually randomised in a loop.

    Without the commit, the program would fail every so often (usually
    within 10 seconds) and obviously with the commit everything worked fine.
    With this patch applied, it also worked fine so the fix should be
    functionally equivalent.

    Signed-off-by: Mel Gorman
    Cc: Miao Xie
    Cc: David Rientjes
    Cc: Peter Zijlstra
    Cc: Christoph Lameter
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Mel Gorman
     
  • When unmapping a given VM range, we could bail out if a reference page is
    supplied and is unmapped, which is a minor optimization.

    Signed-off-by: Hillf Danton
    Cc: Michal Hocko
    Cc: KAMEZAWA Hiroyuki
    Cc: Hugh Dickins
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Hillf Danton
     
  • When gathering surplus pages, the number of needed pages is recomputed
    after reacquiring hugetlb lock to catch changes in resv_huge_pages and
    free_huge_pages. Plus it is recomputed with the number of newly allocated
    pages involved.

    Thus the freeing of pages can be deferred a bit, to see whether the final
    page request is satisfied, even though fewer pages than needed may have
    been allocated.

    Signed-off-by: Hillf Danton
    Reviewed-by: Michal Hocko
    Cc: KAMEZAWA Hiroyuki
    Cc: Hugh Dickins
    Cc: Mel Gorman
    Cc: Andrea Arcangeli
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Hillf Danton
     

06 Mar, 2012

1 commit

  • All other callers already hold either ->mmap_sem (exclusive) or
    ->page_table_lock. And we need it because some page table flushing
    instances do work explicitly with page tables.

    See e.g. arch/powerpc/mm/tlb_hash32.c, flush_tlb_range() and
    flush_range() in there. The same goes for uml, with a lot more
    extensive playing with page tables.

    Almost all callers are actually fine - flush_tlb_range() may have no
    need to bother playing with page tables, but it can do so safely; again,
    this caller is the sole exception - everything else either has exclusive
    ->mmap_sem on the mm in question, or mm->page_table_lock is held.

    Signed-off-by: Al Viro
    Signed-off-by: Linus Torvalds

    Al Viro
     

24 Jan, 2012

1 commit

  • Page mapcount should be updated only if we are sure that the page ends
    up in the page table otherwise we would leak if we couldn't COW due to
    reservations or if idx is out of bounds.

    Signed-off-by: Hillf Danton
    Reviewed-by: Michal Hocko
    Acked-by: KAMEZAWA Hiroyuki
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Hillf Danton
     

11 Jan, 2012

5 commits

  • If we have to hand back the newly allocated huge page to the page
    allocator, for any reason, the changed counter should be recovered.

    This affects only s390 at present.

    Signed-off-by: Hillf Danton
    Reviewed-by: Michal Hocko
    Acked-by: KAMEZAWA Hiroyuki
    Cc: Martin Schwidefsky
    Cc: Heiko Carstens
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Hillf Danton
     
  • The computation for pgoff is incorrect, at least with

    (vma->vm_pgoff >> PAGE_SHIFT)

    involved. It is fixed by using the existing helper that does the page
    cache lookup in HPAGE_SIZE units.
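
    A compilable before/after of the computation, with a 2MB huge page
    size and made-up numbers (the helper models vma_hugecache_offset()
    from the note below):

    #include <stdio.h>

    #define PAGE_SHIFT  12
    #define HPAGE_SHIFT 21      /* 2MB huge pages, for illustration */

    /* page cache index in huge-page-sized units */
    static unsigned long hugecache_offset(unsigned long vm_start,
                                          unsigned long vm_pgoff,
                                          unsigned long address)
    {
        return ((address - vm_start) >> HPAGE_SHIFT) +
               (vm_pgoff >> (HPAGE_SHIFT - PAGE_SHIFT));
    }

    int main(void)
    {
        unsigned long vm_start = 0x40000000, vm_pgoff = 1024;
        unsigned long addr = 0x40400000;

        /* the broken form quoted above: vm_pgoff >> PAGE_SHIFT */
        unsigned long bad = ((addr - vm_start) >> HPAGE_SHIFT) +
                            (vm_pgoff >> PAGE_SHIFT);

        printf("bad=%lu good=%lu\n", bad,
               hugecache_offset(vm_start, vm_pgoff, addr));
        return 0;
    }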

    [akpm@linux-foundation.org: use vma_hugecache_offset() directly, per Michal]
    Signed-off-by: Hillf Danton
    Cc: Mel Gorman
    Cc: Michal Hocko
    Reviewed-by: KAMEZAWA Hiroyuki
    Cc: Andrea Arcangeli
    Cc: David Rientjes
    Reviewed-by: Michal Hocko
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Hillf Danton
     
  • handle_mm_fault() passes 'faulted' address to hugetlb_fault(). This
    address is not aligned to a hugepage boundary.

    Most of the functions for hugetlb pages are aware of that and calculate an
    alignment themselves. However some functions such as
    copy_user_huge_page() and clear_huge_page() don't handle alignment by
    themselves.

    This patch makes hugetlb_fault() fix the alignment and pass an aligned
    address (the address of the faulted hugepage) to those functions.

    [akpm@linux-foundation.org: use &=]
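
    The alignment itself is one mask operation; a compilable example
    with a 2MB huge page size (the mask models what huge_page_mask()
    would return for that size):

    #include <stdio.h>

    #define HPAGE_SIZE (2UL << 20)             /* 2MB, for illustration */
    #define HPAGE_MASK (~(HPAGE_SIZE - 1))     /* huge_page_mask(h)     */

    int main(void)
    {
        unsigned long address = 0x40123456;    /* unaligned fault address */

        address &= HPAGE_MASK;                 /* the fix, per the note   */

        printf("aligned to %#lx\n", address);  /* 0x40000000 */
        return 0;
    }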
    Signed-off-by: KAMEZAWA Hiroyuki
    Cc: KOSAKI Motohiro
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    KAMEZAWA Hiroyuki
     
  • Let's make it clear that we cannot race with other fault handlers due to
    the hugetlb (global) mutex. Also make it clear that we want to keep the
    pte_same checks anyway, to make a transition away from the global mutex
    easier.

    Signed-off-by: Michal Hocko
    Cc: Hillf Danton
    Cc: Andrea Arcangeli
    Cc: Johannes Weiner
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Michal Hocko
     
  • Currently we are not rechecking pte_same in hugetlb_cow after we take the
    ptl lock again in the page allocation failure code path; we simply retry.
    This is not an issue at the moment because the hugetlb fault path is
    protected by hugetlb_instantiation_mutex so we cannot race.

    The original page is locked and so we cannot race even with the page
    migration.

    Let's add the pte_same check anyway as we want to be consistent with the
    other check later in this function and be safe if we ever remove the
    mutex.
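
    A compilable model of the check's shape: after retaking the lock,
    re-read the PTE slot and only retry the COW if it is unchanged
    (the PTE is reduced to an unsigned long here):

    #include <stdio.h>

    typedef unsigned long pte_t;
    static pte_t *page_table;           /* stand-in for the PTE slot */

    static int pte_same(pte_t a, pte_t b) { return a == b; }

    static void retry_after_alloc_failure(pte_t old_pte)
    {
        /* ... page_table_lock retaken here ... */
        if (pte_same(*page_table, old_pte))
            puts("pte unchanged: retry the COW");
        else
            puts("pte changed: another fault handler won the race");
    }

    int main(void)
    {
        pte_t slot = 0x1000;
        page_table = &slot;
        retry_after_alloc_failure(0x1000);  /* unchanged -> retry    */
        slot = 0x2000;
        retry_after_alloc_failure(0x1000);  /* changed   -> back off */
        return 0;
    }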

    [mhocko@suse.cz: reworded the changelog]
    Signed-off-by: Hillf Danton
    Signed-off-by: Michal Hocko
    Cc: Andrea Arcangeli
    Cc: Johannes Weiner
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Hillf Danton
     

07 Jan, 2012

1 commit

  • This resolves the conflict in the arch/arm/mach-s3c64xx/s3c6400.c file,
    and it fixes the build error in the arch/x86/kernel/microcode_core.c
    file, that the merge did not catch.

    The microcode_core.c patch was provided by Stephen Rothwell
    who was invaluable in the merge issues involved
    with the large sysdev removal process in the driver-core tree.

    Signed-off-by: Greg Kroah-Hartman

    Greg Kroah-Hartman
     

30 Dec, 2011

1 commit


22 Dec, 2011

1 commit

  • This moves the 'memory sysdev_class' over to a regular 'memory' subsystem
    and converts the devices to regular devices. The sysdev drivers are
    implemented as subsystem interfaces now.

    After all sysdev classes are ported to regular driver core entities, the
    sysdev implementation will be entirely removed from the kernel.

    Signed-off-by: Kay Sievers
    Signed-off-by: Greg Kroah-Hartman

    Kay Sievers
     

09 Dec, 2011

1 commit

  • Commit 70b50f94f1644 ("mm: thp: tail page refcounting fix") keeps all
    page_tail->_count zero at all times. But the current kernel does not
    set page_tail->_count to zero if a 1GB page is utilized. So when an
    IOMMU 1GB page is used by KVM, it will result in a kernel oops because a
    tail page's _count does not equal zero.

    kernel BUG at include/linux/mm.h:386!
    invalid opcode: 0000 [#1] SMP
    Call Trace:
    gup_pud_range+0xb8/0x19d
    get_user_pages_fast+0xcb/0x192
    ? trace_hardirqs_off+0xd/0xf
    hva_to_pfn+0x119/0x2f2
    gfn_to_pfn_memslot+0x2c/0x2e
    kvm_iommu_map_pages+0xfd/0x1c1
    kvm_iommu_map_memslots+0x7c/0xbd
    kvm_iommu_map_guest+0xaa/0xbf
    kvm_vm_ioctl_assigned_device+0x2ef/0xa47
    kvm_vm_ioctl+0x36c/0x3a2
    do_vfs_ioctl+0x49e/0x4e4
    sys_ioctl+0x5a/0x7c
    system_call_fastpath+0x16/0x1b
    RIP gup_huge_pud+0xf2/0x159

    Signed-off-by: Youquan Song
    Reviewed-by: Andrea Arcangeli
    Cc:
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Youquan Song
     

16 Nov, 2011

1 commit


26 Jul, 2011

2 commits


16 Jun, 2011

1 commit

  • When 1GB hugepages are allocated on a system, free(1) reports less
    available memory than what really is installed in the box. Also, if the
    total size of hugepages allocated on a system is over half of the total
    memory size, CommitLimit becomes a negative number.

    The problem is that gigantic hugepages (order > MAX_ORDER) can only be
    allocated at boot with bootmem, thus their frames are not accounted to
    'totalram_pages'. However, they are accounted to hugetlb_total_pages().

    What happens to turn CommitLimit into a negative number is this
    calculation, in fs/proc/meminfo.c:

    allowed = ((totalram_pages - hugetlb_total_pages())
               * sysctl_overcommit_ratio / 100) + total_swap_pages;

    A similar calculation occurs in __vm_enough_memory() in mm/mmap.c.

    Also, every vm statistic which depends on 'totalram_pages' will render
    confusing values, as if the system were 'missing' some part of its memory.

    Impact of this bug:

    When gigantic hugepages are allocated and sysctl_overcommit_memory ==
    OVERCOMMIT_NEVER, __vm_enough_memory() goes through the mentioned
    'allowed' calculation and might end up mistakenly returning -ENOMEM,
    thus forcing the system to start reclaiming pages earlier than usual;
    this could have a detrimental impact on overall system performance,
    depending on the workload.

    Besides the aforementioned scenario, I can only think of this causing
    annoyances with memory reports from /proc/meminfo and free(1).
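
    A compilable rerun of the 'allowed' calculation with illustrative
    numbers (a 16GB box with 12GB tied up in boot-time 1GB pages),
    showing how CommitLimit goes negative:

    #include <stdio.h>

    int main(void)
    {
        /* 16GB box; 12GB of bootmem gigantic pages are missing from
         * totalram_pages but counted by hugetlb_total_pages() */
        long totalram_pages      = (4L  << 30) / 4096;
        long hugetlb_total_pages = (12L << 30) / 4096;
        long total_swap_pages    = 0;
        long overcommit_ratio    = 50;   /* sysctl_overcommit_ratio */

        long allowed = (totalram_pages - hugetlb_total_pages)
                       * overcommit_ratio / 100 + total_swap_pages;

        printf("CommitLimit = %ld pages\n", allowed);   /* negative */
        return 0;
    }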

    [akpm@linux-foundation.org: standardize comment layout]
    Reported-by: Russ Anderson
    Signed-off-by: Rafael Aquini
    Acked-by: Russ Anderson
    Cc: Andrea Arcangeli
    Cc: Christoph Lameter
    Cc:
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Rafael Aquini
     

06 Jun, 2011

1 commit

  • Al Viro observes that in the hugetlb case, handle_mm_fault() may return
    a value of the kind ENOSPC when its caller is expecting a value of the
    kind VM_FAULT_SIGBUS: fix alloc_huge_page()'s failure returns.
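
    A sketch of one plausible caller-side conversion; the commit only
    names the ENOSPC/SIGBUS pair, so mapping -ENOMEM to VM_FAULT_OOM
    (and the constants' values) is an assumption here:

    #include <errno.h>
    #include <stdio.h>

    #define VM_FAULT_OOM    0x0001   /* illustrative values */
    #define VM_FAULT_SIGBUS 0x0002

    /* caller-side conversion: -errno in, VM_FAULT_* out */
    static int fault_code(long alloc_err)
    {
        return alloc_err == -ENOMEM ? VM_FAULT_OOM : VM_FAULT_SIGBUS;
    }

    int main(void)
    {
        printf("-ENOSPC -> %#x, -ENOMEM -> %#x\n",
               fault_code(-ENOSPC), fault_code(-ENOMEM));
        return 0;
    }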

    Signed-off-by: Hugh Dickins
    Acked-by: Al Viro
    Cc: stable@kernel.org
    Signed-off-by: Linus Torvalds

    Hugh Dickins
     

27 May, 2011

1 commit

  • The type of vma->vm_flags is 'unsigned long'. Neither 'int' nor
    'unsigned int'. This patch fixes such misuse.

    Signed-off-by: KOSAKI Motohiro
    [ Changed to use a typedef - we'll extend it to cover more cases
    later, since there has been discussion about making it a 64-bit
    type.. - Linus ]
    Signed-off-by: Linus Torvalds

    KOSAKI Motohiro
     

25 May, 2011

1 commit

  • Straightforward conversion of i_mmap_lock to a mutex.

    Signed-off-by: Peter Zijlstra
    Acked-by: Hugh Dickins
    Cc: Benjamin Herrenschmidt
    Cc: David Miller
    Cc: Martin Schwidefsky
    Cc: Russell King
    Cc: Paul Mundt
    Cc: Jeff Dike
    Cc: Richard Weinberger
    Cc: Tony Luck
    Cc: KAMEZAWA Hiroyuki
    Cc: Mel Gorman
    Cc: KOSAKI Motohiro
    Cc: Nick Piggin
    Cc: Namhyung Kim
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Peter Zijlstra
     

26 Apr, 2011

1 commit


10 Apr, 2011

1 commit


31 Mar, 2011

1 commit


23 Mar, 2011

1 commit

  • When the user inserts a negative value into /proc/sys/vm/nr_hugepages it
    will cause the kernel to allocate as many hugepages as possible and to
    then update /proc/meminfo to reflect this.

    This changes the behavior so that negative input will result in the
    nr_hugepages value being unchanged.

    Signed-off-by: Petr Holasek
    Signed-off-by: Anton Arapov
    Reviewed-by: Naoya Horiguchi
    Acked-by: David Rientjes
    Acked-by: Mel Gorman
    Acked-by: Eric B Munson
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Petr Holasek
     

14 Jan, 2011

5 commits

  • When parsing changes to the huge page pool sizes made from userspace via
    the sysfs interface, bogus input values are being covered up by
    nr_hugepages_store_common and nr_overcommit_hugepages_store returning 0
    when strict_strtoul returns an error. This can cause an infinite loop in
    the nr_hugepages_store code. This patch changes the return value for
    these functions to -EINVAL when strict_strtoul returns an error.

    Signed-off-by: Eric B Munson
    Reported-by: CAI Qian
    Cc: Andrea Arcangeli
    Cc: Eric B Munson
    Cc: Michal Hocko
    Cc: Nishanth Aravamudan
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Eric B Munson
     
  • Huge pages with order >= MAX_ORDER must be allocated at boot via the
    kernel command line, they cannot be allocated or freed once the kernel is
    up and running. Currently we allow values to be written to the sysfs and
    sysctl files controlling pool size for these huge page sizes. This patch
    makes the store functions for nr_hugepages and nr_overcommit_hugepages
    return -EINVAL when the pool for a page size >= MAX_ORDER is changed.
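
    The guard's shape, as a compilable sketch (struct hstate is reduced
    to its order field; MAX_ORDER's value is illustrative):

    #include <stdio.h>

    #define MAX_ORDER 11
    #define EINVAL    22

    struct hstate { unsigned order; };

    static int set_pool_size(struct hstate *h, unsigned long count)
    {
        if (h->order >= MAX_ORDER)
            return -EINVAL;     /* boot-time-only size: refuse */
        (void)count;            /* ... resize the pool to count ... */
        return 0;
    }

    int main(void)
    {
        struct hstate pmd_size = { 9 }, gigantic = { 18 };
        printf("%d %d\n", set_pool_size(&pmd_size, 8),
                          set_pool_size(&gigantic, 8));
        return 0;
    }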

    [akpm@linux-foundation.org: avoid multiple return paths in nr_hugepages_store_common()]
    [caiqian@redhat.com: add checking in hugetlb_overcommit_handler()]
    Signed-off-by: Eric B Munson
    Reported-by: CAI Qian
    Cc: Andrea Arcangeli
    Cc: Michal Hocko
    Cc: Nishanth Aravamudan
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Eric B Munson
     
  • proc_doulongvec_minmax may fail if the given buffer doesn't represent a
    valid number. If we provide something invalid we will initialize the
    resulting value (nr_overcommit_huge_pages in this case) to a random value
    from the stack.

    The issue was introduced by a3d0c6aa, when the default handler was
    replaced by a helper function that does not check the return value.

    Reproducer:
    echo "" > /proc/sys/vm/nr_overcommit_hugepages

    [akpm@linux-foundation.org: correctly propagate proc_doulongvec_minmax return code]
    Signed-off-by: Michal Hocko
    Cc: CAI Qian
    Cc: Nishanth Aravamudan
    Cc: Andrea Arcangeli
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Michal Hocko
     
  • The NODEMASK_ALLOC macro may dynamically allocate memory for its second
    argument ('nodes_allowed' in this context).

    In nr_hugepages_store_common() we may abort early if strict_strtoul()
    fails, but in that case we do not free the memory already allocated to
    'nodes_allowed', causing a memory leak.

    This patch closes the leak by freeing the memory in the error path.

    [akpm@linux-foundation.org: use NODEMASK_FREE, per Minchan Kim]
    Signed-off-by: Jesper Juhl
    Cc: Minchan Kim
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Jesper Juhl
     
  • Move the copy/clear_huge_page functions to common code to share between
    hugetlb.c and huge_memory.c.

    Signed-off-by: Andrea Arcangeli
    Acked-by: Rik van Riel
    Acked-by: Mel Gorman
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Andrea Arcangeli
     

03 Dec, 2010

1 commit

  • Have hugetlb_fault() call unlock_page(page) only if it had previously
    called lock_page(page).

    Setting CONFIG_DEBUG_VM=y and then running the libhugetlbfs test suite
    resulted in tripping the VM_BUG_ON(!PageLocked(page)) in unlock_page(),
    called by hugetlb_fault() when page == pagecache_page. This patch
    remedies the problem.
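
    A compilable model of the pairing rule: the lock was taken only for
    page != pagecache_page, so the unlock must be guarded the same way:

    #include <stdio.h>

    struct page { int locked; };

    static void lock_page(struct page *p) { p->locked = 1; }

    static void unlock_page(struct page *p)
    {
        if (!p->locked)     /* models VM_BUG_ON(!PageLocked(page)) */
            puts("BUG: unlocking a page that is not locked");
        p->locked = 0;
    }

    int main(void)
    {
        struct page cache = { 0 };
        struct page *page = &cache, *pagecache_page = &cache;

        if (page != pagecache_page)     /* matches the lock...        */
            lock_page(page);

        /* ... fault handling ... */

        if (page != pagecache_page)     /* ...so guard the unlock too */
            unlock_page(page);
        return 0;
    }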

    Signed-off-by: Dean Nelson
    Cc:
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Dean Nelson