30 May, 2012

3 commits

  • hugetlb_reserve_pages() can be used for either normal file-backed
    hugetlbfs mappings, or MAP_HUGETLB. In the MAP_HUGETLB, semi-anonymous
    mode, there is no VMA around. The new call to resv_map_put() assumed
    that there was, and resulted in a NULL pointer dereference:

    BUG: unable to handle kernel NULL pointer dereference at 0000000000000030
    IP: vma_resv_map+0x9/0x30
    PGD 141453067 PUD 1421e1067 PMD 0
    Oops: 0000 [#1] PREEMPT SMP
    ...
    Pid: 14006, comm: trinity-child6 Not tainted 3.4.0+ #36
    RIP: vma_resv_map+0x9/0x30
    ...
    Process trinity-child6 (pid: 14006, threadinfo ffff8801414e0000, task ffff8801414f26b0)
    Call Trace:
    resv_map_put+0xe/0x40
    hugetlb_reserve_pages+0xa6/0x1d0
    hugetlb_file_setup+0x102/0x2c0
    newseg+0x115/0x360
    ipcget+0x1ce/0x310
    sys_shmget+0x5a/0x60
    system_call_fastpath+0x16/0x1b

    This was reported by Dave Jones, but was reproducible with the
    libhugetlbfs test cases, so shame on me for not running them in the
    first place.

    With this, the oops is gone, and the output of libhugetlbfs's
    run_tests.py is identical to plain 3.4 again.

    [ Marked for stable, since this was introduced by commit c50ac050811d
    ("hugetlb: fix resv_map leak in error path") which was also marked for
    stable ]
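
    A minimal userspace model of the guard the fix adds, with stub types
    standing in for the kernel structures (the shape of the error path
    follows the commit text; everything else here is illustrative):

    #include <stdio.h>
    #include <stdlib.h>

    /* Illustrative stand-ins for the kernel structures involved. */
    struct resv_map { int refs; };
    struct vm_area_struct { struct resv_map *resv; };

    static void resv_map_put(struct vm_area_struct *vma)
    {
        /* Dereferences vma, so it must never see vma == NULL. */
        if (--vma->resv->refs == 0)
            free(vma->resv);
    }

    static void reserve_error_path(struct vm_area_struct *vma)
    {
        /* The fix: drop the reservation map only when a VMA exists,
         * i.e. skip it in the MAP_HUGETLB/shmget() semi-anonymous case. */
        if (vma)
            resv_map_put(vma);
    }

    int main(void)
    {
        reserve_error_path(NULL);   /* semi-anonymous mode: now a no-op */
        printf("no NULL dereference\n");
        return 0;
    }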

    Reported-by: Dave Jones
    Cc: Mel Gorman
    Cc: KOSAKI Motohiro
    Cc: Christoph Lameter
    Cc: Andrea Arcangeli
    Cc: Andrew Morton
    Cc: [2.6.32+]
    Signed-off-by: Linus Torvalds

    Dave Hansen
     
  • When called for anonymous (non-shared) mappings, hugetlb_reserve_pages()
    does a resv_map_alloc(). It depends on code in hugetlbfs's
    vm_ops->close() to release that allocation.

    However, in the mmap() failure path, we do a plain unmap_region() without
    the remove_vma() which actually calls vm_ops->close().

    This is a decent fix. This leak could get reintroduced if new code (say,
    after hugetlb_reserve_pages() in hugetlbfs_file_mmap()) decides to return
    an error. But I think it would have to unroll the reservation anyway.

    Christoph's test case:

    http://marc.info/?l=linux-mm&m=133728900729735

    This patch applies to 3.4 and later. A version for earlier kernels is at
    https://lkml.org/lkml/2012/5/22/418.
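
    A sketch of the fixed shape, modeled in userspace with a plain
    refcount (resv_map_alloc/resv_map_release are named after the kernel
    functions; the bodies and the failure stand-in are illustrative):

    #include <stdlib.h>

    struct resv_map { long refs; };

    static struct resv_map *resv_map_alloc(void)
    {
        struct resv_map *map = calloc(1, sizeof(*map));
        if (map)
            map->refs = 1;
        return map;
    }

    static void resv_map_release(struct resv_map *map)
    {
        if (--map->refs == 0)
            free(map);
    }

    static int reserve_pages(long from, long to)
    {
        struct resv_map *map = resv_map_alloc();
        if (!map)
            return -1;              /* -ENOMEM in the kernel */

        if (to <= from)             /* stand-in for a later failure */
            goto out_err;

        return 0;                   /* released later by vm_ops->close() */

    out_err:
        resv_map_release(map);      /* the leak this patch plugs */
        return -1;
    }

    int main(void) { return reserve_pages(4, 2) == -1 ? 0 : 1; }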

    Signed-off-by: Dave Hansen
    Acked-by: Mel Gorman
    Acked-by: KOSAKI Motohiro
    Reported-by: Christoph Lameter
    Tested-by: Christoph Lameter
    Cc: Andrea Arcangeli
    Cc: [2.6.32+]
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Dave Hansen
     
  • The arguments f & t and fields from & to of struct file_region are
    defined as long. So use long instead of int to type the temp vars.
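
    A compilable illustration of the truncation the wider type avoids
    (struct file_region is reduced to the two fields named above):

    #include <stdio.h>

    struct file_region { long from, to; };

    int main(void)
    {
        struct file_region rg = { .from = 0, .to = 3000000000L };

        int  chg_int  = rg.to - rg.from;  /* truncated: does not fit int */
        long chg_long = rg.to - rg.from;  /* matches the field types     */

        printf("int: %d  long: %ld\n", chg_int, chg_long);
        return 0;
    }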

    Signed-off-by: Wang Sheng-Hui
    Acked-by: David Rientjes
    Acked-by: Hillf Danton
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Wang Sheng-Hui
     

26 May, 2012

1 commit

  • The tile support for multiple-size huge pages requires tagging
    the hugetlb PTE with a "super" bit for PTEs that are multiples of
    the basic size of a pagetable span. To set that bit properly
    we need to tweak the PTE in make_huge_pte() based on the vma.

    This change provides the API for a subsequent tile-specific
    change to use.
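
    A simplified, compilable sketch of such a hook, with the PTE reduced
    to an unsigned long; the "super" bit and the size threshold are made
    up, and the real signature carries more arguments:

    #include <stdio.h>

    typedef unsigned long pte_t;        /* illustrative stand-in */
    struct vm_area_struct { unsigned long vm_start, vm_end; };

    #define PTE_SUPER (1UL << 1)        /* hypothetical "super" bit */

    /* Generic code calls this from make_huge_pte(); an arch (tile,
     * per the commit) overrides it to tag PTEs whose huge page spans
     * multiple page-table entries. */
    static pte_t arch_make_huge_pte(pte_t entry, struct vm_area_struct *vma)
    {
        if (vma->vm_end - vma->vm_start >= (16UL << 20))
            entry |= PTE_SUPER;         /* pretend 16MB+ means "super" */
        return entry;
    }

    int main(void)
    {
        struct vm_area_struct vma = { 0, 64UL << 20 };
        printf("pte = %#lx\n", arch_make_huge_pte(0x1, &vma));
        return 0;
    }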

    Reviewed-by: Hillf Danton
    Signed-off-by: Chris Metcalf

    Chris Metcalf
     

11 May, 2012

1 commit

  • Commit 66aebce747eaf ("hugetlb: fix race condition in hugetlb_fault()")
    added code to avoid a race condition by elevating the page refcount in
    hugetlb_fault() while calling hugetlb_cow().

    However, one code path in hugetlb_cow() includes an assertion that the
    page count is 1, whereas it may now also have the value 2 in this path.

    The consensus is that this BUG_ON has served its purpose, so rather than
    extending it to cover both cases, we just remove it.

    Signed-off-by: Chris Metcalf
    Acked-by: Mel Gorman
    Acked-by: Hillf Danton
    Acked-by: Hugh Dickins
    Cc: Michal Hocko
    Cc: KAMEZAWA Hiroyuki
    Cc: [3.0.29+, 3.2.16+, 3.3.3+]
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Chris Metcalf
     

26 Apr, 2012

1 commit

  • Fix a gcc warning (and bug?) introduced in cc9a6c877 ("cpuset: mm: reduce
    large amounts of memory barrier related damage v3")

    Local variable "page" can be uninitialized if the nodemask from vma policy
    does not intersects with nodemask from cpuset. Even if it doesn't happens
    it is better to initialize this variable explicitly than to introduce
    a kernel oops in a weird corner case.

    mm/hugetlb.c: In function `alloc_huge_page':
    mm/hugetlb.c:1135:5: warning: `page' may be used uninitialized in this function
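
    The fix itself is a one-line initialization; a compilable model of
    the code path, with a boolean standing in for the cpuset/mempolicy
    intersection check:

    #include <stdio.h>

    struct page { int id; };

    static struct page *dequeue_page(int masks_intersect)
    {
        struct page *page = NULL;   /* the fix: initialize explicitly */

        if (masks_intersect) {
            static struct page p = { 42 };
            page = &p;              /* normal allocation path */
        }
        /* without the initializer, this returns an uninitialized
         * pointer whenever the nodemasks do not intersect */
        return page;
    }

    int main(void)
    {
        printf("%p\n", (void *)dequeue_page(0));
        return 0;
    }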

    Signed-off-by: Konstantin Khlebnikov
    Acked-by: Mel Gorman
    Acked-by: David Rientjes
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Konstantin Khlebnikov
     

13 Apr, 2012

1 commit

  • The race is as follows:

    Suppose a multi-threaded task forks a new process (on cpu A), thus
    bumping up the ref count on all the pages. While the fork is occurring
    (and thus we have marked all the PTEs as read-only), another thread in
    the original process (on cpu B) tries to write to a huge page, taking an
    access violation from the write-protect and calling hugetlb_cow(). Now,
    suppose the fork() fails. It will undo the COW and decrement the ref
    count on the pages, so the ref count on the huge page drops back to 1.
    Meanwhile hugetlb_cow() also decrements the ref count by one on the
    original page, since the original address space doesn't need it any
    more, having copied a new page to replace the original page. This
    leaves the ref count at zero, and when we call unlock_page(), we panic.

    fork on CPU A                             fault on CPU B
    =============                             ==============
    ...
    down_write(&parent->mmap_sem);
    down_write_nested(&child->mmap_sem);
    ...
    while duplicating vmas
      if error
        break;
    ...
    up_write(&child->mmap_sem);
    up_write(&parent->mmap_sem);              ...
                                              down_read(&parent->mmap_sem);
                                              ...
                                              lock_page(page);
                                              handle COW
                                              page_mapcount(old_page) == 2
                                              alloc and prepare new_page
    ...
    handle error
    page_remove_rmap(page);
    put_page(page);
    ...
                                              fold new_page into pte
                                              page_remove_rmap(page);
                                              put_page(page);
                                              ...
                                              oops ==> unlock_page(page);
                                              up_read(&parent->mmap_sem);

    The solution is to take an extra reference to the page while we are
    holding the lock on it.
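
    A userspace model of why the extra reference is sufficient: the
    fork-undo path and hugetlb_cow() each drop one reference, so the
    extra pin keeps the count above zero until unlock_page() has run
    (the counts here are illustrative):

    #include <assert.h>
    #include <stdio.h>

    static int refcount = 2;    /* parent mapping + partially-forked child */

    static void put_page(void) { assert(--refcount >= 0); }

    int main(void)
    {
        refcount++;             /* the fix: take an extra reference    */
        /* ... lock_page() ... */
        put_page();             /* hugetlb_cow() drops the old page    */
        put_page();             /* the failed fork() undoes its copy   */
        assert(refcount > 0);   /* so unlock_page() is still safe      */
        refcount--;             /* drop the extra reference            */
        printf("refcount=%d\n", refcount);
        return 0;
    }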

    Signed-off-by: Chris Metcalf
    Cc: Hillf Danton
    Cc: Michal Hocko
    Cc: KAMEZAWA Hiroyuki
    Cc: Hugh Dickins
    Cc:
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Chris Metcalf
     

24 Mar, 2012

1 commit


22 Mar, 2012

4 commits

  • hugetlbfs_{get,put}_quota() are badly named. They don't interact with the
    general quota handling code, and they don't much resemble its behaviour.
    Rather than being about maintaining limits on on-disk block usage by
    particular users, they are instead about maintaining limits on in-memory
    page usage (including anonymous MAP_PRIVATE copied-on-write pages)
    associated with a particular hugetlbfs filesystem instance.

    Worse, they work by having callbacks to the hugetlbfs filesystem code from
    the low-level page handling code, in particular from free_huge_page().
    This is a layering violation in itself, but more importantly, if the
    kernel does a get_user_pages() on hugepages (which can happen from KVM
    amongst others), then the free_huge_page() can be delayed until after the
    associated inode has already been freed. If an unmount occurs at the
    wrong time, even the hugetlbfs superblock where the "quota" limits are
    stored may have been freed.

    Andrew Barry proposed a patch to fix this by having hugepages store
    pointers directly to the superblock, bumping its reference count as
    appropriate to avoid it being freed, instead of storing a pointer to
    their address_space and reaching the superblock from there.
    Andrew Morton rejected that version, however, on the grounds that it made
    the existing layering violation worse.

    This is a reworked version of Andrew's patch, which removes the extra, and
    some of the existing, layering violation. It works by introducing the
    concept of a hugepage "subpool" at the lower hugepage mm layer - that is a
    finite logical pool of hugepages to allocate from. hugetlbfs now creates
    a subpool for each filesystem instance with a page limit set, and a
    pointer to the subpool gets added to each allocated hugepage, instead of
    the address_space pointer used now. The subpool has its own lifetime and
    is only freed once all pages in it _and_ all other references to it (i.e.
    superblocks) are gone.

    Subpools are optional - a NULL subpool pointer is taken by the code to
    mean that no subpool limits are in effect.

    Previous discussion of this bug can be found in "Fix refcounting in
    hugetlbfs quota handling". See: https://lkml.org/lkml/2011/8/11/28 or
    http://marc.info/?l=linux-mm&m=126928970510627&w=1

    v2: Fixed a bug spotted by Hillf Danton, and removed the extra parameter to
    alloc_huge_page() - since it already takes the vma, it is not necessary.
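
    A userspace model of the subpool lifetime rules described above: a
    NULL pool means no limits, and the pool is freed only when both the
    superblock reference and every page pointing at it are gone (field
    and function names here are illustrative):

    #include <stdio.h>
    #include <stdlib.h>

    struct hugepage_subpool {
        long refs;          /* superblock + pages still pointing here  */
        long max_hpages;    /* page limit for this filesystem instance */
        long used_hpages;
    };

    static struct hugepage_subpool *subpool_new(long max)
    {
        struct hugepage_subpool *spool = calloc(1, sizeof(*spool));
        if (spool) { spool->refs = 1; spool->max_hpages = max; }
        return spool;
    }

    static int subpool_get_pages(struct hugepage_subpool *spool, long n)
    {
        if (!spool)
            return 0;       /* NULL subpool: no limits in effect */
        if (spool->used_hpages + n > spool->max_hpages)
            return -1;      /* pool exhausted */
        spool->used_hpages += n;
        spool->refs += n;   /* each page keeps the pool alive */
        return 0;
    }

    static void subpool_put(struct hugepage_subpool *spool)
    {
        if (spool && --spool->refs == 0)
            free(spool);    /* may outlive the inode and superblock */
    }

    int main(void)
    {
        struct hugepage_subpool *spool = subpool_new(2);
        printf("%d %d\n", subpool_get_pages(spool, 2),
                          subpool_get_pages(spool, 1));
        subpool_put(spool);             /* superblock goes away...   */
        subpool_put(spool);             /* ...then the pages trickle */
        subpool_put(spool);             /* back; last put frees it   */
        return 0;
    }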

    Signed-off-by: Andrew Barry
    Signed-off-by: David Gibson
    Cc: Hugh Dickins
    Cc: Mel Gorman
    Cc: Minchan Kim
    Cc: Hillf Danton
    Cc: Paul Mackerras
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    David Gibson
     
  • Commit c0ff7453bb5c ("cpuset,mm: fix no node to alloc memory when
    changing cpuset's mems") wins a super prize for the largest number of
    memory barriers entered into fast paths for one commit.

    [get|put]_mems_allowed is incredibly heavy with pairs of full memory
    barriers inserted into a number of hot paths. This was detected while
    investigating a large page allocator slowdown introduced some time
    after 2.6.32. The largest portion of this overhead was shown by
    oprofile to be at an mfence introduced by this commit into the page
    allocator hot path.

    For extra style points, the commit introduced the use of yield() in an
    implementation of what looks like a spinning mutex.

    This patch replaces the full memory barriers on both read and write
    sides with a sequence counter with just read barriers on the fast path
    side. This is much cheaper on some architectures, including x86. The
    main bulk of the patch is the retry logic if the nodemask changes in a
    manner that can cause a false failure.

    While updating the nodemask, a check is made to see if a false failure
    is a risk. If it is, the sequence number gets bumped and parallel
    allocators will briefly stall while the nodemask update takes place.
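
    A compilable userspace model of that read-side pattern, using C11
    atomics in place of the kernel's sequence counter (the function
    names follow the [get|put]_mems_allowed naming above; the bodies
    are illustrative, not the patch's implementation):

    #include <stdatomic.h>
    #include <stdio.h>

    static atomic_uint mems_seq;        /* bumped around nodemask updates */
    static int nodemask = 0x3;          /* stand-in for mems_allowed      */

    static unsigned get_mems_allowed(void)     /* read-side begin */
    {
        return atomic_load_explicit(&mems_seq, memory_order_acquire);
    }

    static int put_mems_allowed(unsigned seq)  /* read-side retry check */
    {
        atomic_thread_fence(memory_order_acquire);
        return atomic_load_explicit(&mems_seq, memory_order_relaxed) == seq;
    }

    int main(void)
    {
        unsigned seq;
        int mask;
        do {
            seq = get_mems_allowed();
            mask = nodemask;               /* the allocation reads this */
        } while (!put_mems_allowed(seq));  /* retry if a writer raced   */
        printf("allocated from mask %#x\n", mask);
        return 0;
    }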

    In a page fault test microbenchmark, oprofile samples from
    __alloc_pages_nodemask went from 4.53% of all samples to 1.15%. The
    actual results were

                              3.3.0-rc3             3.3.0-rc3
                              rc3-vanilla           nobarrier-v2r1
    Clients 1 UserTime        0.07 (  0.00%)        0.08 (-14.19%)
    Clients 2 UserTime        0.07 (  0.00%)        0.07 (  2.72%)
    Clients 4 UserTime        0.08 (  0.00%)        0.07 (  3.29%)
    Clients 1 SysTime         0.70 (  0.00%)        0.65 (  6.65%)
    Clients 2 SysTime         0.85 (  0.00%)        0.82 (  3.65%)
    Clients 4 SysTime         1.41 (  0.00%)        1.41 (  0.32%)
    Clients 1 WallTime        0.77 (  0.00%)        0.74 (  4.19%)
    Clients 2 WallTime        0.47 (  0.00%)        0.45 (  3.73%)
    Clients 4 WallTime        0.38 (  0.00%)        0.37 (  1.58%)
    Clients 1 Flt/sec/cpu     497620.28 (  0.00%)   520294.53 (  4.56%)
    Clients 2 Flt/sec/cpu     414639.05 (  0.00%)   429882.01 (  3.68%)
    Clients 4 Flt/sec/cpu     257959.16 (  0.00%)   258761.48 (  0.31%)
    Clients 1 Flt/sec         495161.39 (  0.00%)   517292.87 (  4.47%)
    Clients 2 Flt/sec         820325.95 (  0.00%)   850289.77 (  3.65%)
    Clients 4 Flt/sec        1020068.93 (  0.00%)  1022674.06 (  0.26%)

    MMTests Statistics: duration
    Sys Time Running Test (seconds)            135.68      132.17
    User+Sys Time Running Test (seconds)       164.2       160.13
    Total Elapsed Time (seconds)               123.46      120.87

    The overall improvement is small but the System CPU time is much
    improved and roughly in correlation to what oprofile reported (these
    performance figures are without profiling so skew is expected). The
    actual number of page faults is noticeably improved.

    For benchmarks like kernel builds, the overall benefit is marginal but
    the system CPU time is slightly reduced.

    To test the actual bug the commit fixed I opened two terminals. The
    first ran within a cpuset and continually ran a small program that
    faulted 100M of anonymous data. In a second window, the nodemask of the
    cpuset was continually randomised in a loop.

    Without the commit, the program would fail every so often (usually
    within 10 seconds) and obviously with the commit everything worked fine.
    With this patch applied, it also worked fine so the fix should be
    functionally equivalent.

    Signed-off-by: Mel Gorman
    Cc: Miao Xie
    Cc: David Rientjes
    Cc: Peter Zijlstra
    Cc: Christoph Lameter
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Mel Gorman
     
  • When unmapping a given VM range, we could bail out if a reference page is
    supplied and is unmapped, which is a minor optimization.

    Signed-off-by: Hillf Danton
    Cc: Michal Hocko
    Cc: KAMEZAWA Hiroyuki
    Cc: Hugh Dickins
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Hillf Danton
     
  • When gathering surplus pages, the number of needed pages is recomputed
    after reacquiring hugetlb lock to catch changes in resv_huge_pages and
    free_huge_pages. Plus it is recomputed with the number of newly allocated
    pages involved.

    Thus the freeing of pages can be deferred a bit, to see whether the final
    page request is satisfied, even though fewer pages than needed may have
    been allocated.

    Signed-off-by: Hillf Danton
    Reviewed-by: Michal Hocko
    Cc: KAMEZAWA Hiroyuki
    Cc: Hugh Dickins
    Cc: Mel Gorman
    Cc: Andrea Arcangeli
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Hillf Danton
     

06 Mar, 2012

1 commit

  • All other callers already hold either ->mmap_sem (exclusive) or
    ->page_table_lock. And we need it because some page table flushing
    instances do work explicitly with page tables.

    See e.g. arch/powerpc/mm/tlb_hash32.c, flush_tlb_range() and
    flush_range() in there. The same goes for uml, with a lot more
    extensive playing with page tables.

    Almost all callers are actually fine - flush_tlb_range() may have no
    need to bother playing with page tables, but it can do so safely; again,
    this caller is the sole exception - everything else either has exclusive
    ->mmap_sem on the mm in question, or mm->page_table_lock is held.

    Signed-off-by: Al Viro
    Signed-off-by: Linus Torvalds

    Al Viro
     

24 Jan, 2012

1 commit

  • Page mapcount should be updated only if we are sure that the page ends
    up in the page table otherwise we would leak if we couldn't COW due to
    reservations or if idx is out of bounds.

    Signed-off-by: Hillf Danton
    Reviewed-by: Michal Hocko
    Acked-by: KAMEZAWA Hiroyuki
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Hillf Danton
     

11 Jan, 2012

5 commits

  • If we have to hand back the newly allocated huge page to the page
    allocator, for any reason, the changed counter should be recovered.

    This affects only s390 at present.

    Signed-off-by: Hillf Danton
    Reviewed-by: Michal Hocko
    Acked-by: KAMEZAWA Hiroyuki
    Cc: Martin Schwidefsky
    Cc: Heiko Carstens
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Hillf Danton
     
  • The computation for pgoff is incorrect, at least with

    (vma->vm_pgoff >> PAGE_SHIFT)

    involved. It is fixed by using the existing helper that does the page
    cache lookup in HPAGE_SIZE units.
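
    A compilable before/after of the computation, with a 2MB huge page
    size and made-up numbers (the helper models vma_hugecache_offset()
    from the note below):

    #include <stdio.h>

    #define PAGE_SHIFT  12
    #define HPAGE_SHIFT 21      /* 2MB huge pages, for illustration */

    /* page cache index in huge-page-sized units */
    static unsigned long hugecache_offset(unsigned long vm_start,
                                          unsigned long vm_pgoff,
                                          unsigned long address)
    {
        return ((address - vm_start) >> HPAGE_SHIFT) +
               (vm_pgoff >> (HPAGE_SHIFT - PAGE_SHIFT));
    }

    int main(void)
    {
        unsigned long vm_start = 0x40000000, vm_pgoff = 1024;
        unsigned long addr = 0x40400000;

        /* the broken form quoted above: vm_pgoff >> PAGE_SHIFT */
        unsigned long bad = ((addr - vm_start) >> HPAGE_SHIFT) +
                            (vm_pgoff >> PAGE_SHIFT);

        printf("bad=%lu good=%lu\n", bad,
               hugecache_offset(vm_start, vm_pgoff, addr));
        return 0;
    }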

    [akpm@linux-foundation.org: use vma_hugecache_offset() directly, per Michal]
    Signed-off-by: Hillf Danton
    Cc: Mel Gorman
    Cc: Michal Hocko
    Reviewed-by: KAMEZAWA Hiroyuki
    Cc: Andrea Arcangeli
    Cc: David Rientjes
    Reviewed-by: Michal Hocko
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Hillf Danton
     
  • handle_mm_fault() passes 'faulted' address to hugetlb_fault(). This
    address is not aligned to a hugepage boundary.

    Most of the functions for hugetlb pages are aware of that and calculate an
    alignment themselves. However some functions such as
    copy_user_huge_page() and clear_huge_page() don't handle alignment by
    themselves.

    This patch makes hugetlb_fault() fix the alignment and pass an aligned
    address (the address of the faulted hugepage) to those functions.

    [akpm@linux-foundation.org: use &=]
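
    The alignment itself is one mask operation; a compilable example
    with a 2MB huge page size (the mask models what huge_page_mask()
    would return for that size):

    #include <stdio.h>

    #define HPAGE_SIZE (2UL << 20)             /* 2MB, for illustration */
    #define HPAGE_MASK (~(HPAGE_SIZE - 1))     /* huge_page_mask(h)     */

    int main(void)
    {
        unsigned long address = 0x40123456;    /* unaligned fault address */

        address &= HPAGE_MASK;                 /* the fix, per the note   */

        printf("aligned to %#lx\n", address);  /* 0x40000000 */
        return 0;
    }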
    Signed-off-by: KAMEZAWA Hiroyuki
    Cc: KOSAKI Motohiro
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    KAMEZAWA Hiroyuki
     
  • Let's make it clear that we cannot race with other fault handlers due to
    the hugetlb (global) mutex. Also make it clear that we want to keep the
    pte_same checks anyway, to make a transition away from the global mutex
    easier.

    Signed-off-by: Michal Hocko
    Cc: Hillf Danton
    Cc: Andrea Arcangeli
    Cc: Johannes Weiner
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Michal Hocko
     
  • Currently we are not rechecking pte_same in hugetlb_cow after we take the
    ptl lock again in the page allocation failure code path; we simply retry.
    This is not an issue at the moment because the hugetlb fault path is
    protected by hugetlb_instantiation_mutex so we cannot race.

    The original page is locked and so we cannot race even with the page
    migration.

    Let's add the pte_same check anyway as we want to be consistent with the
    other check later in this function and be safe if we ever remove the
    mutex.
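
    A compilable model of the check's shape: after retaking the lock,
    re-read the PTE slot and only retry the COW if it is unchanged
    (the PTE is reduced to an unsigned long here):

    #include <stdio.h>

    typedef unsigned long pte_t;
    static pte_t *page_table;           /* stand-in for the PTE slot */

    static int pte_same(pte_t a, pte_t b) { return a == b; }

    static void retry_after_alloc_failure(pte_t old_pte)
    {
        /* ... page_table_lock retaken here ... */
        if (pte_same(*page_table, old_pte))
            puts("pte unchanged: retry the COW");
        else
            puts("pte changed: another fault handler won the race");
    }

    int main(void)
    {
        pte_t slot = 0x1000;
        page_table = &slot;
        retry_after_alloc_failure(0x1000);  /* unchanged -> retry    */
        slot = 0x2000;
        retry_after_alloc_failure(0x1000);  /* changed   -> back off */
        return 0;
    }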

    [mhocko@suse.cz: reworded the changelog]
    Signed-off-by: Hillf Danton
    Signed-off-by: Michal Hocko
    Cc: Andrea Arcangeli
    Cc: Johannes Weiner
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Hillf Danton
     

07 Jan, 2012

1 commit

  • This resolves the conflict in the arch/arm/mach-s3c64xx/s3c6400.c file,
    and it fixes the build error in the arch/x86/kernel/microcode_core.c
    file, that the merge did not catch.

    The microcode_core.c patch was provided by Stephen Rothwell
    who was invaluable in the merge issues involved
    with the large sysdev removal process in the driver-core tree.

    Signed-off-by: Greg Kroah-Hartman

    Greg Kroah-Hartman
     

30 Dec, 2011

1 commit


22 Dec, 2011

1 commit

  • This moves the 'memory sysdev_class' over to a regular 'memory' subsystem
    and converts the devices to regular devices. The sysdev drivers are
    implemented as subsystem interfaces now.

    After all sysdev classes are ported to regular driver core entities, the
    sysdev implementation will be entirely removed from the kernel.

    Signed-off-by: Kay Sievers
    Signed-off-by: Greg Kroah-Hartman

    Kay Sievers
     

09 Dec, 2011

1 commit

  • Commit 70b50f94f1644 ("mm: thp: tail page refcounting fix") keeps all
    page_tail->_count zero at all times. But the current kernel does not
    set page_tail->_count to zero if a 1GB page is utilized. So when an
    IOMMU 1GB page is used by KVM, it will result in a kernel oops because a
    tail page's _count does not equal zero.

    kernel BUG at include/linux/mm.h:386!
    invalid opcode: 0000 [#1] SMP
    Call Trace:
    gup_pud_range+0xb8/0x19d
    get_user_pages_fast+0xcb/0x192
    ? trace_hardirqs_off+0xd/0xf
    hva_to_pfn+0x119/0x2f2
    gfn_to_pfn_memslot+0x2c/0x2e
    kvm_iommu_map_pages+0xfd/0x1c1
    kvm_iommu_map_memslots+0x7c/0xbd
    kvm_iommu_map_guest+0xaa/0xbf
    kvm_vm_ioctl_assigned_device+0x2ef/0xa47
    kvm_vm_ioctl+0x36c/0x3a2
    do_vfs_ioctl+0x49e/0x4e4
    sys_ioctl+0x5a/0x7c
    system_call_fastpath+0x16/0x1b
    RIP gup_huge_pud+0xf2/0x159

    Signed-off-by: Youquan Song
    Reviewed-by: Andrea Arcangeli
    Cc:
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Youquan Song
     

16 Nov, 2011

1 commit


26 Jul, 2011

2 commits


16 Jun, 2011

1 commit

  • When 1GB hugepages are allocated on a system, free(1) reports less
    available memory than what really is installed in the box. Also, if the
    total size of hugepages allocated on a system is over half of the total
    memory size, CommitLimit becomes a negative number.

    The problem is that gigantic hugepages (order > MAX_ORDER) can only be
    allocated at boot with bootmem, thus their frames are not accounted to
    'totalram_pages'. However, they are accounted to hugetlb_total_pages().

    What happens to turn CommitLimit into a negative number is this
    calculation, in fs/proc/meminfo.c:

    allowed = ((totalram_pages - hugetlb_total_pages())
               * sysctl_overcommit_ratio / 100) + total_swap_pages;

    A similar calculation occurs in __vm_enough_memory() in mm/mmap.c.

    Also, every vm statistic which depends on 'totalram_pages' will render
    confusing values, as if the system were 'missing' some part of its memory.

    Impact of this bug:

    When gigantic hugepages are allocated and sysctl_overcommit_memory ==
    OVERCOMMIT_NEVER, __vm_enough_memory() goes through the mentioned
    'allowed' calculation and might end up mistakenly returning -ENOMEM,
    thus forcing the system to start reclaiming pages earlier than usual;
    this could have a detrimental impact on overall system performance,
    depending on the workload.

    Besides the aforementioned scenario, I can only think of this causing
    annoyances with memory reports from /proc/meminfo and free(1).
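
    A compilable rerun of the 'allowed' calculation with illustrative
    numbers (a 16GB box with 12GB tied up in boot-time 1GB pages),
    showing how CommitLimit goes negative:

    #include <stdio.h>

    int main(void)
    {
        /* 16GB box; 12GB of bootmem gigantic pages are missing from
         * totalram_pages but counted by hugetlb_total_pages() */
        long totalram_pages      = (4L  << 30) / 4096;
        long hugetlb_total_pages = (12L << 30) / 4096;
        long total_swap_pages    = 0;
        long overcommit_ratio    = 50;   /* sysctl_overcommit_ratio */

        long allowed = (totalram_pages - hugetlb_total_pages)
                       * overcommit_ratio / 100 + total_swap_pages;

        printf("CommitLimit = %ld pages\n", allowed);   /* negative */
        return 0;
    }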

    [akpm@linux-foundation.org: standardize comment layout]
    Reported-by: Russ Anderson
    Signed-off-by: Rafael Aquini
    Acked-by: Russ Anderson
    Cc: Andrea Arcangeli
    Cc: Christoph Lameter
    Cc:
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Rafael Aquini
     

06 Jun, 2011

1 commit

  • Al Viro observes that in the hugetlb case, handle_mm_fault() may return
    a value of the kind ENOSPC when its caller is expecting a value of the
    kind VM_FAULT_SIGBUS: fix alloc_huge_page()'s failure returns.
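
    A sketch of one plausible caller-side conversion; the commit only
    names the ENOSPC/SIGBUS pair, so mapping -ENOMEM to VM_FAULT_OOM
    (and the constants' values) is an assumption here:

    #include <errno.h>
    #include <stdio.h>

    #define VM_FAULT_OOM    0x0001   /* illustrative values */
    #define VM_FAULT_SIGBUS 0x0002

    /* caller-side conversion: -errno in, VM_FAULT_* out */
    static int fault_code(long alloc_err)
    {
        return alloc_err == -ENOMEM ? VM_FAULT_OOM : VM_FAULT_SIGBUS;
    }

    int main(void)
    {
        printf("-ENOSPC -> %#x, -ENOMEM -> %#x\n",
               fault_code(-ENOSPC), fault_code(-ENOMEM));
        return 0;
    }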

    Signed-off-by: Hugh Dickins
    Acked-by: Al Viro
    Cc: stable@kernel.org
    Signed-off-by: Linus Torvalds

    Hugh Dickins
     

27 May, 2011

1 commit

  • The type of vma->vm_flags is 'unsigned long'. Neither 'int' nor
    'unsigned int'. This patch fixes such misuse.

    Signed-off-by: KOSAKI Motohiro
    [ Changed to use a typedef - we'll extend it to cover more cases
    later, since there has been discussion about making it a 64-bit
    type.. - Linus ]
    Signed-off-by: Linus Torvalds

    KOSAKI Motohiro
     

25 May, 2011

1 commit

  • Straightforward conversion of i_mmap_lock to a mutex.

    Signed-off-by: Peter Zijlstra
    Acked-by: Hugh Dickins
    Cc: Benjamin Herrenschmidt
    Cc: David Miller
    Cc: Martin Schwidefsky
    Cc: Russell King
    Cc: Paul Mundt
    Cc: Jeff Dike
    Cc: Richard Weinberger
    Cc: Tony Luck
    Cc: KAMEZAWA Hiroyuki
    Cc: Mel Gorman
    Cc: KOSAKI Motohiro
    Cc: Nick Piggin
    Cc: Namhyung Kim
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Peter Zijlstra
     

26 Apr, 2011

1 commit


10 Apr, 2011

1 commit


31 Mar, 2011

1 commit


23 Mar, 2011

1 commit

  • When the user inserts a negative value into /proc/sys/vm/nr_hugepages it
    will cause the kernel to allocate as many hugepages as possible and to
    then update /proc/meminfo to reflect this.

    This changes the behavior so that negative input will result in the
    nr_hugepages value being unchanged.

    Signed-off-by: Petr Holasek
    Signed-off-by: Anton Arapov
    Reviewed-by: Naoya Horiguchi
    Acked-by: David Rientjes
    Acked-by: Mel Gorman
    Acked-by: Eric B Munson
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Petr Holasek
     

14 Jan, 2011

5 commits

  • When parsing changes to the huge page pool sizes made from userspace via
    the sysfs interface, bogus input values are being covered up by
    nr_hugepages_store_common and nr_overcommit_hugepages_store returning 0
    when strict_strtoul returns an error. This can cause an infinite loop in
    the nr_hugepages_store code. This patch changes the return value for
    these functions to -EINVAL when strict_strtoul returns an error.

    Signed-off-by: Eric B Munson
    Reported-by: CAI Qian
    Cc: Andrea Arcangeli
    Cc: Eric B Munson
    Cc: Michal Hocko
    Cc: Nishanth Aravamudan
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Eric B Munson
     
  • Huge pages with order >= MAX_ORDER must be allocated at boot via the
    kernel command line, they cannot be allocated or freed once the kernel is
    up and running. Currently we allow values to be written to the sysfs and
    sysctl files controlling pool size for these huge page sizes. This patch
    makes the store functions for nr_hugepages and nr_overcommit_hugepages
    return -EINVAL when the pool for a page size >= MAX_ORDER is changed.
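
    The guard's shape, as a compilable sketch (struct hstate is reduced
    to its order field; MAX_ORDER's value is illustrative):

    #include <stdio.h>

    #define MAX_ORDER 11
    #define EINVAL    22

    struct hstate { unsigned order; };

    static int set_pool_size(struct hstate *h, unsigned long count)
    {
        if (h->order >= MAX_ORDER)
            return -EINVAL;     /* boot-time-only size: refuse */
        (void)count;            /* ... resize the pool to count ... */
        return 0;
    }

    int main(void)
    {
        struct hstate pmd_size = { 9 }, gigantic = { 18 };
        printf("%d %d\n", set_pool_size(&pmd_size, 8),
                          set_pool_size(&gigantic, 8));
        return 0;
    }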

    [akpm@linux-foundation.org: avoid multiple return paths in nr_hugepages_store_common()]
    [caiqian@redhat.com: add checking in hugetlb_overcommit_handler()]
    Signed-off-by: Eric B Munson
    Reported-by: CAI Qian
    Cc: Andrea Arcangeli
    Cc: Michal Hocko
    Cc: Nishanth Aravamudan
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Eric B Munson
     
  • proc_doulongvec_minmax may fail if the given buffer doesn't represent a
    valid number. If we provide something invalid we will initialize the
    resulting value (nr_overcommit_huge_pages in this case) to a random value
    from the stack.

    The issue was introduced by a3d0c6aa, when the default handler was
    replaced by a helper function that does not check the return value.

    Reproducer:
    echo "" > /proc/sys/vm/nr_overcommit_hugepages

    [akpm@linux-foundation.org: correctly propagate proc_doulongvec_minmax return code]
    Signed-off-by: Michal Hocko
    Cc: CAI Qian
    Cc: Nishanth Aravamudan
    Cc: Andrea Arcangeli
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Michal Hocko
     
  • The NODEMASK_ALLOC macro may dynamically allocate memory for its second
    argument ('nodes_allowed' in this context).

    In nr_hugepages_store_common() we may abort early if strict_strtoul()
    fails, but in that case we do not free the memory already allocated to
    'nodes_allowed', causing a memory leak.

    This patch closes the leak by freeing the memory in the error path.

    [akpm@linux-foundation.org: use NODEMASK_FREE, per Minchan Kim]
    Signed-off-by: Jesper Juhl
    Cc: Minchan Kim
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Jesper Juhl
     
  • Move the copy/clear_huge_page functions to common code to share between
    hugetlb.c and huge_memory.c.

    Signed-off-by: Andrea Arcangeli
    Acked-by: Rik van Riel
    Acked-by: Mel Gorman
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Andrea Arcangeli
     

03 Dec, 2010

1 commit

  • Have hugetlb_fault() call unlock_page(page) only if it had previously
    called lock_page(page).

    Setting CONFIG_DEBUG_VM=y and then running the libhugetlbfs test suite
    resulted in tripping the VM_BUG_ON(!PageLocked(page)) in unlock_page(),
    called by hugetlb_fault() when page == pagecache_page. This patch
    remedies the problem.
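
    A compilable model of the pairing rule: the lock was taken only for
    page != pagecache_page, so the unlock must be guarded the same way:

    #include <stdio.h>

    struct page { int locked; };

    static void lock_page(struct page *p) { p->locked = 1; }

    static void unlock_page(struct page *p)
    {
        if (!p->locked)     /* models VM_BUG_ON(!PageLocked(page)) */
            puts("BUG: unlocking a page that is not locked");
        p->locked = 0;
    }

    int main(void)
    {
        struct page cache = { 0 };
        struct page *page = &cache, *pagecache_page = &cache;

        if (page != pagecache_page)     /* matches the lock...        */
            lock_page(page);

        /* ... fault handling ... */

        if (page != pagecache_page)     /* ...so guard the unlock too */
            unlock_page(page);
        return 0;
    }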

    Signed-off-by: Dean Nelson
    Cc:
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Dean Nelson