23 Oct, 2015
1 commit
-
commit 2f84a8990ebbe235c59716896e017c6b2ca1200f upstream.
SunDong reported the following on
https://bugzilla.kernel.org/show_bug.cgi?id=103841
I think I found a Linux bug; I have constructed test cases for it. I can
stably reproduce the problem on the Fedora 22 (4.0.4) kernel, x86_64.
I construct a transparent huge page; when the parent and child process
access the same huge page area via MAP_SHARED and MAP_PRIVATE mappings,
it has the opportunity to lead to a huge page copy-on-write failure,
which then munmaps the child's corresponding mmap area. But because the
child's mmap area has the VM_MAYSHARE attribute, the child process
unmapping this area can trigger the VM_BUG_ON in set_vma_resv_flags
function (vma->vm_flags & VM_MAYSHARE).

There were a number of problems with the report (e.g. it's hugetlbfs that
triggers this, not transparent huge pages) but it was fundamentally
correct in that a VM_BUG_ON in set_vma_resv_flags() can be triggered that
looks like this:

vma ffff8804651fd0d0 start 00007fc474e00000 end 00007fc475e00000
next ffff8804651fd018 prev ffff8804651fd188 mm ffff88046b1b1800
prot 8000000000000027 anon_vma (null) vm_ops ffffffff8182a7a0
pgoff 0 file ffff88106bdb9800 private_data (null)
flags: 0x84400fb(read|write|shared|mayread|maywrite|mayexec|mayshare|dontexpand|hugetlb)
------------
kernel BUG at mm/hugetlb.c:462!
SMP
Modules linked in: xt_pkttype xt_LOG xt_limit [..]
CPU: 38 PID: 26839 Comm: map Not tainted 4.0.4-default #1
Hardware name: Dell Inc. PowerEdge R810/0TT6JF, BIOS 2.7.4 04/26/2012
set_vma_resv_flags+0x2d/0x30

The VM_BUG_ON is correct because private and shared mappings have
different reservation accounting but the warning clearly shows that the
VMA is shared.

When a private COW fails to allocate a new page then only the process
that created the VMA gets the page -- all the children unmap the page.
If the children access that data in the future then they get killed.

The problem is that the same file is mapped shared and private. During
the COW, the allocation fails, the VMAs are traversed to unmap the other
private pages but a shared VMA is found and the bug is triggered. This
patch identifies such VMAs and skips them.

Signed-off-by: Mel Gorman
Reported-by: SunDong
Reviewed-by: Michal Hocko
Cc: Andrea Arcangeli
Cc: Hugh Dickins
Cc: Naoya Horiguchi
Cc: David Rientjes
Reviewed-by: Naoya Horiguchi
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
Signed-off-by: Greg Kroah-Hartman
04 Aug, 2015
1 commit
-
commit 641844f5616d7c6597309f560838f996466d7aac upstream.
Currently the initial value of order in dissolve_free_huge_page is 64 or
32, which leads to the following warning in the static checker:

mm/hugetlb.c:1203 dissolve_free_huge_pages()
warn: potential right shift more than type allows '9,18,64'

This is a potential risk of an infinite loop, because 1 << order (== 0)
is used in the for-loop like this:

for (pfn = start_pfn; pfn < end_pfn; pfn += 1 << order)
...

So this patch fixes it by using the global minimum_order calculated at
boot time.

text    data    bss     dec     hex     filename
28313   469     84236   113018  1b97a   mm/hugetlb.o
28256   473     84236   112965  1b945   mm/hugetlb.o (patched)

Fixes: c8721bbbdd36 ("mm: memory-hotplug: enable memory hotplug to handle hugepage")
Reported-by: Dan Carpenter
Signed-off-by: Naoya Horiguchi
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
Signed-off-by: Greg Kroah-Hartman
16 Apr, 2015
5 commits
-
Now that we have easy access to hugepages' activeness, the existing
helpers to get the information can be cleaned up.

[akpm@linux-foundation.org: s/PageHugeActive/page_huge_active/]
Signed-off-by: Naoya Horiguchi
Cc: Hugh Dickins
Reviewed-by: Michal Hocko
Cc: Mel Gorman
Cc: Johannes Weiner
Cc: David Rientjes
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
-
We are not safe from calling isolate_huge_page() on a hugepage
concurrently, which can put the victim hugepage in an invalid state and
result in BUG_ON().

The root problem of this is that we don't have any information on struct
page (so easily accessible) about hugepages' activeness. Note that
hugepages' activeness means just being linked to
hstate->hugepage_activelist, which is not the same as normal pages'
activeness represented by the PageActive flag.

Normal pages are isolated by isolate_lru_page(), which prechecks PageLRU
before isolation, so let's do similarly for hugetlb with a new
page_huge_active().

set/clear_page_huge_active() should be called within hugetlb_lock. But
hugetlb_cow() and hugetlb_no_page() don't do this, which is justified
because in these functions set_page_huge_active() is called right after
the hugepage is allocated and no other thread tries to isolate it.

[akpm@linux-foundation.org: s/PageHugeActive/page_huge_active/, make it return bool]
[fengguang.wu@intel.com: set_page_huge_active() can be static]
Signed-off-by: Naoya Horiguchi
Cc: Hugh Dickins
Reviewed-by: Michal Hocko
Cc: Mel Gorman
Cc: Johannes Weiner
Cc: David Rientjes
Signed-off-by: Fengguang Wu
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
-
Make 'min_size=' an option when mounting a hugetlbfs. This
option takes the same value as the 'size' option. min_size can be
specified without specifying size. If both are specified, min_size must
be less than or equal to size, else the mount will fail. If min_size is
specified, then at mount time an attempt is made to reserve min_size
pages. If the reservation fails, the mount fails. At umount time, the
reserved pages are released.

Signed-off-by: Mike Kravetz
Cc: Davidlohr Bueso
Cc: Aneesh Kumar
Cc: Joonsoo Kim
Cc: Andi Kleen
Cc: David Rientjes
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
-
The same routines that perform subpool maximum size accounting,
hugepage_subpool_get/put_pages(), are modified to also perform minimum
size accounting. When a delta value is passed to these routines,
calculate how global reservations must be adjusted to maintain the
subpool minimum size. The routines now return this global reserve count
adjustment, which is then passed to the global accounting routine
hugetlb_acct_memory().

Signed-off-by: Mike Kravetz
Cc: Davidlohr Bueso
Cc: Aneesh Kumar
Cc: Joonsoo Kim
Cc: Andi Kleen
Cc: David Rientjes
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
-
hugetlbfs allocates huge pages from the global pool as needed. Even if
the global pool contains a sufficient number of pages for the filesystem size
at mount time, those global pages could be grabbed for some other use. As
a result, filesystem huge page allocations may fail due to lack of pages.

Applications such as a database want to use huge pages for performance
reasons. hugetlbfs filesystem semantics with ownership and modes work
well to manage access to a pool of huge pages. However, the application
would like some reasonable assurance that allocations will not fail due to
a lack of huge pages. At application startup time, the application would
like to configure itself to use a specific number of huge pages. Before
starting, the application can check to make sure that enough huge pages
exist in the system global pools. However, there are no guarantees that
those pages will be available when needed by the application. What the
application wants is exclusive use of a subset of huge pages.

Add a new hugetlbfs mount option 'min_size=' to indicate that the
specified number of pages will be available for use by the filesystem. At
mount time, this number of huge pages will be reserved for exclusive use
of the filesystem. If there is not a sufficient number of free pages, the
mount will fail. As pages are allocated to and freed from the
filesystem, the number of reserved pages is adjusted so that the specified
minimum is maintained.

This patch (of 4):

Add a field to the subpool structure to indicate the minimum number of
huge pages to always be used by this subpool. This minimum count includes
allocated pages as well as reserved pages. If the minimum number of pages
for the subpool have not been allocated, pages are reserved up to this
minimum. An additional field (rsv_hpages) is used to track the number of
pages reserved to meet this minimum size. The hstate pointer in the
subpool is convenient to have when reserving and unreserving the pages.

Signed-off-by: Mike Kravetz
Cc: Davidlohr Bueso
Cc: Aneesh Kumar
Cc: Joonsoo Kim
Cc: Andi Kleen
Cc: David Rientjes
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
15 Apr, 2015
2 commits
-
If __get_user_pages() is faulting a significant number of hugetlb pages,
usually as the result of mmap(MAP_LOCKED), it can potentially allocate a
very large amount of memory.

If the process has been oom killed, this can needlessly deplete memory
reserves.

In the same way that commit 4779280d1ea4 ("mm: make get_user_pages()
interruptible") aborted for pending SIGKILLs when faulting non-hugetlb
memory, based on the premise of commit 462e00cc7151 ("oom: stop
allocating user memory if TIF_MEMDIE is set"), hugetlb page faults now
terminate when the process has been oom killed.

Signed-off-by: David Rientjes
Acked-by: Rik van Riel
Acked-by: Greg Thelen
Cc: Naoya Horiguchi
Acked-by: Davidlohr Bueso
Acked-by: "Kirill A. Shutemov"
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
-
Commit 61f77eda9bbf ("mm/hugetlb: reduce arch dependent code around
follow_huge_*") broke follow_huge_pmd() on s390, where pmd and pte
layout differ and using pte_page() on a huge pmd will return wrong
results. Using pmd_page() instead fixes this.

All architectures that were touched by that commit have pmd_page()
defined, so this should not break anything on other architectures.

Fixes: 61f77eda "mm/hugetlb: reduce arch dependent code around follow_huge_*"
Signed-off-by: Gerald Schaefer
Acked-by: Naoya Horiguchi
Cc: Hugh Dickins
Cc: Michal Hocko
Cc: Andrea Arcangeli
Cc: Martin Schwidefsky
Acked-by: David Rientjes
Cc:
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
13 Mar, 2015
1 commit
-
Now that gigantic pages are dynamically allocatable, care must be taken to
ensure that p->first_page is valid before setting PageTail.

If this isn't done, then it is possible to race and have compound_head()
return NULL.

Signed-off-by: David Rientjes
Acked-by: Davidlohr Bueso
Cc: Luiz Capitulino
Cc: Joonsoo Kim
Acked-by: Hillf Danton
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
12 Feb, 2015
7 commits
-
Dave noticed that an unprivileged process can allocate a significant
amount of memory -- >500 MiB on x86_64 -- and stay unnoticed by the
oom-killer and memory cgroup. The trick is to allocate a lot of PMD page
tables. The Linux kernel doesn't account PMD tables to the process, only
PTE tables.

The use-case below uses a few tricks to allocate a lot of PMD page tables
while keeping VmRSS and VmPTE low. oom_score for the process will be 0.

#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <sys/mman.h>
#include <sys/prctl.h>

#define PUD_SIZE (1UL << 30)
#define PMD_SIZE (1UL << 21)

#define NR_PUD 130000

int main(void)
{
	char *addr = NULL;
	unsigned long i;

	prctl(PR_SET_THP_DISABLE);
	for (i = 0; i < NR_PUD; i++) {
		addr = mmap(addr + PUD_SIZE, PUD_SIZE, PROT_WRITE|PROT_READ,
				MAP_ANONYMOUS|MAP_PRIVATE, -1, 0);
		if (addr == MAP_FAILED) {
			perror("mmap");
			break;
		}
		*addr = 'x';
		munmap(addr, PMD_SIZE);
		addr = mmap(addr, PMD_SIZE, PROT_WRITE|PROT_READ,
				MAP_ANONYMOUS|MAP_PRIVATE|MAP_FIXED, -1, 0);
		if (addr == MAP_FAILED)
			perror("re-mmap"), exit(1);
	}
	printf("PID %d consumed %lu KiB in PMD page tables\n",
			getpid(), i * 4096 >> 10);
	return pause();
}

The patch addresses the issue by accounting PMD tables to the process the
same way we account PTE tables.

The main places where PMD tables are accounted are __pmd_alloc() and
free_pmd_range(). But there are a few corner cases:

- HugeTLB can share PMD page tables. The patch handles this by accounting
  the table to all processes who share it.
- x86 PAE pre-allocates a few PMD tables on fork.
- Architectures with FIRST_USER_ADDRESS > 0. We need to adjust the sanity
  check on exit(2).

Accounting only happens on configurations where the PMD page table level
is present (PMD is not folded). As with nr_ptes we use a per-mm counter.
The counter value is used to calculate the baseline for the badness score
used by the oom-killer.

Signed-off-by: Kirill A. Shutemov
Reported-by: Dave Hansen
Cc: Hugh Dickins
Reviewed-by: Cyrill Gorcunov
Cc: Pavel Emelyanov
Cc: David Rientjes
Tested-by: Sedat Dilek
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
-
If __unmap_hugepage_range() tries to unmap the address range over which
hugepage migration is on the way, we get the wrong page because pte_page()
doesn't work for migration entries. This patch simply clears the pte for
migration entries as we do for hwpoison entries.

Fixes: 290408d4a2 ("hugetlb: hugepage migration core")
Signed-off-by: Naoya Horiguchi
Cc: Hugh Dickins
Cc: James Hogan
Cc: David Rientjes
Cc: Mel Gorman
Cc: Johannes Weiner
Cc: Michal Hocko
Cc: Rik van Riel
Cc: Andrea Arcangeli
Cc: Luiz Capitulino
Cc: Nishanth Aravamudan
Cc: Lee Schermerhorn
Cc: Steve Capper
Cc: [2.6.36+]
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
-
There is a race condition between hugepage migration and
change_protection(), where hugetlb_change_protection() doesn't care about
migration entries and wrongly overwrites them. That causes unexpected
results like kernel crash. HWPoison entries also can cause the same
problem.

This patch adds is_hugetlb_entry_(migration|hwpoisoned) checks in this
function to do the proper actions.

Fixes: 290408d4a2 ("hugetlb: hugepage migration core")
Signed-off-by: Naoya Horiguchi
Cc: Hugh Dickins
Cc: James Hogan
Cc: David Rientjes
Cc: Mel Gorman
Cc: Johannes Weiner
Cc: Michal Hocko
Cc: Rik van Riel
Cc: Andrea Arcangeli
Cc: Luiz Capitulino
Cc: Nishanth Aravamudan
Cc: Lee Schermerhorn
Cc: Steve Capper
Cc: [2.6.36+]
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
-
When running the test which causes the race as shown in the previous patch,
we can hit the BUG "get_page() on refcount 0 page" in hugetlb_fault().

This race happens when the pte turns into a migration entry just after
the first check of is_hugetlb_entry_migration() in hugetlb_fault() passes
with false. To fix this, we need to check pte_present() again after
huge_ptep_get().

This patch also reorders taking the ptl and doing pte_page(), because
pte_page() should be done under the ptl. Due to this reordering, we need
to use trylock_page() in the page != pagecache_page case to respect the
locking order.

Fixes: 66aebce747ea ("hugetlb: fix race condition in hugetlb_fault()")
Signed-off-by: Naoya Horiguchi
Cc: Hugh Dickins
Cc: James Hogan
Cc: David Rientjes
Cc: Mel Gorman
Cc: Johannes Weiner
Cc: Michal Hocko
Cc: Rik van Riel
Cc: Andrea Arcangeli
Cc: Luiz Capitulino
Cc: Nishanth Aravamudan
Cc: Lee Schermerhorn
Cc: Steve Capper
Cc: [3.2+]
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
-
We have a race condition between move_pages() and freeing hugepages, where
move_pages() calls follow_page(FOLL_GET) for hugepages internally and
tries to get its refcount without preventing concurrent freeing. This
race crashes the kernel, so this patch fixes it by moving the FOLL_GET
code for hugepages into follow_huge_pmd(), taking the page table lock.

This patch intentionally removes the page==NULL check after pte_page().
This is justified because pte_page() never returns NULL for any
architectures or configurations.

This patch changes the behavior of follow_huge_pmd() for tail pages so
that tail pages can be pinned/returned. So the callers must be changed to
properly handle the returned tail pages.

We could add similar locking to follow_huge_(addr|pud) for consistency,
but it's not necessary because currently these functions don't support
the FOLL_GET flag, so let's leave it for future development.

Here is the reproducer:
$ cat movepages.c
#include <stdio.h>
#include <stdlib.h>
#include <err.h>
#include <numa.h>
#include <numaif.h>

#define ADDR_INPUT 0x700000000000UL
#define HPS 0x200000
#define PS 0x1000

int main(int argc, char *argv[]) {
	int i;
	int nr_hp = strtol(argv[1], NULL, 0);
	int nr_p  = nr_hp * HPS / PS;
	int ret;
	void **addrs;
	int *status;
	int *nodes;
	pid_t pid;

	pid = strtol(argv[2], NULL, 0);
	addrs  = malloc(sizeof(char *) * nr_p + 1);
	status = malloc(sizeof(char *) * nr_p + 1);
	nodes  = malloc(sizeof(char *) * nr_p + 1);

	while (1) {
		for (i = 0; i < nr_p; i++) {
			addrs[i] = (void *)ADDR_INPUT + i * PS;
			nodes[i] = 1;
			status[i] = 0;
		}
		ret = numa_move_pages(pid, nr_p, addrs, nodes, status,
					MPOL_MF_MOVE_ALL);
		if (ret == -1)
			err(1, "move_pages");

		for (i = 0; i < nr_p; i++) {
			addrs[i] = (void *)ADDR_INPUT + i * PS;
			nodes[i] = 0;
			status[i] = 0;
		}
		ret = numa_move_pages(pid, nr_p, addrs, nodes, status,
					MPOL_MF_MOVE_ALL);
		if (ret == -1)
			err(1, "move_pages");
	}
	return 0;
}

$ cat hugepage.c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/mman.h>

#define ADDR_INPUT 0x700000000000UL
#define HPS 0x200000

int main(int argc, char *argv[]) {
	int nr_hp = strtol(argv[1], NULL, 0);
	char *p;

	while (1) {
		p = mmap((void *)ADDR_INPUT, nr_hp * HPS, PROT_READ | PROT_WRITE,
			MAP_PRIVATE | MAP_ANONYMOUS | MAP_HUGETLB, -1, 0);
		if (p != (void *)ADDR_INPUT) {
			perror("mmap");
			break;
		}
		memset(p, 0, nr_hp * HPS);
		munmap(p, nr_hp * HPS);
	}
}

$ sysctl vm.nr_hugepages=40
$ ./hugepage 10 &
$ ./movepages 10 $(pgrep -f hugepage)

Fixes: e632a938d914 ("mm: migrate: add hugepage migration code to move_pages()")
Signed-off-by: Naoya Horiguchi
Reported-by: Hugh Dickins
Cc: James Hogan
Cc: David Rientjes
Cc: Mel Gorman
Cc: Johannes Weiner
Cc: Michal Hocko
Cc: Rik van Riel
Cc: Andrea Arcangeli
Cc: Luiz Capitulino
Cc: Nishanth Aravamudan
Cc: Lee Schermerhorn
Cc: Steve Capper
Cc: [3.12+]
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
-
Migrating hugepages and hwpoisoned hugepages are considered as non-present
hugepages, and they are referenced via migration entries and hwpoison
entries in their page table slots.

This behavior causes a race condition because pmd_huge() doesn't tell
non-huge pages from migrating/hwpoisoned hugepages. follow_page_mask() is
one example where the kernel would call follow_page_pte() for such a
hugepage while this function is supposed to handle only normal pages.

To avoid this, this patch makes pmd_huge() also return true when
pmd_none() is false *and* pmd_present() is false. We don't have to worry
about mixing up a non-present pmd entry with a normal pmd (pointing to a
leaf-level pte entry) because pmd_present() is true for a normal pmd.

The same race condition could happen in (x86-specific) gup_pmd_range(),
where this patch simply adds a pmd_present() check instead of pmd_huge().
This is because gup_pmd_range() is a fast path. If we have a non-present
hugepage in this function, we will go into gup_huge_pmd(), then return 0
at the flag mask check, and finally fall back to the slow path.

Fixes: 290408d4a2 ("hugetlb: hugepage migration core")
Signed-off-by: Naoya Horiguchi
Cc: Hugh Dickins
Cc: James Hogan
Cc: David Rientjes
Cc: Mel Gorman
Cc: Johannes Weiner
Cc: Michal Hocko
Cc: Rik van Riel
Cc: Andrea Arcangeli
Cc: Luiz Capitulino
Cc: Nishanth Aravamudan
Cc: Lee Schermerhorn
Cc: Steve Capper
Cc: [2.6.36+]
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
-
Currently we have many duplicates in the definitions around
follow_huge_addr(), follow_huge_pmd(), and follow_huge_pud(), so this
patch tries to remove them. The basic idea is to put the default
implementation for these functions in mm/hugetlb.c as weak symbols
(regardless of CONFIG_ARCH_WANT_GENERAL_HUGETLB), and to implement
arch-specific code only when the arch needs it.

For follow_huge_addr(), only powerpc and ia64 have their own
implementation, and in all other architectures this function just returns
ERR_PTR(-EINVAL). So this patch sets returning ERR_PTR(-EINVAL) as the
default.

As for follow_huge_(pmd|pud)(), if (pmd|pud)_huge() is implemented to
always return 0 in your architecture (like in ia64 or sparc,) it's never
called (the callsite is optimized away) no matter how it is implemented.
So in such architectures, we don't need an arch-specific implementation.

In some architectures (like mips, s390 and tile,) the current
arch-specific follow_huge_(pmd|pud)() are effectively identical to the
common code, so this patch lets these architectures use the common code.

One exception is metag, where pmd_huge() could return non-zero but it
expects follow_huge_pmd() to always return NULL. This means that we need
an arch-specific implementation which returns NULL. This behavior looks
strange to me (because non-zero pmd_huge() implies that the architecture
supports PMD-based hugepages, so follow_huge_pmd() can/should return some
relevant value,) but that's beyond this cleanup patch, so let's keep it.

Justification of non-trivial changes:
- in s390, follow_huge_pmd() checks !MACHINE_HAS_HPAGE at first, and this
  patch removes the check. This is OK because we can assume
  MACHINE_HAS_HPAGE is true when follow_huge_pmd() can be called (note
  that pmd_huge() has the same check and always returns 0 for
  !MACHINE_HAS_HPAGE.)
- in s390 and mips, we use HPAGE_MASK instead of PMD_MASK as done in
  common code. This patch forces these archs to use PMD_MASK, but it's OK
  because they are identical in both archs.
  In s390, both HPAGE_SHIFT and PMD_SHIFT are 20.
  In mips, HPAGE_SHIFT is defined as (PAGE_SHIFT + PAGE_SHIFT - 3) and
  PMD_SHIFT is defined as (PAGE_SHIFT + PAGE_SHIFT + PTE_ORDER - 3), but
  PTE_ORDER is always 0, so these are identical.

Signed-off-by: Naoya Horiguchi
Acked-by: Hugh Dickins
Cc: James Hogan
Cc: David Rientjes
Cc: Mel Gorman
Cc: Johannes Weiner
Cc: Michal Hocko
Cc: Rik van Riel
Cc: Andrea Arcangeli
Cc: Luiz Capitulino
Cc: Nishanth Aravamudan
Cc: Lee Schermerhorn
Cc: Steve Capper
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
11 Feb, 2015
1 commit
-
hugepages_treat_as_movable is declared as unsigned long, but
proc_dointvec() is used for parsing it:

static struct ctl_table vm_table[] = {
	...
	{
		.procname	= "hugepages_treat_as_movable",
		.data		= &hugepages_treat_as_movable,
		.maxlen		= sizeof(int),
		.mode		= 0644,
		.proc_handler	= proc_dointvec,
	},

This seems harmless, but it's better to use the int type here.
Signed-off-by: Andrey Ryabinin
Cc: Dmitry Vyukov
Cc: Manfred Spraul
Acked-by: David Rientjes
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
16 Dec, 2014
1 commit
-
Pull drm updates from Dave Airlie:
"Highlights:- AMD KFD driver merge
This is the AMD HSA interface for exposing a lowlevel interface for
GPGPU use. They have an open source userspace built on top of this
interface, and the code looks as good as it was going to get out of
tree.- Initial atomic modesetting work
The need for an atomic modesetting interface to allow userspace to
try and send a complete set of modesetting state to the driver has
arisen, and been suffering from neglect this past year. No more,
the start of the common code and changes for msm driver to use it
are in this tree. Ongoing work to get the userspace ioctl finished
and the code clean will probably wait until next kernel.- DisplayID 1.3 and tiled monitor exposed to userspace.
Tiled monitor property is now exposed for userspace to make use of.
- Rockchip drm driver merged.
- imx gpu driver moved out of staging
Other stuff:
- core:
panel - MIPI DSI + new panels.
expose suggested x/y properties for virtual GPUs- i915:
Initial Skylake (SKL) support
gen3/4 reset work
start of dri1/ums removal
infoframe tracking
fixes for lots of things.- nouveau:
tegra k1 voltage support
GM204 modesetting support
GT21x memory reclocking work- radeon:
CI dpm fixes
GPUVM improvements
Initial DPM fan control- rcar-du:
HDMI support added
removed some support for old boards
slave encoder driver for Analog Devices adv7511- exynos:
Exynos4415 SoC support- msm:
a4xx gpu support
atomic helper conversion- tegra:
iommu support
universal plane support
ganged-mode DSI support- sti:
HDMI i2c improvements- vmwgfx:
some late fixes.- qxl:
use suggested x/y properties"* 'drm-next' of git://people.freedesktop.org/~airlied/linux: (969 commits)
drm: sti: fix module compilation issue
drm/i915: save/restore GMBUS freq across suspend/resume on gen4
drm: sti: correctly cleanup CRTC and planes
drm: sti: add HQVDP plane
drm: sti: add cursor plane
drm: sti: enable auxiliary CRTC
drm: sti: fix delay in VTG programming
drm: sti: prepare sti_tvout to support auxiliary crtc
drm: sti: use drm_crtc_vblank_{on/off} instead of drm_vblank_{on/off}
drm: sti: fix hdmi avi infoframe
drm: sti: remove event lock while disabling vblank
drm: sti: simplify gdp code
drm: sti: clear all mixer control
drm: sti: remove gpio for HDMI hot plug detection
drm: sti: allow to change hdmi ddc i2c adapter
drm/doc: Document drm_add_modes_noedid() usage
drm/i915: Remove '& 0xffff' from the mask given to WA_REG()
drm/i915: Invert the mask and val arguments in wa_add() and WA_REG()
drm: Zero out DRM object memory upon cleanup
drm/i915/bdw: Fix the write setting up the WIZ hashing mode
...
14 Dec, 2014
4 commits
-
This function is only called during initialization.
Signed-off-by: Luiz Capitulino
Cc: Andi Kleen
Acked-by: David Rientjes
Cc: Rik van Riel
Cc: Yasuaki Ishimatsu
Cc: Yinghai Lu
Cc: Davidlohr Bueso
Acked-by: Naoya Horiguchi
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
-
No reason to duplicate the code of an existing macro.
Signed-off-by: Luiz Capitulino
Cc: Andi Kleen
Acked-by: David Rientjes
Cc: Rik van Riel
Cc: Yasuaki Ishimatsu
Cc: Yinghai Lu
Cc: Davidlohr Bueso
Acked-by: Naoya Horiguchi
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
-
The i_mmap_mutex is a close cousin of the anon vma lock, both protecting
similar data, one for file backed pages and the other for anon memory. To
this end, this lock can also be a rwsem. In addition, there are some
important opportunities to share the lock when there are no tree
modifications.

This conversion is straightforward. For now, all users take the write
lock.

[sfr@canb.auug.org.au: update fremap.c]
Signed-off-by: Davidlohr Bueso
Reviewed-by: Rik van Riel
Acked-by: "Kirill A. Shutemov"
Acked-by: Hugh Dickins
Cc: Oleg Nesterov
Acked-by: Peter Zijlstra (Intel)
Cc: Srikar Dronamraju
Acked-by: Mel Gorman
Signed-off-by: Stephen Rothwell
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
-
Convert all open coded mutex_lock/unlock calls to the
i_mmap_[lock/unlock]_write() helpers.

Signed-off-by: Davidlohr Bueso
Acked-by: Rik van Riel
Acked-by: "Kirill A. Shutemov"
Acked-by: Hugh Dickins
Cc: Oleg Nesterov
Acked-by: Peter Zijlstra (Intel)
Cc: Srikar Dronamraju
Acked-by: Mel Gorman
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
12 Dec, 2014
1 commit
-
Pull cgroup update from Tejun Heo:
"cpuset got simplified a bit. cgroup core got a fix on unified
hierarchy and grew some effective css related interfaces which will be
used for blkio support for writeback IO traffic which is currently
being worked on"

* 'for-3.19' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup:
cgroup: implement cgroup_get_e_css()
cgroup: add cgroup_subsys->css_e_css_changed()
cgroup: add cgroup_subsys->css_released()
cgroup: fix the async css offline wait logic in cgroup_subtree_control_write()
cgroup: restructure child_subsys_mask handling in cgroup_subtree_control_write()
cgroup: separate out cgroup_calc_child_subsys_mask() from cgroup_refresh_child_subsys_mask()
cpuset: lock vs unlock typo
cpuset: simplify cpuset_node_allowed API
cpuset: convert callback_mutex to a spinlock
11 Dec, 2014
1 commit
-
First, after flushing the TLB, there is no need to scan ptes from the
start again. Second, before bailing out of the loop, the address is
advanced one step.

Signed-off-by: Hillf Danton
Reviewed-by: Michal Hocko
Acked-by: David Rientjes
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
27 Oct, 2014
1 commit
-
The current cpuset API for checking if a zone/node is allowed to allocate
from looks rather awkward. We have hardwall and softwall versions of
cpuset_node_allowed with the softwall version doing literally the same
as the hardwall version if __GFP_HARDWALL is passed to it in gfp flags.
If it isn't, the softwall version may check the given node against the
enclosing hardwall cpuset, which it needs to take the callback lock to
do.

Such a distinction was introduced by commit 02a0e53d8227 ("cpuset:
rework cpuset_zone_allowed api"). Before, we had the only version with
the __GFP_HARDWALL flag determining its behavior. The purpose of the
commit was to avoid sleep-in-atomic bugs when someone would mistakenly
call the function without the __GFP_HARDWALL flag for an atomic
allocation. The suffixes introduced were intended to make the callers
think before using the function.

However, since the callback lock was converted from mutex to spinlock by
the previous patch, the softwall check function cannot sleep, and these
precautions are no longer necessary.

So let's simplify the API back to the single check.
Suggested-by: David Rientjes
Signed-off-by: Vladimir Davydov
Acked-by: Christoph Lameter
Acked-by: Zefan Li
Signed-off-by: Tejun Heo
10 Oct, 2014
1 commit
-
Trivially convert a few VM_BUG_ON calls to VM_BUG_ON_VMA to extract
more information when they trigger.

[akpm@linux-foundation.org: coding-style fixes]
Signed-off-by: Sasha Levin
Reviewed-by: Naoya Horiguchi
Cc: Kirill A. Shutemov
Cc: Konstantin Khlebnikov
Cc: Rik van Riel
Cc: Mel Gorman
Cc: Michal Hocko
Cc: Hugh Dickins
Cc: Vlastimil Babka
Cc: Michel Lespinasse
Cc: Minchan Kim
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
07 Aug, 2014
5 commits
-
It is possible for some platforms, such as powerpc, to set HPAGE_SHIFT to
0 to indicate that huge pages are not supported.

When this is the case, hugetlbfs could be disabled during boot time:
hugetlbfs: disabling because there are no supported hugepage sizes

Then in dissolve_free_huge_pages(), order is kept at its maximum (64 for
64 bits), and the for loop below won't end:

for (pfn = start_pfn; pfn < end_pfn; pfn += 1 << order)

As suggested by Naoya, the fix below checks hugepages_supported() before
calling dissolve_free_huge_pages().

[rientjes@google.com: no legitimate reason to call dissolve_free_huge_pages() when !hugepages_supported()]
Signed-off-by: Li Zhong
Acked-by: Naoya Horiguchi
Acked-by: David Rientjes
Signed-off-by: David Rientjes
Cc: [3.12+]
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
-
They are unnecessary: "zero" can be used in place of "hugetlb_zero" and
passing extra2 == NULL is equivalent to infinity.

Signed-off-by: David Rientjes
Cc: Joonsoo Kim
Reviewed-by: Naoya Horiguchi
Reviewed-by: Luiz Capitulino
Cc: "Kirill A. Shutemov"
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
-
Three different interfaces alter the maximum number of hugepages for an
hstate:

- /proc/sys/vm/nr_hugepages for the global number of hugepages of the
  default hstate,
- /sys/kernel/mm/hugepages/hugepages-X/nr_hugepages for the global number
  of hugepages for a specific hstate, and
- /sys/kernel/mm/hugepages/hugepages-X/nr_hugepages/mempolicy for the
  number of hugepages for a specific hstate over the set of allowed
  nodes.

Generalize the code so that a single function handles all of these
writes instead of duplicating the code in two different functions.

This decreases the number of lines of code, but also reduces the size of
.text by about half a percent since set_max_huge_pages() can be inlined.

Signed-off-by: David Rientjes
Cc: Joonsoo Kim
Reviewed-by: Naoya Horiguchi
Reviewed-by: Luiz Capitulino
Cc: "Kirill A. Shutemov"
Acked-by: Davidlohr Bueso
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
-
When returning from hugetlb_cow(), we always (1) put back the refcount
for each referenced page -- always 'old', and 'new' if allocation was
successful. And (2) retake the page table lock right before returning,
as the callers expect. This logic can be simplified and encapsulated,
as proposed in this patch. In addition to cleaner code, we also shave a
few bytes off the instruction text:

text    data    bss     dec     hex     filename
28399   462     41328   70189   1122d   mm/hugetlb.o-baseline
28367   462     41328   70157   1120d   mm/hugetlb.o-patched

Passes libhugetlbfs testcases.
Signed-off-by: Davidlohr Bueso
Cc: Aswin Chandramouleeswaran
Acked-by: David Rientjes
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
-
This function always returns 1, so there is no need to check its return
value in hugetlb_cow(). By doing so, we can get rid of the unnecessary
WARN_ON call. While this logic perhaps existed as a way of identifying
future unmap_ref_private() mishandling, in reality it serves no apparent
purpose.
Signed-off-by: Davidlohr Bueso
Cc: Aswin Chandramouleeswaran
Acked-by: David Rientjes
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
31 Jul, 2014
1 commit
-
PG_head_mask was added into VMCOREINFO to filter huge pages in commit
b3acc56bfe1 ("kexec: save PG_head_mask in VMCOREINFO"), but makedumpfile
still needs another symbol to filter *hugetlbfs* pages.

If a user hopes to filter user pages, makedumpfile tries to exclude them
by checking whether the page is anonymous, but hugetlbfs pages aren't
anonymous even though they are also user pages.

We know it's possible to detect them in the same way as PageHuge(),
so we need the start address of free_huge_page():

  int PageHuge(struct page *page)
  {
          if (!PageCompound(page))
                  return 0;

          page = compound_head(page);
          return get_compound_page_dtor(page) == free_huge_page;
  }

For that reason, this patch makes free_huge_page() public in order
to export it to VMCOREINFO.
Signed-off-by: Atsushi Kumagai
Acked-by: Baoquan He
Cc: Vivek Goyal
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
24 Jul, 2014
1 commit
-
Commit 4a705fef9862 ("hugetlb: fix copy_hugetlb_page_range() to handle
migration/hwpoisoned entry") changed the order of
huge_ptep_set_wrprotect() and huge_ptep_get(), which leads to breakage
in some workloads like hugepage-backed heap allocation via libhugetlbfs.
This patch fixes it.

The test program for the problem is shown below:

  $ cat heap.c
  #include <stdlib.h>
  #include <string.h>
  #include <unistd.h>

  #define HPS 0x200000

  int main() {
        int i;
        char *p = malloc(HPS);
        memset(p, '1', HPS);
        for (i = 0; i < 5; i++) {
                if (!fork()) {
                        memset(p, '2', HPS);
                        p = malloc(HPS);
                        memset(p, '3', HPS);
                        free(p);
                        return 0;
                }
        }
        sleep(1);
        free(p);
        return 0;
  }

  $ export HUGETLB_MORECORE=yes ; export HUGETLB_NO_PREFAULT= ; hugectl --heap ./heap

Fixes 4a705fef9862 ("hugetlb: fix copy_hugetlb_page_range() to handle
migration/hwpoisoned entry"), so it is applicable to -stable kernels which
include it.
Signed-off-by: Naoya Horiguchi
Reported-by: Guillaume Morin
Suggested-by: Guillaume Morin
Acked-by: Hugh Dickins
Cc: [2.6.37+]
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
24 Jun, 2014
1 commit
-
There's a race between fork() and hugepage migration; as a result we try
to "dereference" a swap entry as a normal pte, causing a kernel panic.
The cause of the problem is that copy_hugetlb_page_range() can't handle
the "swap entry" family (migration entries and hwpoisoned entries), so
let's fix it.
[akpm@linux-foundation.org: coding-style fixes]
Signed-off-by: Naoya Horiguchi
Acked-by: Hugh Dickins
Cc: Christoph Lameter
Cc: [2.6.37+]
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
05 Jun, 2014
5 commits
-
We already have a function named hugepages_supported(), and the similar
name hugepage_migration_support() is a bit uncomfortable, so let's rename
it hugepage_migration_supported().
Signed-off-by: Naoya Horiguchi
Acked-by: Hugh Dickins
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
alloc_huge_page() now mixes the normal code path with error handling
logic. This patch moves out the error handling logic, to make the normal
code path cleaner and reduce code duplication.
Signed-off-by: Jianyu Zhan
Acked-by: Davidlohr Bueso
Reviewed-by: Michal Hocko
Reviewed-by: Aneesh Kumar K.V
Cc: Johannes Weiner
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
HugeTLB is limited to allocating hugepages whose size is less than
MAX_ORDER order. This is so because HugeTLB allocates hugepages via the
buddy allocator. Gigantic pages (that is, pages whose size is greater
than MAX_ORDER order) have to be allocated at boot time.

However, boot-time allocation has at least two serious problems. First,
it doesn't support NUMA, and second, gigantic pages allocated at boot
time can't be freed.

This commit solves both issues by adding support for allocating gigantic
pages during runtime. It works just like regular sized hugepages,
meaning that the interface in sysfs is the same, it supports NUMA, and
gigantic pages can be freed.

For example, on x86_64 gigantic pages are 1GB big. To allocate two 1GB
gigantic pages on node 1, one can do:

  # echo 2 > \
      /sys/devices/system/node/node1/hugepages/hugepages-1048576kB/nr_hugepages

And to free them all:

  # echo 0 > \
      /sys/devices/system/node/node1/hugepages/hugepages-1048576kB/nr_hugepages

The one problem with gigantic page allocation at runtime is that it
can't be serviced by the buddy allocator. To overcome that problem,
this commit scans all zones from a node looking for a large enough
contiguous region. When one is found, it's allocated by using CMA, that
is, we call alloc_contig_range() to do the actual allocation. For
example, on x86_64 we scan all zones looking for a 1GB contiguous
region. When one is found, it's allocated by alloc_contig_range().

One expected issue with that approach is that such gigantic contiguous
regions tend to vanish as runtime goes by. The best way to avoid this
for now is to make gigantic page allocations very early during system
boot, say from an init script. Other possible optimizations include
using compaction, which is supported by CMA but is not explicitly used
by this commit.

It's also important to note the following:

1. Gigantic pages allocated at boot time by the hugepages= command-line
   option can be freed at runtime just fine

2. This commit adds support for gigantic pages only to x86_64. The
   reason is that I don't have access to nor experience with other archs.
   The code is arch independent though, so it should be simple to add
   support to different archs

3. I didn't add support for hugepage overcommit, that is, allocating
   a gigantic page on demand when
   /proc/sys/vm/nr_overcommit_hugepages > 0. The reason is that I don't
   think it's reasonable to do the hard and long work required for
   allocating a gigantic page at fault time. But it should be simple
   to add this if wanted

[akpm@linux-foundation.org: coding-style fixes]
Signed-off-by: Luiz Capitulino
Reviewed-by: Davidlohr Bueso
Acked-by: Kirill A. Shutemov
Reviewed-by: Zhang Yanfei
Reviewed-by: Yasuaki Ishimatsu
Cc: Andrea Arcangeli
Cc: David Rientjes
Cc: Marcelo Tosatti
Cc: Naoya Horiguchi
Cc: Rik van Riel
Cc: Yinghai Lu
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
The next commit will add new code which will want to call the
for_each_node_mask_to_alloc() macro. Move it, its buddy
for_each_node_mask_to_free() and their dependencies up in the file so the
new code can use them. This is just code movement, no logic change.
Signed-off-by: Luiz Capitulino
Reviewed-by: Andrea Arcangeli
Reviewed-by: Naoya Horiguchi
Reviewed-by: Yasuaki Ishimatsu
Reviewed-by: Davidlohr Bueso
Acked-by: Kirill A. Shutemov
Reviewed-by: Zhang Yanfei
Cc: David Rientjes
Cc: Marcelo Tosatti
Cc: Rik van Riel
Cc: Yinghai Lu
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
Hugepage pages never get the PG_reserved bit set, so don't clear it.
However, note that if the bit gets mistakenly set, free_pages_check()
will catch it.
Signed-off-by: Luiz Capitulino
Reviewed-by: Davidlohr Bueso
Acked-by: Kirill A. Shutemov
Reviewed-by: Zhang Yanfei
Cc: Andrea Arcangeli
Cc: David Rientjes
Cc: Marcelo Tosatti
Cc: Naoya Horiguchi
Cc: Rik van Riel
Cc: Yasuaki Ishimatsu
Cc: Yinghai Lu
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds