14 Dec, 2014

40 commits

  • Read memory barriers must follow the read operations.
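
    The changelog is terse, so as a generic illustration of the rule (not the
    patched code itself, and with made-up names shared_data/data_ready), the
    reader's smp_rmb() has to sit after the read it orders, pairing with the
    writer's smp_wmb():

        /* writer: publish the data, then the flag */
        shared_data = compute();
        smp_wmb();                      /* order the data store before the flag store */
        ACCESS_ONCE(data_ready) = 1;

        /* reader: the read barrier follows the read of the flag */
        if (ACCESS_ONCE(data_ready)) {
                smp_rmb();              /* pairs with the writer's smp_wmb() */
                consume(shared_data);
        }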

    Signed-off-by: Dmitry Vyukov
    Cc: Eric Dumazet
    Acked-by: Joonsoo Kim
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Dmitry Vyukov
     
  • After the previous patch we can remove the PT_TRACE_EXIT check in
    oom_scan_process_thread(); it was added to handle the case where the
    coredump was "frozen" by ptrace, but it doesn't really work. If
    nothing else, we would need to check all threads which could share the
    same ->mm to make it more or less correct.

    Signed-off-by: Oleg Nesterov
    Cc: Cong Wang
    Cc: David Rientjes
    Acked-by: Michal Hocko
    Cc: "Rafael J. Wysocki"
    Cc: Tejun Heo
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Oleg Nesterov
     
  • oom_kill.c assumes that a PF_EXITING task should exit and free its memory
    soon. This is wrong in many ways, and one important case is the coredump:
    a task can sleep in exit_mm() "forever" while the coredumping sub-thread
    may need more memory.

    Change the PF_EXITING checks to take SIGNAL_GROUP_COREDUMP into account;
    a new trivial helper is added for that (see the sketch below).

    Note: this is only the first step, and this patch doesn't try to solve
    other problems. The SIGNAL_GROUP_COREDUMP check is obviously racy: a task
    can join a coredump after it was already observed in the PF_EXITING state,
    so TIF_MEMDIE (which also blocks the oom-killer) can still be wrongly set.
    fatal_signal_pending() can be true because of SIGNAL_GROUP_COREDUMP, so
    out_of_memory() and mem_cgroup_out_of_memory() shouldn't blindly trust it.
    And even the name/usage of the new helper is confusing: an exiting thread
    can only free its ->mm if it is the only/last task in the thread group.
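
    The helper isn't named above; a minimal sketch of its likely shape (in
    mainline this shape is known as task_will_free_mem()) is:

        /*
         * Sketch: an exiting task can be expected to free its memory soon
         * only if its thread group is not in the middle of a coredump.
         */
        static inline bool task_will_free_mem(struct task_struct *task)
        {
                return (task->flags & PF_EXITING) &&
                       !(task->signal->flags & SIGNAL_GROUP_COREDUMP);
        }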

    [akpm@linux-foundation.org: add comment]
    Signed-off-by: Oleg Nesterov
    Cc: Cong Wang
    Acked-by: David Rientjes
    Acked-by: Michal Hocko
    Cc: "Rafael J. Wysocki"
    Cc: Tejun Heo
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Oleg Nesterov
     
  • Since commit 01cefaef40c4 ("mm: provide more accurate estimation
    of pages occupied by memmap"), the memmap pages for the highmem zones are
    allocated from lowmem, so there is no need to reserve memmap pages in the
    highmem zones themselves.

    On an ARM platform with 2G of DDR3, before the change:
    On node 0 totalpages: 524288
    free_area_init_node: node 0, pgdat 80ccd380, node_mem_map 80d38000
    DMA zone: 3568 pages used for memmap
    DMA zone: 0 pages reserved
    DMA zone: 456704 pages, LIFO batch:31
    HighMem zone: 528 pages used for memmap
    HighMem zone: 67584 pages, LIFO batch:15

    After the change:

    On node 0 totalpages: 524288
    free_area_init_node: node 0, pgdat 80cd6f40, node_mem_map 80d42000
    DMA zone: 3568 pages used for memmap
    DMA zone: 0 pages reserved
    DMA zone: 456704 pages, LIFO batch:31
    HighMem zone: 67584 pages, LIFO batch:15

    Signed-off-by: Hongbo Zhong
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Zhong Hongbo
     
  • Page migration's __unmap_and_move(), and rmap's try_to_unmap(), were
    created for use on pages almost certainly mapped into userspace. But
    nowadays compaction often applies them to unmapped page cache pages: which
    may exacerbate contention on i_mmap_rwsem quite unnecessarily, since
    try_to_unmap_file() makes no preliminary page_mapped() check.

    Now check page_mapped() in __unmap_and_move(); and avoid repeating the
    same overhead in rmap_walk_file() - don't remove_migration_ptes() when we
    never inserted any.
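
    As a hedged sketch (flag names as used by the migration code of this era,
    exact call site simplified), the check amounts to:

        /* skip the rmap walk entirely for pages with no userspace mappings */
        if (page_mapped(page)) {
                page_was_mapped = 1;
                try_to_unmap(page,
                             TTU_MIGRATION | TTU_IGNORE_MLOCK | TTU_IGNORE_ACCESS);
        }
        /* ... and only call remove_migration_ptes() if page_was_mapped */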

    (The PageAnon(page) comment blocks now look even sillier than before, but
    clean that up on some other occasion. And note in passing that
    try_to_unmap_one() does not use a migration entry when PageSwapCache, so
    remove_migration_ptes() will then not update that swap entry to newpage
    pte: not a big deal, but something else to clean up later.)

    Davidlohr remarked in "mm,fs: introduce helpers around the i_mmap_mutex"
    conversion to i_mmap_rwsem, that "The biggest winner of these changes is
    migration": a part of the reason might be all of that unnecessary taking
    of i_mmap_mutex in page migration; and it's rather a shame that I didn't
    get around to sending this patch in before his - this one is much less
    useful after Davidlohr's conversion to rwsem, but still good.

    Signed-off-by: Hugh Dickins
    Cc: Davidlohr Bueso
    Cc: Rik van Riel
    Cc: Mel Gorman
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Hugh Dickins
     
  • Since commit 058504edd026 ("fs/seq_file: fallback to vmalloc allocation"),
    seq_buf_alloc() falls back to vmalloc() when the kmalloc() for contiguous
    memory fails. This was done to address order-4 slab allocations for
    reading /proc/stat on large machines, and it was noticed because
    PAGE_ALLOC_COSTLY_ORDER < 4, so there is no infinite loop in the page
    allocator when allocating a new slab for such high-order allocations.

    Contiguous memory isn't necessary for the caller of seq_buf_alloc(),
    however. Other GFP_KERNEL high-order allocations that are
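
    The changelog is cut off above; the fallback pattern it is describing
    looks roughly like this sketch of seq_buf_alloc() (simplified):

        static void *seq_buf_alloc(unsigned long size)
        {
                void *buf;

                /* try a physically contiguous buffer first ... */
                buf = kmalloc(size, GFP_KERNEL | __GFP_NOWARN);
                /* ... and fall back to vmalloc() for larger requests */
                if (!buf && size > PAGE_SIZE)
                        buf = vmalloc(size);
                return buf;
        }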
    Cc: Heiko Carstens
    Cc: Christoph Hellwig
    Cc: Al Viro
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    David Rientjes
     
  • The slab shrinkers are currently invoked from the zonelist walkers in
    kswapd, direct reclaim, and zone reclaim, all of which roughly gauge the
    eligible LRU pages and assemble a nodemask to pass to NUMA-aware
    shrinkers, which then again have to walk over the nodemask. This is
    redundant code, extra runtime work, and fairly inaccurate when it comes to
    the estimation of actually scannable LRU pages. The code duplication will
    only get worse when making the shrinkers cgroup-aware and requiring them
    to have out-of-band cgroup hierarchy walks as well.

    Instead, invoke the shrinkers from shrink_zone(), which is where all
    reclaimers end up, to avoid this duplication.

    Take the count for eligible LRU pages out of get_scan_count(), which
    considers many more factors than just the availability of swap space, like
    zone_reclaimable_pages() currently does. Accumulate the number over all
    visited lruvecs to get the per-zone value.

    Some nodes have multiple zones due to memory addressing restrictions. To
    avoid putting too much pressure on the shrinkers, only invoke them once
    for each such node, using the class zone of the allocation as the pivot
    zone.

    For now, this integrates the slab shrinking better into the reclaim logic
    and gets rid of duplicative invocations from kswapd, direct reclaim, and
    zone reclaim. It also prepares for cgroup-awareness, allowing
    memcg-capable shrinkers to be added at the lruvec level without much
    duplication of both code and runtime work.

    This changes kswapd behavior, which used to invoke the shrinkers for each
    zone, but with scan ratios gathered from the entire node, resulting in
    meaningless pressure quantities on multi-zone nodes.

    Zone reclaim behavior also changes. It used to shrink slabs until the
    same amount of pages were shrunk as were reclaimed from the LRUs. Now it
    merely invokes the shrinkers once with the zone's scan ratio, which makes
    the shrinkers go easier on caches that implement aging and would prefer
    feeding back pressure from recently used slab objects to unused LRU pages.

    [vdavydov@parallels.com: assure class zone is populated]
    Signed-off-by: Johannes Weiner
    Cc: Dave Chinner
    Signed-off-by: Vladimir Davydov
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Johannes Weiner
     
  • These flushes deal with sequence number overflows, such as for long lived
    threads. These are rare, but interesting from a debugging PoV. As such,
    display the number of flushes when vmacache debugging is enabled.

    Signed-off-by: Davidlohr Bueso
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Davidlohr Bueso
     
  • page owner is for tracking who allocated each page. This document
    explains what the page owner feature is and what its merits are, and also
    gives a simple HOW-TO. See the document for detailed information.

    Signed-off-by: Joonsoo Kim
    Cc: Mel Gorman
    Cc: Johannes Weiner
    Cc: Minchan Kim
    Cc: Dave Hansen
    Cc: Michal Nazarewicz
    Cc: Jungsoo Son
    Cc: Ingo Molnar
    Cc: Joonsoo Kim
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Joonsoo Kim
     
  • The extended memory used to store page owner information is initialized
    some time after the page allocator starts. Until that initialization,
    many pages can be allocated and they have no owner information. This
    makes debugging with page owner harder, so some fixup is helpful.

    This patch fixes up the situation by setting fake owner information
    immediately after page extension is initialized. The information doesn't
    identify the real owner, but at least it tells more accurately whether a
    page is allocated or not.

    In my testing, this patch catches 13343 early-allocated pages, although
    they are mostly allocated by the page extension feature itself. After
    that point, no page is left that is allocated but has no page owner flag.

    Signed-off-by: Joonsoo Kim
    Cc: Mel Gorman
    Cc: Johannes Weiner
    Cc: Minchan Kim
    Cc: Dave Hansen
    Cc: Michal Nazarewicz
    Cc: Jungsoo Son
    Cc: Ingo Molnar
    Cc: Joonsoo Kim
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Joonsoo Kim
     
  • This is the page owner tracking code, which was introduced quite a while
    ago. It has been resident in Andrew's tree, but nobody tried to upstream
    it, so it remained as is. Our company uses this feature actively to debug
    memory leaks and to find memory hoggers, so I decided to upstream it.

    This functionality helps us to know who allocated each page. When
    allocating a page, we store some information about the allocation in
    extra memory. Later, if we need to know the status of all pages, we can
    get and analyze it from this stored information.

    In the previous version of this feature, the extra memory was statically
    defined in struct page, but in this version it is allocated outside of
    struct page. This enables us to turn the feature on/off at boot time
    without considerable memory waste.

    Although we already have tracepoints for page allocation/free, using them
    to analyze page ownership is rather complex. We would need to enlarge the
    trace buffer to prevent it from wrapping before the userspace program is
    launched, and the launched program would have to continually dump out the
    trace buffer for later analysis, which changes system behaviour far more
    than just keeping the information in memory, so it is bad for debugging.

    Moreover, the page_owner feature can be used for various further
    purposes, for example the fragmentation statistics implemented in this
    patch. I also plan to implement a CMA failure debugging feature using
    this interface.

    I'd like to give credit to all the developers who contributed to this
    feature, but that's not easy because I don't know the exact history.
    Sorry about that. Below are the people who have "Signed-off-by" in the
    patches in Andrew's tree.

    Contributor:
    Alexander Nyberg
    Mel Gorman
    Dave Hansen
    Minchan Kim
    Michal Nazarewicz
    Andrew Morton
    Jungsoo Son

    Signed-off-by: Joonsoo Kim
    Cc: Mel Gorman
    Cc: Johannes Weiner
    Cc: Minchan Kim
    Cc: Dave Hansen
    Cc: Michal Nazarewicz
    Cc: Jungsoo Son
    Cc: Ingo Molnar
    Cc: Joonsoo Kim
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Joonsoo Kim
     
  • The current stacktrace code only has a function for console output.
    page_owner, which will be introduced in a following patch, needs to print
    the stacktrace into a buffer in its own output format, so a new function,
    snprint_stack_trace(), is needed.
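
    For reference, the new helper has an snprintf-like shape; a sketch of its
    declaration and of an illustrative call (buffer names are made up):

        /* print a struct stack_trace into buf, writing at most size bytes */
        int snprint_stack_trace(char *buf, size_t size,
                                struct stack_trace *trace, int spaces);

        /* illustrative use from a client such as page_owner */
        ret += snprint_stack_trace(kbuf + ret, count - ret, &trace, 0);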

    Signed-off-by: Joonsoo Kim
    Cc: Mel Gorman
    Cc: Johannes Weiner
    Cc: Minchan Kim
    Cc: Dave Hansen
    Cc: Michal Nazarewicz
    Cc: Jungsoo Son
    Cc: Ingo Molnar
    Cc: Joonsoo Kim
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Joonsoo Kim
     
  • do_mmap_private() in nommu.c tries, in some cases, to allocate physically
    contiguous pages of arbitrary size, and we now have a good abstraction
    that does exactly that, alloc_pages_exact(). So change it to use that.

    There is no functional change. This is a preparation step for supporting
    the page owner feature accurately.
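
    A hedged sketch of the pattern (the helper names here are made up; only
    alloc_pages_exact()/free_pages_exact() are the real API):

        /* request exactly `len` bytes of physically contiguous memory
         * instead of open-coding alloc_pages() plus trimming the tail */
        static void *mmap_private_alloc(unsigned long len)
        {
                return alloc_pages_exact(len, GFP_KERNEL);
        }

        static void mmap_private_free(void *base, unsigned long len)
        {
                free_pages_exact(base, len);
        }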

    Signed-off-by: Joonsoo Kim
    Cc: Mel Gorman
    Cc: Johannes Weiner
    Cc: Minchan Kim
    Cc: Dave Hansen
    Cc: Michal Nazarewicz
    Cc: Jungsoo Son
    Cc: Ingo Molnar
    Cc: Joonsoo Kim
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Joonsoo Kim
     
  • Now we are prepared to avoid using debug-pagealloc at boot time, so
    introduce a new kernel parameter so that debug-pagealloc can be disabled
    at boot time, and make the related functions no-ops in that case.

    The only non-intuitive part is the change to the guard page functions.
    Because guard pages are effective only if debug-pagealloc is enabled,
    turning them off along with debug-pagealloc is the reasonable thing to do.
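
    For reference, the knob is the debug_pagealloc= kernel command line
    option; on a CONFIG_DEBUG_PAGEALLOC kernel the checks are active only
    when it is passed as:

        debug_pagealloc=on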

    Signed-off-by: Joonsoo Kim
    Cc: Mel Gorman
    Cc: Johannes Weiner
    Cc: Minchan Kim
    Cc: Dave Hansen
    Cc: Michal Nazarewicz
    Cc: Jungsoo Son
    Cc: Ingo Molnar
    Cc: Joonsoo Kim
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Joonsoo Kim
     
  • Until now, debug-pagealloc has needed extra flags in struct page, so we
    had to recompile the whole source tree whenever we decided to use it.
    This is really painful, because recompiling takes time and sometimes a
    rebuild is not possible at all due to third-party modules depending on
    struct page. So we couldn't use this good feature in many cases.

    Now we have the page extension feature, which allows us to keep the extra
    flags outside of struct page. This gets rid of the third-party module
    issue mentioned above, and it allows us to determine at boot time whether
    the extra memory for the page extension is needed. With these properties
    we can avoid enabling debug-pagealloc at boot time, with low computational
    overhead, in a kernel built with CONFIG_DEBUG_PAGEALLOC. This will help
    our development process greatly.

    This patch is the preparation step to achieve the above goal.
    debug-pagealloc originally uses an extra field of struct page, but after
    this patch it uses a field of struct page_ext. Because the memory for
    page_ext is allocated later than the page allocator is initialized with
    CONFIG_SPARSEMEM, we must temporarily disable the debug-pagealloc feature
    until page_ext has been initialized. This patch implements that.

    Signed-off-by: Joonsoo Kim
    Cc: Mel Gorman
    Cc: Johannes Weiner
    Cc: Minchan Kim
    Cc: Dave Hansen
    Cc: Michal Nazarewicz
    Cc: Jungsoo Son
    Cc: Ingo Molnar
    Cc: Joonsoo Kim
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Joonsoo Kim
     
  • When we debug something, we'd like to attach some information to every
    page. For this purpose, we sometimes modify struct page itself, but this
    has drawbacks. First, it requires a recompile, which makes us hesitate to
    use a powerful debug feature, so the development process is slowed down.
    Second, it is sometimes impossible to rebuild the kernel due to
    third-party module dependencies. Third, system behaviour can be largely
    different after a recompile, because it greatly changes the size of
    struct page, and this structure is accessed by every part of the kernel;
    keeping it as it is makes erroneous situations easier to reproduce.

    This feature is intended to overcome the problems mentioned above. It
    allocates memory for extended data per page somewhere other than struct
    page itself, and this memory can be accessed through the accessor
    functions provided by this code. During the boot process, it checks
    whether allocating this huge chunk of memory is needed or not; if not, it
    avoids allocating the memory at all. With this advantage, we can include
    this feature in the kernel by default and can avoid rebuilds and the
    related problems.

    Until now, memcg used this technique. But memcg has since decided to
    embed its variable in struct page itself, and its code to extend struct
    page has been removed. I'd like to use this code to develop debug
    features, so this patch resurrects it.

    To make this work well, this patch introduces two callbacks for clients
    (see the sketch below). One is the need callback, which is mandatory if
    the user wants to avoid useless memory allocation at boot time. The
    other, the optional init callback, is used to do proper initialization
    after the memory is allocated. A detailed explanation of the purpose of
    these functions is in the code comments; please refer to them.

    Everything else is the same as the previous extension code in memcg.
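
    A hedged sketch of a client of this interface (the client name and its
    enabling variable are made up; only the two callbacks and the ops
    structure follow the interface described above):

        static bool my_feature_enabled __initdata;

        static bool __init need_my_feature(void)
        {
                /* returning false skips the large boot-time allocation entirely */
                return my_feature_enabled;
        }

        static void __init init_my_feature(void)
        {
                /* one-time setup, run after the extended memory is allocated */
        }

        struct page_ext_operations my_feature_page_ext_ops = {
                .need = need_my_feature,
                .init = init_my_feature,
        };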

    Signed-off-by: Joonsoo Kim
    Cc: Mel Gorman
    Cc: Johannes Weiner
    Cc: Minchan Kim
    Cc: Dave Hansen
    Cc: Michal Nazarewicz
    Cc: Jungsoo Son
    Cc: Ingo Molnar
    Cc: Joonsoo Kim
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Joonsoo Kim
     
  • GFP_USER, GFP_HIGHUSER and GFP_HIGHUSER_MOVABLE are defined in an
    escalating manner, as their names imply:

    GFP_USER
    GFP_USER + __GFP_HIGHMEM = GFP_HIGHUSER
    GFP_USER + __GFP_HIGHMEM + __GFP_MOVABLE = GFP_HIGHUSER_MOVABLE

    So just define GFP_HIGHUSER and GFP_HIGHUSER_MOVABLE in terms of the
    previous level to reflect this fact. It also makes the definitions
    clearer and gives a textual warning against any future break-up of this
    escalating relationship.
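
    The escalating definitions then read as follows (GFP_USER's own expansion
    shown as in kernels of this era):

        #define GFP_USER                (__GFP_WAIT | __GFP_IO | __GFP_FS | __GFP_HARDWALL)
        #define GFP_HIGHUSER            (GFP_USER | __GFP_HIGHMEM)
        #define GFP_HIGHUSER_MOVABLE    (GFP_HIGHUSER | __GFP_MOVABLE)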

    Signed-off-by: Jianyu Zhan
    Acked-by: Johannes Weiner
    Acked-by: Kirill A. Shutemov
    Acked-by: David Rientjes
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Jianyu Zhan
     
  • include/linux/kmemleak.h: In function 'kmemleak_alloc_recursive':
    include/linux/kmemleak.h:43: error: 'SLAB_NOLEAKTRACE' undeclared (first use in this function)

    Cc: Joonsoo Kim
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Andrew Morton
     
  • The gfp was passed in but never used in this function.

    Signed-off-by: Zhang Zhen
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Zhang Zhen
     
  • swp_entry_t being defined in include/linux/swap.h instead of
    include/linux/mm_types.h causes a cyclic include dependency later, when
    include/linux/page_cgroup.h is included from the writeback path. Move the
    definition to include/linux/mm_types.h.

    While at it, reformat the comment above it.
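
    For reference, the definition being moved is tiny (shown here with its
    comment roughly as it appears in the header):

        /*
         * A swap entry has to fit into a "unsigned long", as the entry is
         * hidden in the "index" field of the swapper address space.
         */
        typedef struct {
                unsigned long val;
        } swp_entry_t;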

    Signed-off-by: Tejun Heo
    Cc: Mel Gorman
    Cc: Johannes Weiner
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Tejun Heo
     
  • This is just a small optimization. The start_pfn can be obtained directly
    as phys_index << PFN_SECTION_SHIFT, so the call to page_to_pfn() is
    redundant; remove it.

    Signed-off-by: Zhang Zhen
    Acked-by: Yasuaki Ishimatsu
    Acked-by: David Rientjes
    Cc: Dave Hansen
    Cc: Wang Nan
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Zhang Zhen
     
  • This could be useful for debug in the future if we want to track
    major/minor faults more closely, and also avoids the put_page trick we
    used with gup.

    In order to do this, we also track the task struct in the PASID state
    structure. This lets us update the appropriate task stats after the fault
    has been handled, and may aid with debug in the future as well.

    Signed-off-by: Jesse Barnes
    Tested-by: Oded Gabbay
    Cc: Joerg Roedel
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Jesse Barnes
     
  • This lets drivers like the AMD IOMMUv2 driver handle faults a bit more
    simply, rather than doing tricks with page refs and get_user_pages().
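
    A hedged sketch of the flow this export enables in an IOMMU driver (the
    function name and locals are made up; the handle_mm_fault() signature is
    the 3.18-era one):

        static int sva_handle_fault(struct mm_struct *mm, unsigned long address,
                                    bool write)
        {
                struct vm_area_struct *vma;
                int ret = VM_FAULT_SIGBUS;

                down_read(&mm->mmap_sem);
                vma = find_extend_vma(mm, address);
                if (vma)
                        ret = handle_mm_fault(mm, vma, address,
                                              write ? FAULT_FLAG_WRITE : 0);
                up_read(&mm->mmap_sem);
                return ret;
        }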

    Signed-off-by: Jesse Barnes
    Cc: Oded Gabbay
    Cc: Joerg Roedel
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Jesse Barnes
     
  • This function is only called during initialization.

    Signed-off-by: Luiz Capitulino
    Cc: Andi Kleen
    Acked-by: David Rientjes
    Cc: Rik van Riel
    Cc: Yasuaki Ishimatsu
    Cc: Yinghai Lu
    Cc: Davidlohr Bueso
    Acked-by: Naoya Horiguchi
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Luiz Capitulino
     
  • No reason to duplicate the code of an existing macro.

    Signed-off-by: Luiz Capitulino
    Cc: Andi Kleen
    Acked-by: David Rientjes
    Cc: Rik van Riel
    Cc: Yasuaki Ishimatsu
    Cc: Yinghai Lu
    Cc: Davidlohr Bueso
    Acked-by: Naoya Horiguchi
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Luiz Capitulino
     
  • The hugepages= entry in kernel-parameters.txt states that 1GB pages can
    only be allocated at boot time and not freed afterwards. This is not
    true since commit 944d9fec8d7a ("hugetlb: add support for gigantic page
    allocation at runtime"), at least for x86_64.

    Instead of adding arch-specific observations to the hugepages= entry,
    this commit just drops the out of date information. Further information
    about arch-specific support and available features can be obtained in
    the hugetlb documentation.

    Signed-off-by: Luiz Capitulino
    Cc: Andi Kleen
    Acked-by: David Rientjes
    Cc: Rik van Riel
    Cc: Yasuaki Ishimatsu
    Cc: Yinghai Lu
    Cc: Davidlohr Bueso
    Acked-by: Naoya Horiguchi
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Luiz Capitulino
     
  • It isn't supposed to stack, so turn it into a bit-field to save 4 bytes
    in the task_struct.

    Also, remove the memcg_stop/resume_kmem_account helpers - it is clearer
    to set/clear the flag inline. As for the lengthy comment on the helpers,
    which this patch removes too: we already have a compact yet accurate
    explanation in memcg_schedule_cache_create, so there is no need for yet
    another one.
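
    A sketch of the task_struct change being described (surrounding fields
    elided):

        struct task_struct {
                /* ... */
                unsigned memcg_kmem_skip_account:1;     /* previously a 4-byte field */
                /* ... */
        };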

    Signed-off-by: Vladimir Davydov
    Cc: Johannes Weiner
    Cc: Michal Hocko
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Vladimir Davydov
     
  • __memcg_kmem_get_cache can recurse if it calls kmalloc (which it does if
    the cgroup's kmem cache doesn't exist), because kmalloc may call
    __memcg_kmem_get_cache internally again. To avoid the recursion, we use
    the task_struct->memcg_kmem_skip_account flag.

    However, there's no need to check the flag in memcg_kmem_newpage_charge,
    because there is no way this function could result in recursion when
    called from memcg_kmem_get_cache. So let's remove the redundant code.

    Signed-off-by: Vladimir Davydov
    Cc: Johannes Weiner
    Cc: Michal Hocko
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Vladimir Davydov
     
  • The only such flag is KMEM_ACCOUNTED_ACTIVE, but it's set iff
    mem_cgroup->kmemcg_id is initialized, so we can check kmemcg_id instead of
    having a separate flags field.
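
    Roughly, the check then becomes a sketch like this (the helper name here
    follows the existing memcg_kmem_is_active(); exact mainline code may
    differ):

        static bool memcg_kmem_is_active(struct mem_cgroup *memcg)
        {
                return memcg->kmemcg_id >= 0;   /* initialized iff kmem accounting is active */
        }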

    Signed-off-by: Vladimir Davydov
    Cc: Johannes Weiner
    Cc: Michal Hocko
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Vladimir Davydov
     
  • When the encountered pte is a swap entry, the current code handles two
    cases: migration entries and normal swap entries, but we have a third
    case: hwpoison pages.

    This patch adds handling for hwpoison pages, considering a hwpoison page
    to be in core, the same as a migration entry.
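
    A hedged, simplified sketch of the swap-entry branch in mincore's pte
    walk after this change (with CONFIG_SWAP):

        if (non_swap_entry(entry)) {
                /* migration and hwpoison entries both mean "in core" */
                *vec = 1;
        } else {
                *vec = mincore_page(swap_address_space(entry), entry.val);
        }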

    [akpm@linux-foundation.org: coding-style fixes]
    Signed-off-by: Weijie Yang
    Acked-by: Johannes Weiner
    Cc: Mel Gorman
    Cc: Hugh Dickins
    Cc: Rik van Riel
    Acked-by: Naoya Horiguchi
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Weijie Yang
     
  • Call page_to_pgoff() to get the page offset only once we are sure we
    actually need it, and any very obvious initial function checks have
    passed. A trivial micro-optimization that potentially saves some cycles.

    Signed-off-by: Davidlohr Bueso
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Davidlohr Bueso
     
  • The page guard is used by the debug-pagealloc feature. Currently it is
    open-coded, but I think that abstracting it makes the core page allocator
    code more readable.

    There is no functional difference.

    Signed-off-by: Joonsoo Kim
    Acked-by: Vlastimil Babka
    Cc: Gioh Kim
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Joonsoo Kim
     
  • There is a lot of duplication in the rubric around actually setting or
    clearing a mem region flag. Create a new helper function to do this and
    reduce each of memblock_mark_hotplug() and memblock_clear_hotplug() to a
    single line.

    This will be useful if someone were to add a new mem region flag - which
    I hope to be doing some day soon. But it looks like a plausible cleanup
    even without that - so I'd like to get it out of the way now.
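
    A hedged sketch of the shared helper and the two one-line wrappers it
    enables (simplified from what mainline ended up with):

        static int memblock_setclr_flag(phys_addr_t base, phys_addr_t size,
                                        int set, int flag)
        {
                struct memblock_type *type = &memblock.memory;
                int i, ret, start_rgn, end_rgn;

                ret = memblock_isolate_range(type, base, size,
                                             &start_rgn, &end_rgn);
                if (ret)
                        return ret;

                for (i = start_rgn; i < end_rgn; i++) {
                        if (set)
                                memblock_set_region_flags(&type->regions[i], flag);
                        else
                                memblock_clear_region_flags(&type->regions[i], flag);
                }

                memblock_merge_regions(type);
                return 0;
        }

        int memblock_mark_hotplug(phys_addr_t base, phys_addr_t size)
        {
                return memblock_setclr_flag(base, size, 1, MEMBLOCK_HOTPLUG);
        }

        int memblock_clear_hotplug(phys_addr_t base, phys_addr_t size)
        {
                return memblock_setclr_flag(base, size, 0, MEMBLOCK_HOTPLUG);
        }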

    Signed-off-by: Tony Luck
    Cc: Santosh Shilimkar
    Cc: Tang Chen
    Cc: Grygorii Strashko
    Cc: Zhang Yanfei
    Cc: Philipp Hachtmann
    Cc: Yinghai Lu
    Cc: Emil Medve
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Tony Luck
     
  • task_struct->memcg_kmem_skip_account was initially introduced to avoid
    recursion during kmem cache creation: memcg_kmem_get_cache, which is
    called by kmem_cache_alloc to determine the per-memcg cache to account
    allocation to, may issue lazy cache creation if the needed cache doesn't
    exist, which means issuing yet another kmem_cache_alloc. We can't just
    pass a flag to the nested kmem_cache_alloc disabling kmem accounting,
    because there are hidden allocations, e.g. in INIT_WORK. So we
    introduced a flag on the task_struct, memcg_kmem_skip_account, making
    memcg_kmem_get_cache return immediately.

    By its nature, the flag may also be used to disable accounting for
    allocations shared among different cgroups, and currently it is used this
    way in memcg_activate_kmem. Using it like this looks like an abuse of the
    flag to me. If we want to disable accounting for some allocations (which
    we will definitely want one day), we should add either a GFP_NO_MEMCG or
    a GFP_MEMCG flag in order to blacklist/whitelist such allocations.

    For now, let's simply remove memcg_stop/resume_kmem_account from
    memcg_activate_kmem.

    Signed-off-by: Vladimir Davydov
    Cc: Johannes Weiner
    Cc: Michal Hocko
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Vladimir Davydov
     
  • We have already ensured in memcg_kmem_should_charge that the current task
    has an mm, so there is no need to double check.

    Signed-off-by: Vladimir Davydov
    Cc: Johannes Weiner
    Cc: Michal Hocko
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Vladimir Davydov
     
  • cpuset code stopped using cgroup_lock in favor of cpuset_mutex long ago.

    Signed-off-by: Vladimir Davydov
    Cc: Johannes Weiner
    Cc: Michal Hocko
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Vladimir Davydov
     
  • The alignment in cma_alloc() was done w.r.t. the bitmap. This is a
    problem when, for example:

    - a device requires 16M (order 12) alignment
    - the CMA region is not 16M aligned

    In such a case, the CMA region may start at, say, 0x2f800000, but any
    allocation you make from it will only be aligned relative to that base.
    Requesting an allocation of 32M with 16M alignment will result in an
    allocation from 0x2f800000 to 0x31800000, which doesn't work very well if
    your strange device requires 16M alignment.

    Change to use bitmap_find_next_zero_area_off() to account for the
    difference in alignment at reserve-time and alloc-time.

    Signed-off-by: Gregory Fong
    Acked-by: Michal Nazarewicz
    Cc: Marek Szyprowski
    Cc: Joonsoo Kim
    Cc: Kukjin Kim
    Cc: Laurent Pinchart
    Cc: Laura Abbott
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Gregory Fong
     
  • Add a bitmap_find_next_zero_area_off() function which works like the
    bitmap_find_next_zero_area() function, except that it allows an offset to
    be specified when alignment is checked. This lets the caller request a
    bit such that its number plus the offset is aligned according to the
    mask.
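
    For reference, the new primitive's shape, with the existing helper
    expressed in terms of it (prototypes as described above; treat minor
    details as approximate):

        unsigned long bitmap_find_next_zero_area_off(unsigned long *map,
                                                     unsigned long size,
                                                     unsigned long start,
                                                     unsigned int nr,
                                                     unsigned long align_mask,
                                                     unsigned long align_offset);

        static inline unsigned long
        bitmap_find_next_zero_area(unsigned long *map, unsigned long size,
                                   unsigned long start, unsigned int nr,
                                   unsigned long align_mask)
        {
                return bitmap_find_next_zero_area_off(map, size, start, nr,
                                                      align_mask, 0);
        }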

    [gregory.0xf0@gmail.com: Retrieved from https://patchwork.linuxtv.org/patch/6254/ and updated documentation]
    Signed-off-by: Michal Nazarewicz
    Signed-off-by: Kyungmin Park
    Signed-off-by: Marek Szyprowski
    Signed-off-by: Gregory Fong
    Cc: Joonsoo Kim
    Cc: Kukjin Kim
    Cc: Laurent Pinchart
    Cc: Laura Abbott
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Michal Nazarewicz
     
  • The unmap_mapping_range family of functions do the unmapping of user
    pages (ultimately via zap_page_range_single) without touching the actual
    interval tree, so they can share the lock, i.e. take i_mmap_rwsem for
    reading.
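
    A minimal sketch of the locking change being described (the walk itself
    is elided):

        i_mmap_lock_read(mapping);      /* reader: the interval tree is not modified */
        /* vma_interval_tree_foreach() ... zap_page_range_single() ... */
        i_mmap_unlock_read(mapping);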

    Signed-off-by: Davidlohr Bueso
    Cc: "Kirill A. Shutemov"
    Acked-by: Hugh Dickins
    Cc: Oleg Nesterov
    Cc: Peter Zijlstra (Intel)
    Cc: Rik van Riel
    Cc: Srikar Dronamraju
    Acked-by: Mel Gorman
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Davidlohr Bueso
     
  • The shrinking/truncate logic can call nommu_shrink_inode_mappings() to
    verify that any shared mappings of the inode in question aren't broken
    (dead zone). As far as I can tell, the only user is ramfs, handling the
    size-change attribute.

    Sharing the lock here is pretty much a no-brainer.

    Signed-off-by: Davidlohr Bueso
    Acked-by: "Kirill A. Shutemov"
    Acked-by: Hugh Dickins
    Cc: Oleg Nesterov
    Acked-by: Peter Zijlstra (Intel)
    Cc: Rik van Riel
    Cc: Srikar Dronamraju
    Acked-by: Mel Gorman
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Davidlohr Bueso