27 Nov, 2012

2 commits

  • Commit 5515061d22f0 ("mm: throttle direct reclaimers if PF_MEMALLOC
    reserves are low and swap is backed by network storage") introduced a
    check for fatal signals after a process gets throttled for network
    storage. The intention was that a process which got killed while
    throttled should not trigger the OOM killer. As pointed out by
    Minchan Kim and David Rientjes, this check is in the wrong place and too
    broad. If a system is in an OOM situation and a process is exiting, it
    can loop in __alloc_pages_slowpath(), calling direct reclaim in a
    loop. As the fatal signal is pending, direct reclaim returns 1 as if it
    were making forward progress, and the process can effectively deadlock.

    This patch moves the fatal_signal_pending() check after throttling to
    throttle_direct_reclaim() where it belongs. If the process is killed
    while throttled, it will return immediately without direct reclaim
    except now it will have TIF_MEMDIE set and will use the PFMEMALLOC
    reserves.

    Minchan pointed out that it may be better to direct reclaim before
    returning, to avoid using the reserves: there may be pages that could
    easily be reclaimed, which would avoid dipping into the reserves.
    However, we do no such targeted reclaim and there is no guarantee that
    suitable pages are available. As this throttling is expected to happen
    when swap-over-NFS is used, there is a possibility that the process
    will instead swap, which may allocate network buffers from the
    PFMEMALLOC reserves. Hence, in the swap-over-NFS case where a process
    can be throttled and killed, it can either use the reserves to exit or
    potentially use them to swap a few pages and then exit. This patch
    takes the option of using the reserves if necessary to allow the
    process to exit quickly.
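
    A rough sketch of the shape of the change (simplified; the real
    throttle_direct_reclaim() also takes the allocation context and decides
    whether throttling applies at all):

    /*
     * Sketch: check for a fatal signal only once the process has actually
     * been throttled.  A killed task then skips direct reclaim and exits
     * quickly using the PFMEMALLOC reserves.
     */
    static bool throttle_direct_reclaim(pg_data_t *pgdat)
    {
            wait_event_killable(pgdat->pfmemalloc_wait,
                                pfmemalloc_watermark_ok(pgdat));

            return fatal_signal_pending(current);
    }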

    If this patch passes review it should be considered a -stable candidate
    for 3.6.

    Signed-off-by: Mel Gorman
    Cc: David Rientjes
    Cc: Luigi Semenzato
    Cc: Dan Magenheimer
    Cc: KOSAKI Motohiro
    Cc: Sonny Rao
    Cc: Minchan Kim
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Mel Gorman
     
  • With "mm: vmscan: scale number of pages reclaimed by reclaim/compaction
    based on failures" reverted, Zdenek Kabelac reported the following

    Hmm, so it's just took longer to hit the problem and observe
    kswapd0 spinning on my CPU again - it's not as endless like before -
    but still it easily eats minutes - it helps to turn off Firefox
    or TB (memory hungry apps) so kswapd0 stops soon - and restart
    those apps again. (And I still have like >1GB of cached memory)

    kswapd0 R running task 0 30 2 0x00000000
    Call Trace:
    preempt_schedule+0x42/0x60
    _raw_spin_unlock+0x55/0x60
    put_super+0x31/0x40
    drop_super+0x22/0x30
    prune_super+0x149/0x1b0
    shrink_slab+0xba/0x510

    The sysrq+m indicates the system has no swap so it'll never reclaim
    anonymous pages as part of reclaim/compaction. That is one part of the
    problem but not the root cause as file-backed pages could also be
    reclaimed.

    The likely underlying problem is that kswapd is woken up or kept awake
    for each THP allocation request in the page allocator slow path.

    If compaction fails for the requesting process then compaction will be
    deferred for a time and direct reclaim is avoided. However, if there
    is a storm of THP requests that are simply rejected, it will still be
    the case that kswapd is awake for a prolonged period of time as
    pgdat->kswapd_max_order is updated each time. This is noticed by the
    main kswapd() loop and it will not call kswapd_try_to_sleep(). Instead
    it will loop, shrinking a small number of pages and calling
    shrink_slab() on each iteration.

    The temptation is to supply a patch that checks if kswapd was woken for
    THP and if so ignore pgdat->kswapd_max_order but it'll be a hack and not
    backed up by proper testing. As 3.7 is very close to release and this
    is not a bug we should release with, a safer path is to revert "mm:
    remove __GFP_NO_KSWAPD" for now and revisit it with the view to ironing
    out the balance_pgdat() logic in general.

    Signed-off-by: Mel Gorman
    Cc: Zdenek Kabelac
    Cc: Seth Jennings
    Cc: Valdis Kletnieks
    Cc: Jiri Slaby
    Cc: Rik van Riel
    Cc: Robert Jennings
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Mel Gorman
     

22 Nov, 2012

1 commit

  • There have been some 3.7-rc reports of vm issues, including some kswapd
    bugs and, more importantly, some memory "leaks":

    http://www.spinics.net/lists/linux-mm/msg46187.html
    https://bugzilla.kernel.org/show_bug.cgi?id=50181

    Commit 1fb3f8ca0e92 ("mm: compaction: capture a suitable high-order page
    immediately when it is made available") took split_free_page() and
    reused it for the compaction code. It does something curious with
    capture_free_page() (previously known as split_free_page()):

    int capture_free_page(struct page *page, int alloc_order,
    ...
    __mod_zone_page_state(zone, NR_FREE_PAGES, -(1UL << order));

    - /* Split into individual pages */
    - set_page_refcounted(page);
    - split_page(page, order);
    + if (alloc_order != order)
    + expand(zone, page, alloc_order, order,
    + &zone->free_area[order], migratetype);

    Note that expand() puts the pages _back_ in the allocator, but it does
    not bump NR_FREE_PAGES. We "return" 'alloc_order' worth of pages, but
    we accounted for removing 'order' in the __mod_zone_page_state() call.

    For the old split_page()-style use (order==alloc_order) the bug will not
    trigger. But, when called from the compaction code where we
    occasionally get a larger page out of the buddy allocator than we need,
    we will run into this.

    This patch simply changes the NR_FREE_PAGES manipulation to the correct
    'alloc_order' instead of 'order'.
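
    That is, the accounting line in the diff context above becomes:

    /* account only for the 'alloc_order' pages actually handed out;
     * expand() returns the remainder to the free lists untouched */
    __mod_zone_page_state(zone, NR_FREE_PAGES, -(1UL << alloc_order));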

    I've been able to repeatedly trigger this in my testing environment.
    The amount "leaked" very closely tracks the imbalance I see in buddy
    pages vs. NR_FREE_PAGES. I have confirmed that this patch fixes the
    imbalance.

    Signed-off-by: Dave Hansen
    Acked-by: Mel Gorman
    Signed-off-by: Linus Torvalds

    Dave Hansen
     

17 Nov, 2012

10 commits

  • Revert commit 7f1290f2f2a4 ("mm: fix-up zone present pages")

    That patch tried to fix an issue when calculating zone->present_pages,
    but it caused a regression on 32bit systems with HIGHMEM. With that
    change, reset_zone_present_pages() resets all zone->present_pages to
    zero, and fixup_zone_present_pages() is called to recalculate
    zone->present_pages when the boot allocator frees core memory pages into
    the buddy allocator. Because highmem pages are not freed by the bootmem
    allocator, all highmem zones' present_pages become zero.

    Various options for improving the situation are being discussed but for
    now, let's return to the 3.6 code.

    Cc: Jianguo Wu
    Cc: Jiang Liu
    Cc: Petr Tesarik
    Cc: "Luck, Tony"
    Cc: Mel Gorman
    Cc: Yinghai Lu
    Cc: Minchan Kim
    Cc: Johannes Weiner
    Acked-by: David Rientjes
    Tested-by: Chris Clayton
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Andrew Morton
     
  • Under a particular load on one machine, I have hit shmem_evict_inode()'s
    BUG_ON(inode->i_blocks), enough times to narrow it down to a particular
    race between swapout and eviction.

    It comes from the "if (freed > 0)" asymmetry in shmem_recalc_inode(),
    and the lack of coherent locking between mapping's nrpages and shmem's
    swapped count. There's a window in shmem_writepage(), between lowering
    nrpages in shmem_delete_from_page_cache() and then raising swapped
    count, when the freed count appears to be +1 when it should be 0, and
    then the asymmetry stops it from being corrected with -1 before hitting
    the BUG.

    One answer is coherent locking: using tree_lock throughout, without
    info->lock; reasonable, but the raw_spin_lock in percpu_counter_add() on
    used_blocks makes that messier than expected. Another answer may be a
    further effort to eliminate the weird shmem_recalc_inode() altogether,
    but previous attempts at that failed.

    So far undecided, but for now change the BUG_ON to WARN_ON: in usual
    circumstances it remains a useful consistency check.
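
    In other words, the final assertion in shmem_evict_inode() simply
    becomes non-fatal:

    WARN_ON(inode->i_blocks);       /* was BUG_ON(inode->i_blocks) */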

    Signed-off-by: Hugh Dickins
    Cc:
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Hugh Dickins
     
  • Fuzzing with trinity hit the "impossible" VM_BUG_ON(error) (which Fedora
    has converted to WARNING) in shmem_getpage_gfp():

    WARNING: at mm/shmem.c:1151 shmem_getpage_gfp+0xa5c/0xa70()
    Pid: 29795, comm: trinity-child4 Not tainted 3.7.0-rc2+ #49
    Call Trace:
    warn_slowpath_common+0x7f/0xc0
    warn_slowpath_null+0x1a/0x20
    shmem_getpage_gfp+0xa5c/0xa70
    shmem_fault+0x4f/0xa0
    __do_fault+0x71/0x5c0
    handle_pte_fault+0x97/0xae0
    handle_mm_fault+0x289/0x350
    __do_page_fault+0x18e/0x530
    do_page_fault+0x2b/0x50
    page_fault+0x28/0x30
    tracesys+0xe1/0xe6

    Thanks to Johannes for pointing to truncation: free_swap_and_cache()
    only does a trylock on the page, so the page lock we've held since
    before confirming swap is not enough to protect against truncation.

    What cleanup is needed in this case? Just delete_from_swap_cache(),
    which takes care of the memcg uncharge.
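
    A sketch of the resulting error path (simplified from the description
    above; the surrounding retry logic is elided):

    error = shmem_add_to_page_cache(page, mapping, index,
                                    gfp, swp_to_radix_entry(swap));
    if (error)
            /* truncation raced in despite our page lock: undo the
             * swapcache state, including the memcg uncharge */
            delete_from_swap_cache(page);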

    Signed-off-by: Hugh Dickins
    Reported-by: Dave Jones
    Cc: Johannes Weiner
    Cc:
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Hugh Dickins
     
  • kmap_to_page returns the corresponding struct page for a virtual address
    of an arbitrary mapping. This works by checking whether the address
    falls in the pkmap region and using the pkmap page tables instead of the
    linear mapping if appropriate.

    Unfortunately, the bounds checking means that PKMAP_ADDR(LAST_PKMAP) is
    incorrectly treated as a highmem address and we can end up walking off
    the end of pkmap_page_table and subsequently passing junk to pte_page.

    This patch fixes the bounds check to stay within the pkmap tables.
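
    A sketch of the corrected check: the upper bound must be exclusive,
    because PKMAP_ADDR(LAST_PKMAP) is one past the last pkmap slot.

    struct page *kmap_to_page(void *vaddr)
    {
            unsigned long addr = (unsigned long)vaddr;

            if (addr >= PKMAP_ADDR(0) && addr < PKMAP_ADDR(LAST_PKMAP)) {
                    int i = PKMAP_NR(addr);
                    return pte_page(pkmap_page_table[i]);
            }

            return virt_to_page(addr);
    }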

    Signed-off-by: Will Deacon
    Cc: Mel Gorman
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Will Deacon
     
  • Jiri Slaby reported the following:

    (It's an effective revert of "mm: vmscan: scale number of pages
    reclaimed by reclaim/compaction based on failures".) Given kswapd
    had hours of runtime in ps/top output yesterday in the morning
    and after the revert it's now 2 minutes in sum for the last 24h,
    I would say, it's gone.

    The intention of the patch in question was to compensate for the loss of
    lumpy reclaim. Part of the reason lumpy reclaim worked is because it
    aggressively reclaimed pages and this patch was meant to be a sane
    compromise.

    When compaction fails, it gets deferred and both compaction and
    reclaim/compaction are deferred to avoid excessive reclaim. However, since
    commit c654345924f7 ("mm: remove __GFP_NO_KSWAPD"), kswapd is woken up
    each time and continues reclaiming which was not taken into account when
    the patch was developed.

    Attempts to address the problem ended up just changing the shape of the
    problem instead of fixing it. The release window gets closer and while
    a THP allocation failing is not a major problem, kswapd chewing up a lot
    of CPU is.

    This patch reverts commit 83fde0f22872 ("mm: vmscan: scale number of
    pages reclaimed by reclaim/compaction based on failures") and will be
    revisited in the future.

    Signed-off-by: Mel Gorman
    Cc: Zdenek Kabelac
    Tested-by: Valdis Kletnieks
    Cc: Jiri Slaby
    Cc: Rik van Riel
    Cc: Johannes Hirte
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Mel Gorman
     
  • There's a name leak introduced by commit 91a27b2a7567 ("vfs: define
    struct filename and have getname() return it"). Add the missing
    putname.

    [akpm@linux-foundation.org: cleanup]
    Signed-off-by: Xiaotian Feng
    Reviewed-by: Jeff Layton
    Cc: Al Viro
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Xiaotian Feng
     
  • When MEMCG is configured on (even when it's disabled by boot option),
    when adding or removing a page to/from its lru list, the zone pointer
    used for stats updates is nowadays taken from the struct lruvec. (On
    many configurations, calculating zone from page is slower.)

    But we have no code to update all the lruvecs (per zone, per memcg) when
    a memory node is hotadded. Here's an extract from the oops which
    results when running numactl to bind a program to a newly onlined node:

    BUG: unable to handle kernel NULL pointer dereference at 0000000000000f60
    IP: __mod_zone_page_state+0x9/0x60
    Pid: 1219, comm: numactl Not tainted 3.6.0-rc5+ #180 Bochs Bochs
    Process numactl (pid: 1219, threadinfo ffff880039abc000, task ffff8800383c4ce0)
    Call Trace:
    __pagevec_lru_add_fn+0xdf/0x140
    pagevec_lru_move_fn+0xb1/0x100
    __pagevec_lru_add+0x1c/0x30
    lru_add_drain_cpu+0xa3/0x130
    lru_add_drain+0x2f/0x40
    ...

    The natural solution might be to use a memcg callback whenever memory is
    hotadded; but that solution has not been scoped out, and it happens that
    we do have an easy location at which to update lruvec->zone. The lruvec
    pointer is discovered either by mem_cgroup_zone_lruvec() or by
    mem_cgroup_page_lruvec(), and both of those do know the right zone.

    So check and set lruvec->zone in those; and remove the inadequate
    attempt to set lruvec->zone from lruvec_init(), which is called before
    NODE_DATA(node) has been allocated in such cases.

    Ah, there was one exception. For no particularly good reason,
    mem_cgroup_force_empty_list() has its own code for deciding lruvec.
    Change it to use the standard mem_cgroup_zone_lruvec() and
    mem_cgroup_get_lru_size() too. In fact it was already safe against such
    an oops (the lru lists in danger could only be empty), but we're better
    proofed against future changes this way.

    I've marked this for stable (3.6) since we introduced the problem in 3.5
    (now closed to stable); but I have no idea if this is the only fix
    needed to get memory hotadd working with memcg in 3.6, and received no
    answer when I enquired twice before.

    Reported-by: Tang Chen
    Signed-off-by: Hugh Dickins
    Acked-by: Johannes Weiner
    Acked-by: KAMEZAWA Hiroyuki
    Cc: Konstantin Khlebnikov
    Cc: Wen Congyang
    Cc:
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Hugh Dickins
     
  • oom_badness() takes a totalpages argument which says how many pages are
    available and it uses it as a base for the score calculation. The value
    is calculated by mem_cgroup_get_limit which considers both limit and
    total_swap_pages (resp. memsw portion of it).

    This is usually correct but since fe35004fbf9e ("mm: avoid swapping out
    with swappiness==0") we do not swap when swappiness is 0 which means
    that we cannot really use up all the totalpages pages. This in turn
    confuses the oom score calculation if the memcg limit is much smaller
    than the available swap, because the used memory (capped by the limit)
    is negligible compared to totalpages, so the resulting score is too
    small if adj!=0 (typically a task with CAP_SYS_ADMIN or a non-zero
    oom_score_adj). The wrong process might be selected as a result.

    The problem can be worked around by checking mem_cgroup_swappiness==0
    and not considering swap at all in such a case.
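
    A sketch of the workaround in mem_cgroup_get_limit() (simplified; the
    capping against the memsw limit is elided):

    u64 limit = res_counter_read_u64(&memcg->res, RES_LIMIT);

    /* do not consider swap space if we cannot swap due to swappiness */
    if (mem_cgroup_swappiness(memcg))
            limit += total_swap_pages << PAGE_SHIFT;

    return limit;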

    Signed-off-by: Michal Hocko
    Acked-by: David Rientjes
    Acked-by: Johannes Weiner
    Acked-by: KOSAKI Motohiro
    Acked-by: KAMEZAWA Hiroyuki
    Cc:
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Michal Hocko
     
  • do_wp_page() sets mmun_called if mmun_start and mmun_end were
    initialized and, if so, may call mmu_notifier_invalidate_range_end()
    with these values. This doesn't prevent gcc from emitting a build
    warning though:

    mm/memory.c: In function `do_wp_page':
    mm/memory.c:2530: warning: `mmun_start' may be used uninitialized in this function
    mm/memory.c:2531: warning: `mmun_end' may be used uninitialized in this function

    It's much easier to initialize the variables to impossible values and
    do a simple comparison to determine whether they were initialized,
    which removes the bool entirely.
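
    A sketch of the idea: an empty range can never be a valid invalidation
    range, so it doubles as the "not initialized" marker.

    unsigned long mmun_start = 0;   /* for mmu_notifiers */
    unsigned long mmun_end = 0;     /* for mmu_notifiers */
    ...
    if (mmun_end > mmun_start)
            mmu_notifier_invalidate_range_end(mm, mmun_start, mmun_end);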

    Signed-off-by: David Rientjes
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    David Rientjes
     
    Iterating over the vma->anon_vma_chain without anon_vma_lock may cause
    a NULL pointer dereference in anon_vma_interval_tree_verify(), because
    a node in the chain might have been removed.

    BUG: unable to handle kernel paging request at fffffffffffffff0
    IP: [] anon_vma_interval_tree_verify+0xc/0xa0
    PGD 4e28067 PUD 4e29067 PMD 0
    Oops: 0000 [#1] PREEMPT SMP DEBUG_PAGEALLOC
    CPU 0
    Pid: 9050, comm: trinity-child64 Tainted: G W 3.7.0-rc2-next-20121025-sasha-00001-g673f98e-dirty #77
    RIP: 0010: anon_vma_interval_tree_verify+0xc/0xa0
    Process trinity-child64 (pid: 9050, threadinfo ffff880045f80000, task ffff880048eb0000)
    Call Trace:
    validate_mm+0x58/0x1e0
    vma_adjust+0x635/0x6b0
    __split_vma.isra.22+0x161/0x220
    split_vma+0x24/0x30
    sys_madvise+0x5da/0x7b0
    tracesys+0xe1/0xe6
    RIP anon_vma_interval_tree_verify+0xc/0xa0
    CR2: fffffffffffffff0

    Figured out by Bob Liu.
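
    A sketch of the fix in validate_mm(), assuming the 3.7-era
    vma_lock_anon_vma() helpers: hold the anon_vma lock across the walk so
    nodes cannot be removed underneath it.

    vma_lock_anon_vma(vma);
    list_for_each_entry(avc, &vma->anon_vma_chain, same_vma)
            anon_vma_interval_tree_verify(avc);
    vma_unlock_anon_vma(vma);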

    Reported-by: Sasha Levin
    Cc: Bob Liu
    Signed-off-by: Michel Lespinasse
    Reviewed-by: Rik van Riel
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Michel Lespinasse
     

09 Nov, 2012

1 commit

    In kswapd(), set current->reclaim_state to NULL before returning, as
    current->reclaim_state holds a reference to a variable on kswapd()'s
    stack.

    In rare cases, while returning from kswapd() during memory offlining,
    __free_slab() and freepages() can access the dangling pointer of
    current->reclaim_state.
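
    That is, on the exit path of kswapd():

    current->reclaim_state = NULL;  /* stop pointing into our stack */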

    Signed-off-by: Takamori Yamaguchi
    Signed-off-by: Aaditya Kumar
    Acked-by: David Rientjes
    Cc:
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Takamori Yamaguchi
     

27 Oct, 2012

1 commit

  • Pull x86 fixes from Ingo Molnar:
    "This fixes a couple of nasty page table initialization bugs which were
    causing kdump regressions. A clean rearchitecturing of the code is in
    the works - meanwhile these are reverts that restore the
    best-known-working state of the kernel.

    There's also EFI fixes and other small fixes."

    * 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
    x86, mm: Undo incorrect revert in arch/x86/mm/init.c
    x86: efi: Turn off efi_enabled after setup on mixed fw/kernel
    x86, mm: Find_early_table_space based on ranges that are actually being mapped
    x86, mm: Use memblock memory loop instead of e820_RAM
    x86, mm: Trim memory in memblock to be page aligned
    x86/irq/ioapic: Check for valid irq_cfg pointer in smp_irq_move_cleanup_interrupt
    x86/efi: Fix oops caused by incorrect set_memory_uc() usage
    x86-64: Fix page table accounting
    Revert "x86/mm: Fix the size calculation of mapping tables"
    MAINTAINERS: Add EFI git repository location

    Linus Torvalds
     

26 Oct, 2012

4 commits

  • Commit 957f822a0ab9 ("mm, numa: reclaim from all nodes within reclaim
    distance") caused zone_reclaim_mode to be set for all systems where two
    nodes are within RECLAIM_DISTANCE of each other. This is the opposite
    of what we actually want: zone_reclaim_mode should be set if two nodes
    are sufficiently distant.
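
    A sketch of the corrected logic in init_zone_allows_reclaim() (a hedged
    simplification): nearby nodes go into the reclaim_nodes mask, and only
    a sufficiently distant node turns zone_reclaim_mode on.

    for_each_online_node(i)
            if (node_distance(nid, i) <= RECLAIM_DISTANCE)
                    node_set(i, NODE_DATA(nid)->reclaim_nodes);
            else
                    zone_reclaim_mode = 1;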

    Signed-off-by: David Rientjes
    Reported-by: Julian Wollrath
    Tested-by: Julian Wollrath
    Cc: Hugh Dickins
    Cc: Patrik Kullman
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    David Rientjes
     
    Allocating the mmu_notifier with GFP_KERNEL can start swapping when
    available memory is tight, which eventually leads to a deadlock while
    the swap daemon swaps out anonymous pages. It was caused by commit
    e0f3c3f78da29b ("mm/mmu_notifier: init notifier if necessary").

    =================================
    [ INFO: inconsistent lock state ]
    3.7.0-rc1+ #518 Not tainted
    ---------------------------------
    inconsistent {RECLAIM_FS-ON-W} -> {IN-RECLAIM_FS-W} usage.
    kswapd0/35 [HC0[0]:SC0[0]:HE1:SE1] takes:
    (&mapping->i_mmap_mutex){+.+.?.}, at: page_referenced+0x9c/0x2e0
    {RECLAIM_FS-ON-W} state was registered at:
    mark_held_locks+0x86/0x150
    lockdep_trace_alloc+0x67/0xc0
    kmem_cache_alloc_trace+0x33/0x230
    do_mmu_notifier_register+0x87/0x180
    mmu_notifier_register+0x13/0x20
    kvm_dev_ioctl+0x428/0x510
    do_vfs_ioctl+0x98/0x570
    sys_ioctl+0x91/0xb0
    system_call_fastpath+0x16/0x1b
    irq event stamp: 825
    hardirqs last enabled at (825): _raw_spin_unlock_irq+0x30/0x60
    hardirqs last disabled at (824): _raw_spin_lock_irq+0x19/0x80
    softirqs last enabled at (0): copy_process+0x630/0x17c0
    softirqs last disabled at (0): (null)
    ...

    Simply back out the above commit, which was a small performance
    optimization.

    Signed-off-by: Gavin Shan
    Reported-by: Andrea Righi
    Tested-by: Andrea Righi
    Cc: Wanpeng Li
    Cc: Andrea Arcangeli
    Cc: Avi Kivity
    Cc: Hugh Dickins
    Cc: Marcelo Tosatti
    Cc: Xiao Guangrong
    Cc: Sagi Grimberg
    Cc: Haggai Eran
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Gavin Shan
     
    If start_isolate_page_range() fails, unset_migratetype_isolate() has
    already been done inside it, so the caller must not undo the isolation
    a second time.
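
    A sketch of the corresponding caller change (hedged; names as in the
    3.7-era alloc_contig_range() in mm/page_alloc.c):

    ret = start_isolate_page_range(pfn_max_align_down(start),
                                   pfn_max_align_up(end), migratetype);
    if (ret)
            return ret;     /* isolation already undone internally */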

    Signed-off-by: Bob Liu
    Cc: Ni zhan Chen
    Cc: Marek Szyprowski
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Bob Liu
     
    On s390 any write to a page (even from the kernel itself) sets the
    architecture-specific page dirty bit. Thus when a page is written to
    via a buffered write, the HW dirty bit gets set, and when we later map
    and unmap the page, page_remove_rmap() finds the dirty bit and calls
    set_page_dirty().

    Dirtying of a page which shouldn't be dirty can cause all sorts of
    problems to filesystems. The bug we observed in practice is that
    buffers from the page get freed, so when the page gets later marked as
    dirty and writeback writes it, XFS crashes due to an assertion
    BUG_ON(!PagePrivate(page)) in page_buffers() called from
    xfs_count_page_state().

    A similar problem can also happen when a zero_user_segment() call from
    xfs_vm_writepage() (or block_write_full_page() for that matter) sets
    the hardware dirty bit during writeback; later the buffers get freed,
    and then the page is unmapped.

    Fix the issue by ignoring s390 HW dirty bit for page cache pages of
    mappings with mapping_cap_account_dirty(). This is safe because for
    such mappings when a page gets marked as writeable in PTE it is also
    marked dirty in do_wp_page() or do_page_fault(). When the dirty bit is
    cleared by clear_page_dirty_for_io(), the page gets writeprotected in
    page_mkclean(). So pagecache page is writeable if and only if it is
    dirty.
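
    A sketch of the idea in page_remove_rmap() (a hedged simplification of
    the actual patch):

    struct address_space *mapping = page_mapping(page);

    if (mapping && mapping_cap_account_dirty(mapping)) {
            /* PTE tracking keeps these pages dirty iff writeable: clear
             * the stray s390 HW dirty bit, do not dirty the page */
            page_test_and_clear_dirty(page_to_pfn(page), 1);
    } else if ((!PageAnon(page) || PageSwapCache(page)) &&
               page_test_and_clear_dirty(page_to_pfn(page), 1)) {
            set_page_dirty(page);
    }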

    Thanks to Hugh Dickins for pointing out mapping has to have
    mapping_cap_account_dirty() for things to work and proposing a cleaned
    up variant of the patch.

    The patch has survived about two hours of running fsx-linux on tmpfs
    while heavily swapping, and several days of running on our build
    machines where the original problem was triggered.

    Signed-off-by: Jan Kara
    Cc: Martin Schwidefsky
    Cc: Mel Gorman
    Cc: Hugh Dickins
    Cc: Heiko Carstens
    Cc: [3.0+]
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Jan Kara
     

25 Oct, 2012

1 commit

    We will not map partial pages, so we need to make sure the memblock
    allocator does not hand out those bytes.

    Also, use for_each_mem_pfn_range() to loop over the memory ranges being
    mapped, to keep them consistent.

    Signed-off-by: Yinghai Lu
    Link: http://lkml.kernel.org/r/CAE9FiQVZirvaBMFYRfXMmWEcHbKSicQEHz4VAwUv0xFCk51ZNw@mail.gmail.com
    Acked-by: Jacob Shin
    Signed-off-by: H. Peter Anvin
    Cc:

    Yinghai Lu
     

20 Oct, 2012

5 commits

  • Pull ARM soc fixes from Olof Johansson:
    "A set of fixes and some minor cleanups for -rc2:

    - A series from Arnd that fixes warnings in drivers and other code
      included by ARM defconfigs. Most have been acked by corresponding
      maintainers (and seem quite hard to argue not picking up anyway in
      the few exception cases).
    - A few misc patches from the list for integrator/vt8500/i.MX
    - A batch of fixes to OMAP platforms, fixing:
      - boot problems on beaglebone,
      - regression fixes for local timers
      - clockdomain locking fixes
      - a few boot/sparse warnings
    - For Tegra:
      - Clock rate calculation overflow fix
      - Revert a change that removed timer clocks and a fix for symbol
        name clashes
    - For Renesas:
      - IO accessor / annotation cleanups to remove warnings
    - For Kirkwood/Dove/mvebu:
      - Fixes for device trees for Dove (some minor cleanups, some fixes)
      - Fixes for the mvebu gpio driver
      - Fix build problem for Feroceon due to missing ifdefs
      - Fix lsxl DTS files"

    * tag 'fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc: (31 commits)
    ARM: kirkwood: fix buttons on lsxl boards
    ARM: kirkwood: fix LEDs names for lsxl boards
    ARM: Kirkwood: fix disabling CACHE_FEROCEON_L2
    gpio: mvebu: Add missing breaks in mvebu_gpio_irq_set_type
    ARM: dove: Add crypto engine to DT
    ARM: dove: Remove watchdog from DT
    ARM: dove: Restructure SoC device tree descriptor
    ARM: dove: Fix clock names of sata and gbe
    ARM: dove: Fix tauros2 device tree init
    ARM: dove: Add pcie clock support
    ARM: OMAP2+: Allow kernel to boot even if GPMC fails to reserve memory
    ARM: OMAP: clockdomain: Fix locking on _clkdm_clk_hwmod_enable / disable
    ARM: s3c: mark s3c2440_clk_add as __init_refok
    spi/s3c64xx: use correct dma_transfer_direction type
    ARM: OMAP4: devices: fixup OMAP4 DMIC platform device error message
    ARM: OMAP2+: clock data: Add dev-id for the omap-gpmc dummy fck
    ARM: OMAP: resolve sparse warning concerning debug_card_init()
    ARM: OMAP4: Fix twd_local_timer_register regression
    ARM: tegra: add tegra_timer clock
    ARM: tegra: rename tegra system timer
    ...

    Linus Torvalds
     
  • …nel/git/arm/arm-soc into fixes

    A collection of warning fixes on non-ARM code from Arnd Bergmann:

    * 'testing/driver-warnings' of git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc:
    ARM: s3c: mark s3c2440_clk_add as __init_refok
    spi/s3c64xx: use correct dma_transfer_direction type
    pcmcia: sharpsl: don't discard sharpsl_pcmcia_ops
    USB: EHCI: mark ehci_orion_conf_mbus_windows __devinit
    mm/slob: use min_t() to compare ARCH_SLAB_MINALIGN
    SCSI: ARM: make fas216_dumpinfo function conditional
    SCSI: ARM: ncr5380/oak uses no interrupts

    Olof Johansson
     
  • Merge misc fixes from Andrew Morton:
    "Seven fixes"

    * emailed patches from Andrew Morton : (7 patches)
    lib/dma-debug.c: fix __hash_bucket_find()
    mm: compaction: correct the nr_strict va isolated check for CMA
    firmware/memmap: avoid type conflicts with the generic memmap_init()
    pidns: remove recursion from free_pid_ns()
    drivers/video/backlight/lm3639_bl.c: return proper error in lm3639_bled_mode_store() error paths
    kernel/sys.c: fix stack memory content leak via UNAME26
    linux/coredump.h needs asm/siginfo.h

    Linus Torvalds
     
    Thierry reported that the "iron out" patch for isolate_freepages_block()
    had problems: without "mm: compaction: Iron out
    isolate_freepages_block() and isolate_freepages_range() -fix1", the
    strict check is too strict. It's possible that more pages than
    necessary are isolated but the check still fails, and I missed that
    this fix was not picked up before -rc1. This same problem has been
    identified in 3.7-rc1 by Tony Prisk and should be addressed by the
    following patch.

    Signed-off-by: Mel Gorman
    Tested-by: Tony Prisk
    Reported-by: Thierry Reding
    Acked-by: Rik van Riel
    Acked-by: Minchan Kim
    Cc: Richard Davies
    Cc: Shaohua Li
    Cc: Avi Kivity
    Cc: Arnd Bergmann
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Mel Gorman
     
    In commit 0b173bc4daa8 ("mm: kill vma flag VM_CAN_NONLINEAR") we
    replaced the VM_CAN_NONLINEAR test with checking whether the mapping
    has a '->remap_pages()' vm operation, but there is no guarantee that
    the vma even has a vm_ops pointer at all.

    Add the appropriate test for NULL vm_ops.
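
    A sketch of the added guard in sys_remap_file_pages():

    /* check vm_ops itself before chasing ->remap_pages */
    if (!vma->vm_ops || !vma->vm_ops->remap_pages)
            goto out;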

    Reported-by: Sasha Levin
    Cc: Konstantin Khlebnikov
    Cc: Andrew Morton
    Signed-off-by: Linus Torvalds

    Linus Torvalds
     

17 Oct, 2012

1 commit

  • When reading /proc/pid/numa_maps, it's possible to return the contents of
    the stack where the mempolicy string should be printed if the policy gets
    freed from beneath us.

    This happens because mpol_to_str() may return an error, in which case
    the stack-allocated buffer is printed without ever having been written
    to.

    There are two possible error conditions in mpol_to_str():

    - if the buffer allocated is insufficient for the string to be stored,
    and

    - if the mempolicy has an invalid mode.

    The first error condition is not triggered in any of the callers to
    mpol_to_str(): at least 50 bytes is always allocated on the stack and this
    is sufficient for the string to be written. A future patch should convert
    this into BUILD_BUG_ON() since we know the maximum strlen possible, but
    that's not -rc material.

    The second error condition is possible if a race occurs in dropping a
    reference to a task's mempolicy causing it to be freed during the read().
    The slab poison value is then used for the mode and mpol_to_str() returns
    -EINVAL.

    This race is only possible because get_vma_policy() believes that
    mm->mmap_sem protects task->mempolicy, which isn't true. The exit path
    does not hold mm->mmap_sem when dropping the reference or setting
    task->mempolicy to NULL: it uses task_lock(task) instead.

    Thus, it's required for the caller of a task mempolicy to hold
    task_lock(task) while grabbing the mempolicy and reading it. Callers with
    a vma policy store their mempolicy earlier and can simply increment the
    reference count so it's guaranteed not to be freed.
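
    A sketch of the locking rule this imposes on readers of a task
    mempolicy (hypothetical caller):

    task_lock(task);
    pol = task->mempolicy; /* cannot be freed while task_lock is held */
    mpol_get(pol);         /* take our own reference under the lock */
    task_unlock(task);
    /* ... use pol ... */
    mpol_put(pol);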

    Reported-by: Dave Jones
    Signed-off-by: David Rientjes
    Signed-off-by: Linus Torvalds

    David Rientjes
     

15 Oct, 2012

1 commit

    Certain configurations won't implicitly pull in <linux/pagemap.h>,
    resulting in the following build error:

    mm/huge_memory.c: In function 'release_pte_page':
    mm/huge_memory.c:1697:2: error: implicit declaration of function 'unlock_page' [-Werror=implicit-function-declaration]
    mm/huge_memory.c: In function '__collapse_huge_page_isolate':
    mm/huge_memory.c:1757:3: error: implicit declaration of function 'trylock_page' [-Werror=implicit-function-declaration]
    cc1: some warnings being treated as errors
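
    The fix is to add the explicit include to mm/huge_memory.c:

    #include <linux/pagemap.h>      /* trylock_page(), unlock_page() */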

    Reported-by: David Daney
    Signed-off-by: Ralf Baechle
    Signed-off-by: Linus Torvalds

    Ralf Baechle
     

13 Oct, 2012

3 commits

  • Pull third pile of VFS updates from Al Viro:
    "Stuff from Jeff Layton, mostly. Sanitizing interplay between audit
    and namei, removing a lot of insanity from audit_inode() mess and
    getting things ready for his ESTALE patchset."

    * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
    procfs: don't need a PATH_MAX allocation to hold a string representation of an int
    vfs: embed struct filename inside of names_cache allocation if possible
    audit: make audit_inode take struct filename
    vfs: make path_openat take a struct filename pointer
    vfs: turn do_path_lookup into wrapper around struct filename variant
    audit: allow audit code to satisfy getname requests from its names_list
    vfs: define struct filename and have getname() return it
    vfs: unexport getname and putname symbols
    acct: constify the name arg to acct_on
    vfs: allocate page instead of names_cache buffer in mount_block_root
    audit: overhaul __audit_inode_child to accomodate retrying
    audit: optimize audit_compare_dname_path
    audit: make audit_compare_dname_path use parent_len helper
    audit: remove dirlen argument to audit_compare_dname_path
    audit: set the name_len in audit_inode for parent lookups
    audit: add a new "type" field to audit_names struct
    audit: reverse arguments to audit_inode_child
    audit: no need to walk list in audit_inode if name is NULL
    audit: pass in dentry to audit_copy_inode wherever possible
    audit: remove unnecessary NULL ptr checks from do_path_lookup

    Linus Torvalds
     
  • ...and fix up the callers. For do_file_open_root, just declare a
    struct filename on the stack and fill out the .name field. For
    do_filp_open, make it also take a struct filename pointer, and fix up its
    callers to call it appropriately.

    For filp_open, add a variant that takes a struct filename pointer and turn
    filp_open into a wrapper around it.

    Signed-off-by: Jeff Layton
    Signed-off-by: Al Viro

    Jeff Layton
     
  • getname() is intended to copy pathname strings from userspace into a
    kernel buffer. The result is just a string in kernel space. It would
    however be quite helpful to be able to attach some ancillary info to
    the string.

    For instance, we could attach some audit-related info to reduce the
    amount of audit-related processing needed. When auditing is enabled,
    we could also call getname() on the string more than once and not
    need to recopy it from userspace.

    This patchset converts the getname()/putname() interfaces to return
    a struct instead of a string. For now, the struct just tracks the
    string in kernel space and the original userland pointer for it.

    Later, we'll add other information to the struct as it becomes
    convenient.
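
    The struct as introduced by this patch:

    struct filename {
            const char              *name;  /* pointer to actual string */
            const __user char       *uptr;  /* original userland pointer */
    };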

    Signed-off-by: Jeff Layton
    Signed-off-by: Al Viro

    Jeff Layton
     

12 Oct, 2012

3 commits

  • Pull SLAB fix from Pekka Enberg:
    "This contains a lockdep false positive fix from Jiri Kosina I missed
    from the previous pull request."

    * 'slab/urgent' of git://git.kernel.org/pub/scm/linux/kernel/git/penberg/linux:
    mm, slab: release slab_mutex earlier in kmem_cache_destroy()

    Linus Torvalds
     
  • Pull pile 2 of vfs updates from Al Viro:
    "Stuff in this one - assorted fixes, lglock tidy-up, death to
    lock_super().

    There'll be a VFS pile tomorrow (with patches from Jeff Layton,
    sanitizing getname() and related parts of audit and preparing for
    ESTALE fixes), but I'd rather push the stuff in this one ASAP - some
    of the bugs closed here are quite unpleasant."

    * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
    vfs: bogus warnings in fs/namei.c
    consitify do_mount() arguments
    lglock: add DEFINE_STATIC_LGLOCK()
    lglock: make the per_cpu locks static
    lglock: remove unused DEFINE_LGLOCK_LOCKDEP()
    MAX_LFS_FILESIZE definition for 64bit needs LL...
    tmpfs,ceph,gfs2,isofs,reiserfs,xfs: fix fh_len checking
    vfs: drop lock/unlock super
    ufs: drop lock/unlock super
    sysv: drop lock/unlock super
    hpfs: drop lock/unlock super
    fat: drop lock/unlock super
    ext3: drop lock/unlock super
    exofs: drop lock/unlock super
    dup3: Return an error when oldfd == newfd.
    fs: handle failed audit_log_start properly
    fs: prevent use after free in auditing when symlink following was denied

    Linus Torvalds
     
  • Pull writeback fixes from Fengguang Wu:
    "Three trivial writeback fixes"

    * 'writeback-for-next' of git://git.kernel.org/pub/scm/linux/kernel/git/wfg/linux:
    CPU hotplug, writeback: Don't call writeback_set_ratelimit() too often during hotplug
    writeback: correct comment for move_expired_inodes()
    backing-dev: use kstrto* in preference to simple_strtoul

    Linus Torvalds
     

10 Oct, 2012

3 commits

  • Commit 1331e7a1bbe1 ("rcu: Remove _rcu_barrier() dependency on
    __stop_machine()") introduced slab_mutex -> cpu_hotplug.lock dependency
    through kmem_cache_destroy() -> rcu_barrier() -> _rcu_barrier() ->
    get_online_cpus().

    Lockdep thinks that this might actually result in ABBA deadlock,
    and reports it as below:

    === [ cut here ] ===
    ======================================================
    [ INFO: possible circular locking dependency detected ]
    3.6.0-rc5-00004-g0d8ee37 #143 Not tainted
    -------------------------------------------------------
    kworker/u:2/40 is trying to acquire lock:
    (rcu_sched_state.barrier_mutex){+.+...}, at: [] _rcu_barrier+0x26/0x1e0

    but task is already holding lock:
    (slab_mutex){+.+.+.}, at: [] kmem_cache_destroy+0x45/0xe0

    which lock already depends on the new lock.

    the existing dependency chain (in reverse order) is:

    -> #2 (slab_mutex){+.+.+.}:
    [] validate_chain+0x632/0x720
    [] __lock_acquire+0x309/0x530
    [] lock_acquire+0x121/0x190
    [] __mutex_lock_common+0x5c/0x450
    [] mutex_lock_nested+0x3e/0x50
    [] cpuup_callback+0x2f/0xbe
    [] notifier_call_chain+0x93/0x140
    [] __raw_notifier_call_chain+0x9/0x10
    [] _cpu_up+0xba/0x14e
    [] cpu_up+0xbc/0x117
    [] smp_init+0x6b/0x9f
    [] kernel_init+0x147/0x1dc
    [] kernel_thread_helper+0x4/0x10

    -> #1 (cpu_hotplug.lock){+.+.+.}:
    [] validate_chain+0x632/0x720
    [] __lock_acquire+0x309/0x530
    [] lock_acquire+0x121/0x190
    [] __mutex_lock_common+0x5c/0x450
    [] mutex_lock_nested+0x3e/0x50
    [] get_online_cpus+0x37/0x50
    [] _rcu_barrier+0xbb/0x1e0
    [] rcu_barrier_sched+0x10/0x20
    [] rcu_barrier+0x9/0x10
    [] deactivate_locked_super+0x49/0x90
    [] deactivate_super+0x61/0x70
    [] mntput_no_expire+0x127/0x180
    [] sys_umount+0x6e/0xd0
    [] system_call_fastpath+0x16/0x1b

    -> #0 (rcu_sched_state.barrier_mutex){+.+...}:
    [] check_prev_add+0x3de/0x440
    [] validate_chain+0x632/0x720
    [] __lock_acquire+0x309/0x530
    [] lock_acquire+0x121/0x190
    [] __mutex_lock_common+0x5c/0x450
    [] mutex_lock_nested+0x3e/0x50
    [] _rcu_barrier+0x26/0x1e0
    [] rcu_barrier_sched+0x10/0x20
    [] rcu_barrier+0x9/0x10
    [] kmem_cache_destroy+0xd1/0xe0
    [] nf_conntrack_cleanup_net+0xe4/0x110 [nf_conntrack]
    [] nf_conntrack_cleanup+0x2a/0x70 [nf_conntrack]
    [] nf_conntrack_net_exit+0x5e/0x80 [nf_conntrack]
    [] ops_exit_list+0x39/0x60
    [] cleanup_net+0xfb/0x1b0
    [] process_one_work+0x26b/0x4c0
    [] worker_thread+0x12e/0x320
    [] kthread+0x9e/0xb0
    [] kernel_thread_helper+0x4/0x10

    other info that might help us debug this:

    Chain exists of:
    rcu_sched_state.barrier_mutex --> cpu_hotplug.lock --> slab_mutex

    Possible unsafe locking scenario:

    CPU0                                    CPU1
    ----                                    ----
    lock(slab_mutex);
                                            lock(cpu_hotplug.lock);
                                            lock(slab_mutex);
    lock(rcu_sched_state.barrier_mutex);

    *** DEADLOCK ***
    === [ cut here ] ===

    This is actually a false positive. Lockdep has no way of knowing the fact
    that the ABBA can actually never happen, because of special semantics of
    cpu_hotplug.refcount and its handling in cpu_hotplug_begin(); the mutual
    exclusion there is not achieved through mutex, but through
    cpu_hotplug.refcount.

    The "neither cpu_up() nor cpu_down() will proceed past cpu_hotplug_begin()
    until everyone who called get_online_cpus() will call put_online_cpus()"
    semantics is totally invisible to lockdep.

    This patch therefore moves the unlock of slab_mutex so that rcu_barrier()
    is called with it unlocked (see the sketch after the list below). It
    has two advantages:

    - it slightly reduces hold time of slab_mutex; as it's used to protect
    the cachep list, it's not necessary to hold it over kmem_cache_free()
    call any more
    - it silences the lockdep false positive warning, as it avoids lockdep ever
    learning about slab_mutex -> cpu_hotplug.lock dependency
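
    A sketch of the reordering in kmem_cache_destroy() (simplified; the
    error path is elided):

    list_del(&s->list);
    mutex_unlock(&slab_mutex);      /* drop before the RCU grace period */
    if (unlikely(s->flags & SLAB_DESTROY_BY_RCU))
            rcu_barrier();          /* no longer nests under slab_mutex */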

    Reviewed-by: Paul E. McKenney
    Reviewed-by: Srivatsa S. Bhat
    Acked-by: David Rientjes
    Signed-off-by: Jiri Kosina
    Signed-off-by: Pekka Enberg

    Jiri Kosina
     
    Fuzzing with trinity oopsed on the first instruction of
    shmem_fh_to_dentry(),

    u64 inum = fid->raw[2];

    which is unhelpfully reported as at the end of shmem_alloc_inode():

    BUG: unable to handle kernel paging request at ffff880061cd3000
    IP: [] shmem_alloc_inode+0x40/0x40
    Oops: 0000 [#1] PREEMPT SMP DEBUG_PAGEALLOC
    Call Trace:
    [] ? exportfs_decode_fh+0x79/0x2d0
    [] do_handle_open+0x163/0x2c0
    [] sys_open_by_handle_at+0xc/0x10
    [] tracesys+0xe1/0xe6

    Right, tmpfs is being stupid to access fid->raw[2] before validating that
    fh_len includes it: the buffer kmalloc'ed by do_sys_name_to_handle() may
    fall at the end of a page, and the next page not be present.

    But some other filesystems (ceph, gfs2, isofs, reiserfs, xfs) are being
    careless about fh_len too, in fh_to_dentry() and/or fh_to_parent(), and
    could oops in the same way: add the missing fh_len checks to those.
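
    For tmpfs, the missing validation is a short check at the top of
    shmem_fh_to_dentry():

    if (fh_len < 3)         /* fid->raw[2] must actually exist */
            return NULL;
    inum = fid->raw[2];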

    Reported-by: Sasha Levin
    Signed-off-by: Hugh Dickins
    Cc: Al Viro
    Cc: Sage Weil
    Cc: Steven Whitehouse
    Cc: Christoph Hellwig
    Cc: stable@vger.kernel.org
    Signed-off-by: Al Viro

    Hugh Dickins
     
  • The definition of ARCH_SLAB_MINALIGN is architecture dependent
    and can be either of type size_t or int. Comparing that value
    with ARCH_KMALLOC_MINALIGN can cause harmless warnings on
    platforms where they are different. Since both are always
    small positive integer numbers, using the size_t type to compare
    them is safe and gets rid of the warning.

    Without this patch, building ARM collie_defconfig results in:

    mm/slob.c: In function '__kmalloc_node':
    mm/slob.c:431:152: warning: comparison of distinct pointer types lacks a cast [enabled by default]
    mm/slob.c: In function 'kfree':
    mm/slob.c:484:153: warning: comparison of distinct pointer types lacks a cast [enabled by default]
    mm/slob.c: In function 'ksize':
    mm/slob.c:503:153: warning: comparison of distinct pointer types lacks a cast [enabled by default]
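
    The warnings come from comparing an int with a size_t inside the
    type-checking kernel min()/max() macros; forcing a common type with the
    _t variants silences them. A sketch of the pattern (hedged; the exact
    lines in mm/slob.c may differ):

    int align = max_t(size_t, ARCH_KMALLOC_MINALIGN, ARCH_SLAB_MINALIGN);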

    Signed-off-by: Arnd Bergmann
    Acked-by: Christoph Lameter
    Cc: Pekka Enberg

    Arnd Bergmann
     

09 Oct, 2012

4 commits

  • Invalidation sequences are handled in various ways on various
    architectures.

    One way, which sparc64 uses, is to let the set_*_at() functions accumulate
    pending flushes into a per-cpu array. Then the flush_tlb_range() et al.
    calls process the pending TLB flushes.

    In this regime, the __tlb_remove_*tlb_entry() implementations are
    essentially NOPs.

    The canonical PTE zap in mm/memory.c is:

    ptent = ptep_get_and_clear_full(mm, addr, pte,
                                    tlb->fullmm);
    tlb_remove_tlb_entry(tlb, pte, addr);

    With a subsequent tlb_flush_mmu() if needed.

    Mirror this in the THP PMD zapping using:

    orig_pmd = pmdp_get_and_clear(tlb->mm, addr, pmd);
    page = pmd_page(orig_pmd);
    tlb_remove_pmd_tlb_entry(tlb, pmd, addr);

    And we properly accommodate TLB flush mechanisms like the one described
    above.

    Signed-off-by: David S. Miller
    Cc: Andrea Arcangeli
    Cc: Johannes Weiner
    Cc: Gerald Schaefer
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    David Miller
     
  • The transparent huge page code passes a PMD pointer in as the third
    argument of update_mmu_cache(), which expects a PTE pointer.

    This never got noticed because X86 implements update_mmu_cache() as a
    macro and thus we don't get any type checking, and X86 is the only
    architecture which supports transparent huge pages currently.

    Before other architectures can support transparent huge pages properly we
    need to add a new interface which will take a PMD pointer as the third
    argument rather than a PTE pointer.
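
    A sketch of the new interface (hedged): architectures such as x86 that
    do nothing in update_mmu_cache() can stub out the PMD flavour the same
    way.

    #define update_mmu_cache_pmd(vma, address, pmd) do { } while (0)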

    [akpm@linux-foundation.org: implement update_mm_cache_pmd() for s390]
    Signed-off-by: David S. Miller
    Cc: Andrea Arcangeli
    Cc: Johannes Weiner
    Cc: Gerald Schaefer
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    David Miller
     
  • …YYYYYYYYYYYYYYYY>" warning

    When our x86 box calls __remove_pages(), release_mem_region() shows many
    warnings, and the x86 box cannot unregister its iomem_resource:

    "Trying to free nonexistent resource <XXXXXXXXXXXXXXXX-YYYYYYYYYYYYYYYY>"

    release_mem_region() was changed to be called on each PAGES_PER_SECTION
    chunk by commit de7f0cba9678 ("memory hotplug: release memory regions in
    PAGES_PER_SECTION chunks"), because powerpc registers iomem_resource in
    PAGES_PER_SECTION chunks. But when memory is hot-added on an x86 box,
    iomem_resource is registered per _CRS entry, not per PAGES_PER_SECTION
    chunk, so the x86 box cannot unregister it.

    The patch fixes the problem.

    Signed-off-by: Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com>
    Cc: David Rientjes <rientjes@google.com>
    Cc: Jiang Liu <liuj97@gmail.com>
    Cc: Len Brown <len.brown@intel.com>
    Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
    Cc: Paul Mackerras <paulus@samba.org>
    Cc: Christoph Lameter <cl@linux.com>
    Cc: Minchan Kim <minchan.kim@gmail.com>
    Cc: Andrew Morton <akpm@linux-foundation.org>
    Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
    Cc: Wen Congyang <wency@cn.fujitsu.com>
    Cc: Dave Hansen <dave@linux.vnet.ibm.com>
    Cc: Nathan Fontenot <nfont@austin.ibm.com>
    Cc: Badari Pulavarty <pbadari@us.ibm.com>
    Cc: Yasunori Goto <y-goto@jp.fujitsu.com>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

    Yasuaki Ishimatsu
     
  • Acked-by: David Rientjes
    Cc: Mel Gorman
    Cc: Andrea Arcangeli
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Andrew Morton