14 Nov, 2010

1 commit

  • There are two places that do not release the slub_lock.

    Respective bugs were introduced by sysfs changes ab4d5ed5 (slub: Enable
    sysfs support for !CONFIG_SLUB_DEBUG) and 2bce6485 (slub: Allow removal
    of slab caches during boot).
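
    A minimal sketch of the fix pattern (illustrative, not the exact
    patch): any error path taken while slub_lock is held must release it
    before returning.

    down_write(&slub_lock);
    /* ... work that can fail while the lock is held ... */
    if (err) {
            up_write(&slub_lock);   /* this release was missing */
            return err;
    }
    up_write(&slub_lock);
    return 0;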

    Acked-by: Christoph Lameter
    Signed-off-by: Pavel Emelyanov
    Signed-off-by: Pekka Enberg

    Pavel Emelyanov
     

12 Nov, 2010

4 commits

  • Salman Qazi describes the following radix-tree bug:

    In the following case, we can get a deadlock:

    0. The radix tree contains two items, one has the index 0.
    1. The reader (in this case find_get_pages) takes the rcu_read_lock.
    2. The reader acquires slot(s) for item(s) including the index 0 item.
    3. The non-zero index item is deleted, and as a consequence the other item is
    moved to the root of the tree. The place where it used to be is queued for
    deletion after the readers finish.
    3b. The zero item is deleted, removing it from the direct slot, it remains in
    the rcu-delayed indirect node.
    4. The reader looks at the index 0 slot, and finds that the page has 0 ref
    count
    5. The reader looks at it again, hoping that the item will either be freed or
    the ref count will increase. This never happens, as the slot it is looking
    at will never be updated. Also, this slot can never be reclaimed because
    the reader is holding rcu_read_lock and is in an infinite loop.

    The fix is to generalise the existing "indirect pointer" case, which
    already forces a slot lookup retry, into a general "retry the lookup"
    bit.
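
    A sketch of the resulting reader pattern, using the
    radix_tree_deref_retry() helper this fix introduces (simplified from
    the gang-lookup loops):

    page = radix_tree_deref_slot(slot);
    if (unlikely(!page))
            continue;
    if (radix_tree_deref_retry(page))
            /* the slot moved or was freed under RCU; redo the whole
             * lookup instead of spinning on a stale slot */
            goto restart;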

    Signed-off-by: Nick Piggin
    Reported-by: Salman Qazi
    Cc:
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Nick Piggin
     
  • nr_dirty and nr_congested are increased only when the page is dirty. So
    if all pages are clean, both of them will be zero. In this case, we
    should not mark the zone congested.
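
    A sketch of the check (simplified; ZONE_CONGESTED is the zone flag
    of this era):

    /* Mark the zone congested only if dirty pages were actually
     * seen and all of them also hit congestion. */
    if (nr_dirty && nr_dirty == nr_congested)
            zone_set_flag(zone, ZONE_CONGESTED);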

    Signed-off-by: Shaohua Li
    Reviewed-by: Johannes Weiner
    Reviewed-by: Minchan Kim
    Acked-by: Mel Gorman
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Shaohua Li
     
  • 70 hours into some stress tests of a 2.6.32-based enterprise kernel, we
    ran into a NULL dereference in here:

    int block_is_partially_uptodate(struct page *page, read_descriptor_t *desc,
                                    unsigned long from)
    {
    ---->       struct inode *inode = page->mapping->host;

    It looks like page->mapping was the culprit. (xmon trace is below).
    After closer examination, I realized that do_generic_file_read() does a
    find_get_page(), and eventually locks the page before calling
    block_is_partially_uptodate(). However, it doesn't revalidate the
    page->mapping after the page is locked. So, there's a small window
    between the find_get_page() and ->is_partially_uptodate() where the page
    could get truncated and page->mapping cleared.

    We _have_ a reference, so it can't get reclaimed, but it certainly
    can be truncated.

    I think the correct thing is to check page->mapping after the
    trylock_page(), and jump out if it got truncated. This patch has been
    running in the test environment for a month or so now, and we have not
    seen this bug pop up again.
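
    A sketch of the fix in do_generic_file_read() (simplified; label
    names approximate):

    if (!trylock_page(page))
            goto page_not_up_to_date;
    /* Did it get truncated before we got the lock? */
    if (!page->mapping)
            goto page_not_up_to_date_locked;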

    xmon info:

    1f:mon> e
    cpu 0x1f: Vector: 300 (Data Access) at [c0000002ae36f770]
    pc: c0000000001e7a6c: .block_is_partially_uptodate+0xc/0x100
    lr: c000000000142944: .generic_file_aio_read+0x1e4/0x770
    sp: c0000002ae36f9f0
    msr: 8000000000009032
    dar: 0
    dsisr: 40000000
    current = 0xc000000378f99e30
    paca = 0xc000000000f66300
    pid = 21946, comm = bash
    1f:mon> r
    R00 = 0025c0500000006d R16 = 0000000000000000
    R01 = c0000002ae36f9f0 R17 = c000000362cd3af0
    R02 = c000000000e8cd80 R18 = ffffffffffffffff
    R03 = c0000000031d0f88 R19 = 0000000000000001
    R04 = c0000002ae36fa68 R20 = c0000003bb97b8a0
    R05 = 0000000000000000 R21 = c0000002ae36fa68
    R06 = 0000000000000000 R22 = 0000000000000000
    R07 = 0000000000000001 R23 = c0000002ae36fbb0
    R08 = 0000000000000002 R24 = 0000000000000000
    R09 = 0000000000000000 R25 = c000000362cd3a80
    R10 = 0000000000000000 R26 = 0000000000000002
    R11 = c0000000001e7b60 R27 = 0000000000000000
    R12 = 0000000042000484 R28 = 0000000000000001
    R13 = c000000000f66300 R29 = c0000003bb97b9b8
    R14 = 0000000000000001 R30 = c000000000e28a08
    R15 = 000000000000ffff R31 = c0000000031d0f88
    pc = c0000000001e7a6c .block_is_partially_uptodate+0xc/0x100
    lr = c000000000142944 .generic_file_aio_read+0x1e4/0x770
    msr = 8000000000009032 cr = 22000488
    ctr = c0000000001e7a60 xer = 0000000020000000 trap = 300
    dar = 0000000000000000 dsisr = 40000000
    1f:mon> t
    [link register ] c000000000142944 .generic_file_aio_read+0x1e4/0x770
    [c0000002ae36f9f0] c000000000142a14 .generic_file_aio_read+0x2b4/0x770 (unreliable)
    [c0000002ae36fb40] c0000000001b03e4 .do_sync_read+0xd4/0x160
    [c0000002ae36fce0] c0000000001b153c .vfs_read+0xec/0x1f0
    [c0000002ae36fd80] c0000000001b1768 .SyS_read+0x58/0xb0
    [c0000002ae36fe30] c00000000000852c syscall_exit+0x0/0x40
    --- Exception: c00 (System Call) at 00000080a840bc54
    SP (fffca15df30) is in userspace
    1f:mon> di c0000000001e7a6c
    c0000000001e7a6c e9290000 ld r9,0(r9)
    c0000000001e7a70 418200c0 beq c0000000001e7b30 # .block_is_partially_uptodate+0xd0/0x100
    c0000000001e7a74 e9440008 ld r10,8(r4)
    c0000000001e7a78 78a80020 clrldi r8,r5,32
    c0000000001e7a7c 3c000001 lis r0,1
    c0000000001e7a80 812900a8 lwz r9,168(r9)
    c0000000001e7a84 39600001 li r11,1
    c0000000001e7a88 7c080050 subf r0,r8,r0
    c0000000001e7a8c 7f805040 cmplw cr7,r0,r10
    c0000000001e7a90 7d6b4830 slw r11,r11,r9
    c0000000001e7a94 796b0020 clrldi r11,r11,32
    c0000000001e7a98 419d00a8 bgt cr7,c0000000001e7b40 # .block_is_partially_uptodate+0xe0/0x100
    c0000000001e7a9c 7fa55840 cmpld cr7,r5,r11
    c0000000001e7aa0 7d004214 add r8,r0,r8
    c0000000001e7aa4 79080020 clrldi r8,r8,32
    c0000000001e7aa8 419c0078 blt cr7,c0000000001e7b20 # .block_is_partially_uptodate+0xc0/0x100

    Signed-off-by: Dave Hansen
    Reviewed-by: Minchan Kim
    Reviewed-by: Johannes Weiner
    Acked-by: Rik van Riel
    Cc:
    Cc:
    Cc: Christoph Hellwig
    Cc: Al Viro
    Cc: Minchan Kim
    Cc:
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Dave Hansen
     
  • The original code had a null dereference if alloc_percpu() failed. This
    was introduced in commit 711d3d2c9bc3 ("memcg: cpu hotplug aware percpu
    count updates")

    Signed-off-by: Dan Carpenter
    Reviewed-by: Balbir Singh
    Acked-by: KAMEZAWA Hiroyuki
    Acked-by: Daisuke Nishimura
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Dan Carpenter
     

10 Nov, 2010

1 commit

  • As pointed out by Linus, commit dab5855 ("perf_counter: Add mmap event hooks to
    mprotect()") is fundamentally wrong as mprotect_fixup() can free 'vma' due to
    merging. Fix the problem by moving perf_event_mmap() hook to
    mprotect_fixup().

    Note: there's another successful return path from mprotect_fixup() if
    the old flags equal the new flags. We don't, however, need to call
    perf_event_mmap() there because 'perf' already knows the VMA is
    executable.
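
    A sketch of the resulting flow inside mprotect_fixup() (simplified):

    if (newflags == oldflags) {
            /* nothing changed; perf already knows this VMA */
            *pprev = vma;
            return 0;
    }
    /* ... perform the fixup; the (possibly merged) vma is still
     * valid at this point ... */
    perf_event_mmap(vma);
    return 0;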

    Reported-by: Dave Jones
    Analyzed-by: Linus Torvalds
    Cc: Ingo Molnar
    Reviewed-by: Peter Zijlstra
    Signed-off-by: Pekka Enberg
    Signed-off-by: Linus Torvalds

    Pekka Enberg
     

04 Nov, 2010

1 commit

  • Fix regression introduced by commit 79da826aee6 ("writeback: report
    dirty thresholds in /proc/vmstat").

    The incorrect pointer arithmetic can result in problems like this:

    BUG: unable to handle kernel paging request at 07c06d16
    IP: [] strnlen+0x6/0x20
    Call Trace:
    [] ? string+0x39/0xe0
    [] ? __wake_up_common+0x4b/0x80
    [] ? vsnprintf+0x1ec/0x380
    [] ? seq_printf+0x2e/0x60
    [] ? vmstat_show+0x26/0x30
    [] ? seq_read+0xa6/0x380
    [] ? seq_read+0x0/0x380
    [] ? proc_reg_read+0x5f/0x90
    [] ? vfs_read+0xa1/0x140
    [] ? proc_reg_read+0x0/0x90
    [] ? sys_read+0x41/0x70
    [] ? sysenter_do_call+0x12/0x26

    Reported-by: Tetsuo Handa
    Cc: Michael Rubin
    Signed-off-by: Wu Fengguang
    Signed-off-by: Linus Torvalds

    Wu Fengguang
     

30 Oct, 2010

1 commit

  • Normal syscall audit doesn't catch the 5th argument of a syscall. It
    also doesn't catch the contents of userland structures pointed to by
    a syscall argument, so for both the old and new mmap(2) ABIs it
    doesn't record the descriptor we are mapping. For the old ABI it
    also misses the flags.

    Signed-off-by: Al Viro

    Al Viro
     

29 Oct, 2010

2 commits

  • Signed-off-by: Al Viro

    Al Viro
     
  • When a node contains only HighMem memory, slab_node(MPOL_BIND)
    dereferences a NULL pointer.

    [ This code seems to go back all the way to commit 19770b32609b: "mm:
    filter based on a nodemask as well as a gfp_mask". Which was back in
    April 2008, and it got merged into 2.6.26. - Linus ]

    Signed-off-by: Eric Dumazet
    Cc: Mel Gorman
    Cc: Christoph Lameter
    Cc: Lee Schermerhorn
    Cc: Andrew Morton
    Cc: stable@kernel.org
    Signed-off-by: Linus Torvalds

    Eric Dumazet
     

28 Oct, 2010

10 commits

  • * git://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-2.6-mn10300: (44 commits)
    MN10300: Save frame pointer in thread_info struct rather than global var
    MN10300: Change "Matsushita" to "Panasonic".
    MN10300: Create a defconfig for the ASB2364 board
    MN10300: Update the ASB2303 defconfig
    MN10300: ASB2364: Add support for SMSC911X and SMC911X
    MN10300: ASB2364: Handle the IRQ multiplexer in the FPGA
    MN10300: Generic time support
    MN10300: Specify an ELF HWCAP flag for MN10300 Atomic Operations Unit support
    MN10300: Map userspace atomic op regs as a vmalloc page
    MN10300: Add Panasonic AM34 subarch and implement SMP
    MN10300: Delete idle_timestamp from irq_cpustat_t
    MN10300: Make various interrupt priority settings configurable
    MN10300: Optimise do_csum()
    MN10300: Implement atomic ops using atomic ops unit
    MN10300: Make the FPU operate in non-lazy mode under SMP
    MN10300: SMP TLB flushing
    MN10300: Use the [ID]PTEL2 registers rather than [ID]PTEL for TLB control
    MN10300: Make the use of PIDR to mark TLB entries controllable
    MN10300: Rename __flush_tlb*() to local_flush_tlb*()
    MN10300: AM34 erratum requires MMUCTR read and write on exception entry
    ...

    Linus Torvalds
     
  • Replace iterated page_cache_release() with release_pages(), which is
    faster and shorter.

    Needs release_pages() to be exported to modules.
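
    The shape of the change (sketch; release_pages() takes the array,
    the count, and a "cold" hint):

    /* before */
    for (i = 0; i < nr_pages; i++)
            page_cache_release(pages[i]);

    /* after */
    release_pages(pages, nr_pages, 0);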

    Suggested-by: Andrew Morton
    Signed-off-by: Miklos Szeredi
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Miklos Szeredi
     
  • This patch extracts the core logic from mem_cgroup_update_file_mapped() as
    mem_cgroup_update_file_stat() and adds a wrapper.

    As a planned future update, the memory cgroup has to count dirty pages
    to implement dirty_ratio/limit. Moreover, the number of dirty pages is
    required to kick the flusher thread to start writeback. (For now,
    there is no kick.)

    This patch is preparation for that and makes the implementation of
    other statistics clearer. Just a cleanup.
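
    A sketch of the split (simplified from the patch):

    static void mem_cgroup_update_file_stat(struct page *page, int idx,
                                            int val)
    {
            /* core logic: find pc->mem_cgroup for this page and
             * adjust the chosen per-cpu statistic by val */
    }

    void mem_cgroup_update_file_mapped(struct page *page, int val)
    {
            mem_cgroup_update_file_stat(page,
                            MEM_CGROUP_STAT_FILE_MAPPED, val);
    }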

    Signed-off-by: KAMEZAWA Hiroyuki
    Acked-by: Balbir Singh
    Reviewed-by: Greg Thelen
    Cc: Daisuke Nishimura
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    KAMEZAWA Hiroyuki
     
  • An event counter, MEM_CGROUP_ON_MOVE, is used as a quick check of
    whether a file stat update can be done asynchronously or not. It is
    currently a percpu counter, updated with for_each_possible_cpu.

    This patch replaces for_each_possible_cpu with for_each_online_cpu
    and adds the necessary synchronization logic for CPU hotplug.

    [akpm@linux-foundation.org: coding-style fixes]
    Signed-off-by: KAMEZAWA Hiroyuki
    Cc: Balbir Singh
    Cc: Daisuke Nishimura
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    KAMEZAWA Hiroyuki
     
  • Currently, memcg's per-cpu counter uses for_each_possible_cpu() to
    read the value. It is better to use for_each_online_cpu() together
    with a CPU hotplug handler.

    This patch only handles the statistics counters; MEM_CGROUP_ON_MOVE
    will be handled in another patch.

    Signed-off-by: KAMEZAWA Hiroyuki
    Cc: Balbir Singh
    Cc: Daisuke Nishimura
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    KAMEZAWA Hiroyuki
     
  • In memory cgroup management, we sometimes have to walk through a
    subhierarchy of cgroups to gather information, lock something, etc.

    Currently, the mem_cgroup_walk_tree() function is provided for this.
    It calls a given callback function per cgroup found. The bad part is
    that it has to take a fixed-style function and a "void *" argument,
    which adds a lot of type casting to memcontrol.c.

    To make the code clean, this patch replaces walk_tree() with

    for_each_mem_cgroup_tree(iter, root)

    an iterator-style call. The good point is that the iterator doesn't
    have to assume what kind of function is called under it. The bad
    point is that it may leak a reference count if a caller mistakenly
    uses "break" to exit the loop.

    I think the benefit is larger. The modified code seems
    straightforward and easy to read because there are no mysterious
    callbacks and pointer casts.
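
    The shape of the conversion (sketch; the callback name is made up,
    'iter' is a struct mem_cgroup pointer, and the macro handles the
    css reference counting):

    /* before: callback plus an untyped cookie */
    mem_cgroup_walk_tree(root, (void *)&total, walk_cb);

    /* after: an ordinary loop body, no casts */
    for_each_mem_cgroup_tree(iter, root)
            total += mem_cgroup_read_stat(iter, idx);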

    Signed-off-by: KAMEZAWA Hiroyuki
    Cc: Balbir Singh
    Cc: Daisuke Nishimura
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    KAMEZAWA Hiroyuki
     
  • When accounting file events per memory cgroup, we need to find the
    memory cgroup via page_cgroup->mem_cgroup. Now, we use
    lock_page_cgroup() to guarantee that pc->mem_cgroup is not overwritten
    while we make use of it.

    But, considering the context in which page_cgroup for file pages is
    accessed, we can use an alternative lightweight mutual exclusion in
    most cases.

    When handling file caches, the only race we have to take care of is
    "moving" an account, IOW, overwriting page_cgroup->mem_cgroup. (See
    the comment in the patch.)

    Unlike charge/uncharge, "move" happens rarely: only on rmdir() and on
    task moving (with special settings). This patch adds a race checker
    for file-cache-status accounting vs. account moving. A new
    per-cpu-per-memcg counter MEM_CGROUP_ON_MOVE is added. The account
    move routine:
    1. increments it before the move starts,
    2. calls synchronize_rcu(),
    3. decrements it after the move ends.
    With this, the file-status-counting routine can check whether it
    needs to call lock_page_cgroup(); in most cases it doesn't.
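
    A sketch of the protocol (simplified; accessor spelling
    approximate):

    /* account-move side */
    for_each_online_cpu(cpu)
            per_cpu_ptr(mem->stat, cpu)->count[MEM_CGROUP_ON_MOVE] += 1;
    synchronize_rcu();              /* flush in-flight lockless readers */
    /* ... overwrite pc->mem_cgroup ... */
    for_each_online_cpu(cpu)
            per_cpu_ptr(mem->stat, cpu)->count[MEM_CGROUP_ON_MOVE] -= 1;

    /* file-stat side, under rcu_read_lock() */
    if (this_cpu_read(mem->stat->count[MEM_CGROUP_ON_MOVE]) > 0)
            lock_page_cgroup(pc);   /* rare, safe slow path */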

    Following is perf data of a process which mmap()s/munmap()s 32MB of
    file cache in a minute.

    Before patch:
    28.25% mmap mmap [.] main
    22.64% mmap [kernel.kallsyms] [k] page_fault
    9.96% mmap [kernel.kallsyms] [k] mem_cgroup_update_file_mapped
    3.67% mmap [kernel.kallsyms] [k] filemap_fault
    3.50% mmap [kernel.kallsyms] [k] unmap_vmas
    2.99% mmap [kernel.kallsyms] [k] __do_fault
    2.76% mmap [kernel.kallsyms] [k] find_get_page

    After patch:
    30.00% mmap mmap [.] main
    23.78% mmap [kernel.kallsyms] [k] page_fault
    5.52% mmap [kernel.kallsyms] [k] mem_cgroup_update_file_mapped
    3.81% mmap [kernel.kallsyms] [k] unmap_vmas
    3.26% mmap [kernel.kallsyms] [k] find_get_page
    3.18% mmap [kernel.kallsyms] [k] __do_fault
    3.03% mmap [kernel.kallsyms] [k] filemap_fault
    2.40% mmap [kernel.kallsyms] [k] handle_mm_fault
    2.40% mmap [kernel.kallsyms] [k] do_page_fault

    This patch reduces memcg's cost to some extent.
    (mem_cgroup_update_file_mapped is called by both map and unmap.)

    Note: it seems some more improvements are required, but I have no
    idea yet; maybe removing the set/unset of the flag is required.

    Signed-off-by: KAMEZAWA Hiroyuki
    Reviewed-by: Daisuke Nishimura
    Cc: Balbir Singh
    Cc: Greg Thelen
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    KAMEZAWA Hiroyuki
     
  • Presently, memory cgroup accounts file-mapped pages with a counter
    and a flag. The counter works in the same way as zone_stat, but the
    FileMapped flag only exists in memcg (to help move_account).

    This flag can be updated wrongly in one case. Assume CPU0 and CPU1,
    with one thread mapping a page on CPU0 and another thread unmapping
    it on CPU1:

    CPU0                              CPU1
                                      rmv rmap (mapcount 1->0)
    add rmap (mapcount 0->1)
    lock_page_cgroup()
    memcg counter+1                   (some delay)
    set MAPPED FLAG.
    unlock_page_cgroup()
                                      lock_page_cgroup()
                                      memcg counter-1
                                      clear MAPPED flag

    In the above sequence the counter is properly updated but the FLAG is
    not. This means that representing a state by a flag that is maintained
    by a counter needs some special care.

    To handle this, when clearing the flag, this patch checks the mapcount
    directly and clears the flag only when mapcount == 0. (If mapcount >
    0, someone will bring it to zero later, and the flag will be cleared
    then.)

    The reverse case, dec-after-inc, cannot be a problem because
    page_table_lock works well for it. (IOW, to produce the above
    sequence, two processes would have to touch the same page at once
    with map/unmap.)
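
    A sketch of the fix on the update side (simplified from the patch):

    if (val > 0) {
            SetPageCgroupFileMapped(pc);
    } else if (!page_mapped(page)) {
            /* Clear only when the page is really unmapped; if a
             * racing mapper won, whoever later brings mapcount to
             * zero will clear the flag. */
            ClearPageCgroupFileMapped(pc);
    }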

    Signed-off-by: KAMEZAWA Hiroyuki
    Cc: Balbir Singh
    Cc: Daisuke Nishimura
    Cc: Greg Thelen
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    KAMEZAWA Hiroyuki
     
  • It appears i386 uses kmap_atomic infrastructure regardless of
    CONFIG_HIGHMEM which results in a compile error when highmem is disabled.

    Cure this by providing the needed few bits for both CONFIG_HIGHMEM and
    CONFIG_X86_32.

    Signed-off-by: Peter Zijlstra
    Reported-by: Chris Wilson
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Peter Zijlstra
     
  • Save the current exception frame pointer in the thread_info struct rather than
    in a global variable as the latter makes SMP tricky, especially when preemption
    is also enabled.

    This also replaces __frame with current_frame() and rearranges header file
    inclusions to make it all compile.

    Signed-off-by: David Howells
    Acked-by: Akira Takeuchi

    David Howells
     

27 Oct, 2010

18 commits

  • * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs-2.6: (52 commits)
    split invalidate_inodes()
    fs: skip I_FREEING inodes in writeback_sb_inodes
    fs: fold invalidate_list into invalidate_inodes
    fs: do not drop inode_lock in dispose_list
    fs: inode split IO and LRU lists
    fs: switch bdev inode bdi's correctly
    fs: fix buffer invalidation in invalidate_list
    fsnotify: use dget_parent
    smbfs: use dget_parent
    exportfs: use dget_parent
    fs: use RCU read side protection in d_validate
    fs: clean up dentry lru modification
    fs: split __shrink_dcache_sb
    fs: improve DCACHE_REFERENCED usage
    fs: use percpu counter for nr_dentry and nr_dentry_unused
    fs: simplify __d_free
    fs: take dcache_lock inside __d_path
    fs: do not assign default i_ino in new_inode
    fs: introduce a per-cpu last_ino allocator
    new helper: ihold()
    ...

    Linus Torvalds
     
  • PF_FLUSHER is only ever set, never tested; remove it.

    Signed-off-by: Peter Zijlstra
    Cc: Jens Axboe
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Peter Zijlstra
     
    After all, that's what they are intended for.

    Signed-off-by: Jan Beulich
    Cc: Miklos Szeredi
    Cc: "Eric W. Biederman"
    Cc: "Rafael J. Wysocki"
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Jan Beulich
     
  • Use the new {max,min}3 macros to save some cycles and bytes on the
    stack. This patch substitutes trivially nested min()/max() calls with
    their three-argument counterparts.
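
    For example (assuming the three operands share a type, since the
    macros are type-checked):

    /* before */
    len = min(len, min(space, chunk));

    /* after */
    len = min3(len, space, chunk);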

    Signed-off-by: Hagen Paul Pfeifer
    Cc: Joe Perches
    Cc: Ingo Molnar
    Cc: Hartley Sweeten
    Cc: Russell King
    Cc: Benjamin Herrenschmidt
    Cc: Thomas Gleixner
    Cc: Herbert Xu
    Cc: Roland Dreier
    Cc: Sean Hefty
    Cc: Pekka Enberg
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Hagen Paul Pfeifer
     
  • A simple cleanup that reduces the list_empty(&source) checks.

    Signed-off-by: Bob Liu
    Acked-by: KAMEZAWA Hiroyuki
    Acked-by: Wu Fengguang
    Cc: KOSAKI Motohiro
    Cc: Mel Gorman
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Bob Liu
     
  • If not_managed is true, all pages will be put back to the LRU, so
    break out of the loop earlier to skip isolating the remaining pages.

    Signed-off-by: Bob Liu
    Acked-by: KAMEZAWA Hiroyuki
    Acked-by: Wu Fengguang
    Cc: KOSAKI Motohiro
    Cc: Mel Gorman
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Bob Liu
     
  • __test_page_isolated_in_pageblock() returns 1 if all pages in the
    range are isolated, so fix the comment. Variable `pfn' will be
    initialised in the following loop, so remove the redundant
    initialisation.

    Signed-off-by: Bob Liu
    Acked-by: KAMEZAWA Hiroyuki
    Cc: Wu Fengguang
    Cc: KOSAKI Motohiro
    Cc: Mel Gorman
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Bob Liu
     
  • page_order() is called by memory hotplug's user interface to check
    whether a section is removable (is_mem_section_removable()).

    It calls page_order() without holding zone->lock. So, even if the
    caller does

    if (PageBuddy(page))
            ret = page_order(page) ...

    the caller may hit BUG_ON().

    For fixing this, there are 2 choices:
    1. add zone->lock.
    2. remove BUG_ON().

    is_mem_section_removable() is used for "advice" and doesn't need to
    be 100% accurate. It can be called from a user program, and we don't
    want to hold this important lock for long at a user's request. So,
    this patch removes the BUG_ON().

    Signed-off-by: KAMEZAWA Hiroyuki
    Acked-by: Wu Fengguang
    Acked-by: Michal Hocko
    Acked-by: Mel Gorman
    Cc:
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    KAMEZAWA Hiroyuki
     
  • Add a missing spin_lock() of the page_table_lock before an error
    return in hugetlb_cow(). Callers of hugetlb_cow() expect it to be
    held upon return.

    Signed-off-by: Dean Nelson
    Cc: Mel Gorman
    Cc:
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Dean Nelson
     
  • The vma returned by find_vma does not necessarily include the target
    address. If this happens, the code tries to follow a page outside of
    any vma and returns ENOENT instead of EFAULT.
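
    The canonical check, which is what the fix adds (sketch):

    vma = find_vma(mm, addr);
    if (!vma || addr < vma->vm_start)
            return -EFAULT;         /* addr lies outside every vma */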

    Signed-off-by: Gleb Natapov
    Acked-by: Christoph Lameter
    Cc: Minchan Kim
    Cc: KAMEZAWA Hiroyuki
    Cc: Mel Gorman
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Gleb Natapov
     
  • System management wants to subscribe to changes in swap configuration.
    Make /proc/swaps pollable like /proc/mounts.

    [akpm@linux-foundation.org: document proc_poll_event]
    Signed-off-by: Kay Sievers
    Acked-by: Greg KH
    Cc: Jonathan Corbet
    Cc: Peter Zijlstra
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Kay Sievers
     
  • Add vzalloc() and vzalloc_node() to encapsulate the
    vmalloc-then-memset-zero operation.

    Use __GFP_ZERO to zero fill the allocated memory.
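
    In essence (a sketch; the real helpers go through the node-aware
    allocation paths):

    void *vzalloc(unsigned long size)
    {
            return __vmalloc(size, GFP_KERNEL | __GFP_ZERO, PAGE_KERNEL);
    }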

    Signed-off-by: Dave Young
    Cc: Christoph Lameter
    Acked-by: Greg Ungerer
    Cc: David Howells
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Dave Young
     
  • Reported-by: KOSAKI Motohiro
    Cc: KAMEZAWA Hiroyuki
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Andrew Morton
     
  • This removes the following warning from sparse:

    mm/vmstat.c:466:5: warning: symbol 'fragmentation_index' was not declared. Should it be static?

    [akpm@linux-foundation.org: move the include to top-of-file]
    Signed-off-by: Namhyung Kim
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Namhyung Kim
     
  • s_start() and s_stop() grab/release vmlist_lock but were missing proper
    annotations. Add them.
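
    The annotated functions look roughly like this (simplified):

    static void *s_start(struct seq_file *m, loff_t *pos)
            __acquires(&vmlist_lock)
    {
            loff_t n = *pos;
            struct vm_struct *v;

            read_lock(&vmlist_lock);
            for (v = vmlist; n > 0 && v; n--)
                    v = v->next;
            return v;
    }

    static void s_stop(struct seq_file *m, void *p)
            __releases(&vmlist_lock)
    {
            read_unlock(&vmlist_lock);
    }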

    Signed-off-by: Namhyung Kim
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Namhyung Kim
     
  • Rename the redundant 'tmp' to fix the following sparse warnings:

    mm/vmalloc.c:296:34: warning: symbol 'tmp' shadows an earlier one
    mm/vmalloc.c:293:24: originally declared here

    Signed-off-by: Namhyung Kim
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Namhyung Kim
     
  • Make anon_vma_chain_free() static. It is called only in rmap.c and the
    corresponding alloc function is already static.

    Signed-off-by: Namhyung Kim
    Cc: Andi Kleen
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Namhyung Kim
     
  • page_check_address() conditionally grabs *@ptlp when it returns
    non-NULL. Renaming and wrapping it with __cond_lock() (sketched after
    the warnings below) removes the following warnings from sparse:

    mm/rmap.c:472:9: warning: context imbalance in 'page_mapped_in_vma' - unexpected unlock
    mm/rmap.c:524:9: warning: context imbalance in 'page_referenced_one' - unexpected unlock
    mm/rmap.c:706:9: warning: context imbalance in 'page_mkclean_one' - unexpected unlock
    mm/rmap.c:1066:9: warning: context imbalance in 'try_to_unmap_one' - unexpected unlock
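
    The wrapper pattern (essentially what the patch does): the worker
    that really takes the lock gains a __ prefix, and the old name
    becomes an inline wrapper whose __cond_lock() tells sparse the lock
    is held exactly when the return value is non-NULL.

    static inline pte_t *page_check_address(struct page *page,
                    struct mm_struct *mm, unsigned long address,
                    spinlock_t **ptlp, int sync)
    {
            pte_t *ptep;

            __cond_lock(*ptlp,
                    ptep = __page_check_address(page, mm, address,
                                                ptlp, sync));
            return ptep;
    }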

    Signed-off-by: Namhyung Kim
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Namhyung Kim