11 Dec, 2008

5 commits

  • Miles Lane, tailing /sys files, hit a BUG which Pekka Enberg has tracked
    down to my commit 966c8c12dc9e77f931e2281ba25d2f0244b06949
    ("sprint_symbol(): use less stack"), which exposed a bug in slub's
    list_locations(): kallsyms_lookup() writes a 0 to
    namebuf[KSYM_NAME_LEN-1], but that was beyond the end of the page
    provided.

    The 100 bytes of slop which list_locations() allows at the end of the
    page look roughly enough for all the other stuff it might print after
    the symbol before it checks again: break out KSYM_SYMBOL_LEN earlier
    than before.

    Latencytop and ftrace are using KSYM_NAME_LEN buffers where they need
    KSYM_SYMBOL_LEN buffers, and vmallocinfo uses a 2*KSYM_NAME_LEN buffer
    where it wants a KSYM_SYMBOL_LEN buffer: fix those before anyone copies
    them.
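
    A minimal sketch of the sizing rule: kallsyms_lookup() fills a bare
    symbol name of up to KSYM_NAME_LEN bytes, but sprint_symbol() appends
    module name and offset, so its callers need the larger KSYM_SYMBOL_LEN:

        #include <linux/kallsyms.h>

        static void show_symbol(unsigned long addr)
        {
                char buf[KSYM_SYMBOL_LEN];      /* not KSYM_NAME_LEN */

                sprint_symbol(buf, addr);
                printk(KERN_INFO "%s\n", buf);
        }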

    [akpm@linux-foundation.org: ftrace.h needs module.h]
    Signed-off-by: Hugh Dickins
    Cc: Christoph Lameter
    Cc: Miles Lane
    Acked-by: Pekka Enberg
    Acked-by: Steven Rostedt
    Acked-by: Frederic Weisbecker
    Cc: Rusty Russell
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Hugh Dickins
     
  • Since commit 2f007e74bb85b9fc4eab28524052161703300f1a, do_pages_stat()
    gets the page addresses from user-space and puts the corresponding
    statuses back while holding the mmap_sem for read. There is no need to
    hold mmap_sem across those user-space accesses, which may page-fault.

    This patch adds a temporary address and status buffer so as to only
    hold mmap_sem while working on these kernel buffers. This is
    implemented by extracting do_pages_stat_array() out of do_pages_stat().
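
    A minimal sketch of the resulting chunking pattern (the chunk size and
    the extracted helper follow the patch description):

        #define DO_PAGES_STAT_CHUNK_NR 16

        static int do_pages_stat(struct mm_struct *mm, unsigned long nr_pages,
                                 const void __user * __user *pages,
                                 int __user *status)
        {
                const void __user *chunk_pages[DO_PAGES_STAT_CHUNK_NR];
                int chunk_status[DO_PAGES_STAT_CHUNK_NR];
                unsigned long i, chunk_nr;

                for (i = 0; i < nr_pages; i += chunk_nr) {
                        chunk_nr = min(nr_pages - i,
                                       (unsigned long)DO_PAGES_STAT_CHUNK_NR);

                        /* user copies may fault: do them without mmap_sem */
                        if (copy_from_user(chunk_pages, &pages[i],
                                           chunk_nr * sizeof(*chunk_pages)))
                                return -EFAULT;

                        down_read(&mm->mmap_sem);
                        do_pages_stat_array(mm, chunk_nr, chunk_pages,
                                            chunk_status);
                        up_read(&mm->mmap_sem);

                        if (copy_to_user(&status[i], chunk_status,
                                         chunk_nr * sizeof(*chunk_status)))
                                return -EFAULT;
                }
                return 0;
        }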

    Signed-off-by: Brice Goglin
    Cc: Christoph Lameter
    Cc: KAMEZAWA Hiroyuki
    Cc: Nick Piggin
    Cc: Hugh Dickins
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Brice Goglin
     
  • Fix a total bootup freeze on ia64.

    Signed-off-by: KAMEZAWA Hiroyuki
    Tested-by: Li Zefan
    Reported-by: Li Zefan
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    KAMEZAWA Hiroyuki
     
  • Currently, lru_add_drain_all() has two versions:
    (1) use schedule_on_each_cpu()
    (2) don't use schedule_on_each_cpu()

    Gerald Schaefer reported that it doesn't work well on an SMP (non-NUMA)
    S390 machine.

    offline_pages() calls lru_add_drain_all() followed by drain_all_pages().
    While drain_all_pages() works on each cpu, lru_add_drain_all() only runs
    on the current cpu for architectures w/o CONFIG_NUMA. This let us run
    into the BUG_ON(!PageBuddy(page)) in __offline_isolated_pages() during
    memory hotplug stress test on s390. The page in question was still on the
    pcp list, because of a race with lru_add_drain_all() and drain_all_pages()
    on different cpus.

    In practice, almost every machine has CONFIG_UNEVICTABLE_LRU=y, and so
    uses version (1) of lru_add_drain_all() even when the machine is UP.

    The ifdef therefore has little value; simply removing it is better.
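
    A minimal sketch of the unified version that remains after removing the
    ifdef:

        static void lru_add_drain_per_cpu(struct work_struct *dummy)
        {
                lru_add_drain();
        }

        /* Drain the per-cpu pagevecs on every online cpu.
         * Returns 0 on success. */
        int lru_add_drain_all(void)
        {
                return schedule_on_each_cpu(lru_add_drain_per_cpu);
        }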

    Signed-off-by: KOSAKI Motohiro
    Cc: Christoph Lameter
    Cc: Lee Schermerhorn
    Acked-by: Gerald Schaefer
    Cc: Dave Hansen
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    KOSAKI Motohiro
     
  • On second thoughts, this is just going to disturb people while telling us
    things which we already knew.

    Cc: Peter Korsgaard
    Cc: Peter Zijlstra
    Cc: Kay Sievers
    Cc: David Woodhouse
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Andrew Morton
     

03 Dec, 2008

2 commits

  • Count the insertion of new pages in the statistics used to drive the
    pageout scanning code. This should help the kernel quickly evict
    streaming file IO.

    We count on the fact that new file pages start on the inactive file LRU
    and new anonymous pages start on the active anon list. This means
    streaming file IO will increment the recent scanned file statistic, while
    leaving the recent rotated file statistic alone, driving pageout scanning
    to the file LRUs.

    Pageout activity does its own list manipulation.
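
    A minimal sketch of the accounting idea (field names as in this era's
    struct zone; simplified):

        /* A page joining an LRU counts as "recently scanned" for that
         * list, while recent_rotated is deliberately left alone, so the
         * scan ratio tilts pageout toward the list receiving new pages. */
        static void count_lru_insertion(struct zone *zone, int file)
        {
                zone->recent_scanned[file]++;
        }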

    Signed-off-by: Rik van Riel
    Cc: KAMEZAWA Hiroyuki
    Cc: KOSAKI Motohiro
    Tested-by: Gene Heskett
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Rik van Riel
     
  • Devices which share the same queue, like floppies and mtd devices, get
    registered multiple times in the bdi interface, but bdi accounts only the
    last registered device of the devices sharing one queue.

    On remove, all earlier registered devices leak, stay around in sysfs, and
    cause "duplicate filename" errors if the devices are re-created.

    This patch prevents the creation of multiple bdi interfaces per queue;
    the bdi device carries the dev_t name of the first block device
    registered from the pool of devices sharing the same queue.
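
    A minimal sketch of the guard (the helper name here is hypothetical;
    the point is that registration becomes a no-op once the queue's bdi
    already has a device):

        static int bdi_register_once(struct backing_dev_info *bdi, dev_t devt)
        {
                if (bdi->dev) {
                        WARN_ON(1);     /* flag the misbehaving driver */
                        return 0;       /* already registered: no-op */
                }
                return bdi_register_dev(bdi, devt);
        }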

    [akpm@linux-foundation.org: add a WARN_ON so we know which drivers are misbehaving]
    Tested-by: Peter Korsgaard
    Acked-by: Peter Zijlstra
    Signed-off-by: Kay Sievers
    Cc: David Woodhouse
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Kay Sievers
     

02 Dec, 2008

2 commits

  • Fixes for memcg/memory hotplug.

    While memory hotplug allocates/frees the memmap, page_cgroup doesn't
    free page_cgroup at OFFLINE when page_cgroup was allocated via bootmem
    (because freeing bootmem requires special care).

    Then, if page_cgroup is allocated by bootmem and memmap is freed/allocated
    by memory hotplug, page_cgroup->page == page is no longer true.

    But the current MEM_ONLINE handler doesn't check for this, and doesn't
    update page_cgroup->page when it's not necessary to allocate
    page_cgroup. (This went unnoticed because the memmap is not freed when
    SPARSEMEM_VMEMMAP is y.)

    I also noticed that MEM_ONLINE can be called against "part of a
    section", so freeing page_cgroup at CANCEL_ONLINE would cause trouble
    (freeing page_cgroup still in use). Don't roll back at CANCEL.

    One more thing: the current memory hotplug notifier chain is stopped by
    slub, because it sets NOTIFY_STOP_MASK in its return value, so
    page_cgroup's callback is never called (it now has lower priority than
    slub's). I think this slub behavior is unintentional (a BUG), and fix
    it.

    Another way to handle page_cgroup allocation could be considered:
    - free page_cgroup at OFFLINE even if it came from bootmem,
      and remove the special handler. But that requires more changes.

    Addresses http://bugzilla.kernel.org/show_bug.cgi?id=12041
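
    A minimal sketch of the slub fix (simplified): only fold an error into
    NOTIFY_STOP_MASK via notifier_from_errno(), so that on success later
    callbacks such as page_cgroup's still run:

        static int slab_memory_callback(struct notifier_block *self,
                                        unsigned long action, void *arg)
        {
                int ret = 0;

                switch (action) {
                case MEM_GOING_ONLINE:
                case MEM_GOING_OFFLINE:
                        /* ret = ...; set up / tear down per-node data */
                        break;
                default:
                        break;
                }
                if (ret)
                        ret = notifier_from_errno(ret);
                else
                        ret = NOTIFY_OK;
                return ret;
        }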

    Signed-off-by: KAMEZAWA Hiroyuki
    Cc: Li Zefan
    Cc: Balbir Singh
    Cc: Pavel Emelyanov
    Tested-by: Badari Pulavarty
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    KAMEZAWA Hiroyuki
     
  • Jim Radford has reported that the vmap subsystem rewrite was sometimes
    causing his VIVT ARM system to behave strangely (seemed like going into
    infinite loops trying to fault in pages to userspace).

    We determined that the problem was most likely due to a cache aliasing
    issue. flush_cache_vunmap was only being called at the moment the page
    tables were to be taken down, however with lazy unmapping, this can happen
    after the page has subsequently been freed and allocated for something
    else. The dangling alias may still have dirty data attached to it.

    The fix for this problem is to do the cache flushing when the caller has
    called vunmap -- it would be a bug for them to write anything else to the
    mapping at that point.

    That appeared to solve Jim's problems.
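
    A minimal sketch of the fix: flush while the mapping still exists, at
    vunmap time, rather than when the lazy unmap finally tears down the
    page tables:

        static void free_unmap_vmap_area(struct vmap_area *va)
        {
                flush_cache_vunmap(va->va_start, va->va_end);
                free_unmap_vmap_area_noflush(va);       /* lazy unmap later */
        }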

    Reported-by: Jim Radford
    Signed-off-by: Nick Piggin
    Cc: Russell King
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Nick Piggin
     

01 Dec, 2008

2 commits


20 Nov, 2008

7 commits

  • Fix the old comment on the scan ratio calculations.

    Signed-off-by: Rik van Riel
    Cc: KOSAKI Motohiro
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Rik van Riel
     
  • In the past, GFP_NOFS (but of course not GFP_NOIO) was allowed to reclaim
    by writing to swap. That got partially broken in 2.6.23, when may_enter_fs
    initialization was moved up before the allocation of swap, so its
    PageSwapCache test was failing the first time around.

    Fix it by setting may_enter_fs when add_to_swap() succeeds with
    __GFP_IO. In fact, check __GFP_IO before calling add_to_swap():
    allocating swap we're not ready to use just increases disk seeking.
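
    A minimal sketch of the reordered logic in shrink_page_list() (a
    fragment; keep_locked and activate_locked are the function's existing
    labels):

        if (PageAnon(page) && !PageSwapCache(page)) {
                if (!(sc->gfp_mask & __GFP_IO))
                        goto keep_locked;       /* don't even allocate swap */
                if (!add_to_swap(page, GFP_ATOMIC))
                        goto activate_locked;
                may_enter_fs = 1;       /* swap writeout may enter the fs */
        }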

    Signed-off-by: Hugh Dickins
    Cc: Rik van Riel
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Hugh Dickins
     
  • Page migration's writeout() has got understandably confused by the nasty
    AOP_WRITEPAGE_ACTIVATE case: as in normal success, a writepage() error has
    unlocked the page, so writeout() then needs to relock it.
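
    A minimal sketch of the relock rule in writeout() (a fragment):

        rc = mapping->a_ops->writepage(page, &wbc);

        if (rc != AOP_WRITEPAGE_ACTIVATE)
                /* unlocked by writepage(), on success and on error alike */
                lock_page(page);

        return (rc < 0) ? -EIO : -EAGAIN;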

    Signed-off-by: Hugh Dickins
    Cc: KAMEZAWA Hiroyuki
    Cc: Christoph Lameter
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Hugh Dickins
     
  • Currently, vmalloc restarts its search for a free area when it fails to
    find one. The reason is that there are areas which are lazily freed and
    could possibly be freed by now. However, the current implementation
    restarts the search from the last failing address, which is pretty much
    by definition at the end of the address space. So, we fail again.

    This patch instead restarts the search from the beginning of the
    requested vstart address. This fixes the regression in running KVM
    virtual machines for me, described in http://lkml.org/lkml/2008/10/28/349,
    caused by commit db64fe02258f1507e13fe5212a989922323685ce.
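
    A minimal sketch of the retry path in alloc_vmap_area() (simplified;
    'found' stands in for the real search-success condition):

        retry:
                addr = ALIGN(vstart, align);
                /* ... walk the vmap_area tree upward from addr, looking
                 * for a hole of the right size ... */
                if (!found) {
                        if (!purged) {
                                purge_vmap_area_lazy();
                                purged = 1;
                                goto retry;     /* restart from vstart */
                        }
                        return ERR_PTR(-EBUSY);
                }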

    Signed-off-by: Glauber Costa
    Signed-off-by: Nick Piggin
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Glauber Costa
     
  • An initial vmalloc failure should start off a synchronous flush of lazy
    areas, in case someone is in progress flushing them already, which could
    cause us to return an allocation failure even if there is plenty of KVA
    free.
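
    A minimal sketch, with the sync flag forcing the purge to wait for a
    purge already in flight instead of returning early:

        static void purge_vmap_area_lazy(void)
        {
                unsigned long start = ULONG_MAX, end = 0;

                __purge_vmap_area_lazy(&start, &end, 1, 0);     /* sync=1 */
        }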

    Signed-off-by: Nick Piggin
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Nick Piggin
     
  • Fix off by one bug in the KVA allocator that can leave gaps in the address
    space.

    Signed-off-by: Nick Piggin
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Nick Piggin
     
  • After adding a node into the machine, top cpuset's mems isn't updated.

    By reviewing the code, we found that the update function

    cpuset_track_online_nodes()

    was invoked after node_states[N_ONLINE] changed. That is wrong, because
    N_ONLINE just means the node has a pgdat; when a node has (or gains)
    memory, we use N_HIGH_MEMORY. So we should invoke the update function
    after node_states[N_HIGH_MEMORY] changes, just as its introducing
    commit says.

    This patch fixes it, and uses a memory hotplug notifier instead of
    calling cpuset_track_online_nodes() directly.
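
    A minimal sketch of the notifier-based update (the priority value is
    illustrative):

        static int cpuset_track_online_nodes(struct notifier_block *self,
                                             unsigned long action, void *arg)
        {
                cgroup_lock();
                switch (action) {
                case MEM_ONLINE:
                case MEM_OFFLINE:
                        top_cpuset.mems_allowed = node_states[N_HIGH_MEMORY];
                        break;
                default:
                        break;
                }
                cgroup_unlock();
                return NOTIFY_OK;
        }

        /* registered at boot instead of the old direct call: */
        hotplug_memory_notifier(cpuset_track_online_nodes, 10);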

    Signed-off-by: Miao Xie
    Acked-by: Yasunori Goto
    Cc: David Rientjes
    Cc: Paul Menage
    Signed-off-by: Linus Torvalds

    Miao Xie
     

17 Nov, 2008

1 commit

  • Fix an uninitialized return value when compiling on parisc (with CONFIG_UNEVICTABLE_LRU=y):
    mm/mlock.c: In function `__mlock_vma_pages_range':
    mm/mlock.c:165: warning: `ret' might be used uninitialized in this function

    Signed-off-by: Helge Deller
    [ It isn't ever really used uninitialized, since no caller should ever
    call this function with an empty range. But the compiler is correct
    that from a local analysis standpoint that is impossible to see, and
    fixing the warning is appropriate. ]
    Signed-off-by: Linus Torvalds

    Helge Deller
     

16 Nov, 2008

1 commit

  • Hugh Dickins reported that show_page_path() is buggy and unsafe because:

    - it lacks a dput() to balance d_find_alias()
    - it doesn't handle vma->vm_mm->owner == NULL
    - it lacks lock_page()

    It was only for debugging, so rather than trying to fix it, just remove
    it now.

    Reported-by: Hugh Dickins
    Signed-off-by: Hugh Dickins
    Signed-off-by: KOSAKI Motohiro
    CC: Lee Schermerhorn
    CC: Rik van Riel
    Signed-off-by: Linus Torvalds

    KOSAKI Motohiro
     

13 Nov, 2008

5 commits

  • The start pfn calculation in page_cgroup's memory hotplug notifier chain
    is wrong.

    Tested-by: Badari Pulavarty
    Signed-off-by: KAMEZAWA Hiroyuki
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    KAMEZAWA Hiroyuki
     
  • lockdep warns with the message below at boot time on one of my test
    machines: schedule_on_each_cpu() shouldn't be called while the task
    holds mmap_sem.

    Actually, lru_add_drain_all() exists to prevent unevictable pages from
    staying on a reclaimable LRU list. But the current unevictable code can
    rescue unevictable pages even when they stay on a reclaimable list, so
    removing the call is better.

    In addition, this patch adds lru_add_drain_all() to sys_mlock() and
    sys_mlockall(). It isn't required, but it reduces the chance of failing
    to move pages to the unevictable list; such failures can be rescued by
    vmscan later, but reducing them is better.

    Note: if the above rescuing happens, the Mlocked and Unevictable fields
    in /proc/meminfo can mismatch, but that doesn't cause any real trouble.

    =======================================================
    [ INFO: possible circular locking dependency detected ]
    2.6.28-rc2-mm1 #2
    -------------------------------------------------------
    lvm/1103 is trying to acquire lock:
    (&cpu_hotplug.lock){--..}, at: [] get_online_cpus+0x29/0x50

    but task is already holding lock:
    (&mm->mmap_sem){----}, at: [] sys_mlockall+0x4e/0xb0

    which lock already depends on the new lock.

    the existing dependency chain (in reverse order) is:

    -> #3 (&mm->mmap_sem){----}:
    [] check_noncircular+0x82/0x110
    [] might_fault+0x4a/0xa0
    [] validate_chain+0xb11/0x1070
    [] might_fault+0x4a/0xa0
    [] __lock_acquire+0x263/0xa10
    [] lock_acquire+0x7c/0xb0 (*) grab mmap_sem
    [] might_fault+0x4a/0xa0
    [] might_fault+0x7b/0xa0
    [] might_fault+0x4a/0xa0
    [] copy_to_user+0x30/0x60
    [] filldir+0x7c/0xd0
    [] sysfs_readdir+0x11a/0x1f0 (*) grab sysfs_mutex
    [] filldir+0x0/0xd0
    [] filldir+0x0/0xd0
    [] vfs_readdir+0x86/0xa0 (*) grab i_mutex
    [] sys_getdents+0x6b/0xc0
    [] syscall_call+0x7/0xb
    [] 0xffffffff

    -> #2 (sysfs_mutex){--..}:
    [] check_noncircular+0x82/0x110
    [] sysfs_addrm_start+0x2c/0xc0
    [] validate_chain+0xb11/0x1070
    [] sysfs_addrm_start+0x2c/0xc0
    [] __lock_acquire+0x263/0xa10
    [] lock_acquire+0x7c/0xb0 (*) grab sysfs_mutex
    [] sysfs_addrm_start+0x2c/0xc0
    [] mutex_lock_nested+0xa5/0x2f0
    [] sysfs_addrm_start+0x2c/0xc0
    [] sysfs_addrm_start+0x2c/0xc0
    [] sysfs_addrm_start+0x2c/0xc0
    [] create_dir+0x3f/0x90
    [] sysfs_create_dir+0x29/0x50
    [] _spin_unlock+0x25/0x40
    [] kobject_add_internal+0xcd/0x1a0
    [] kobject_set_name_vargs+0x3a/0x50
    [] kobject_init_and_add+0x2d/0x40
    [] sysfs_slab_add+0xd2/0x180
    [] sysfs_add_func+0x0/0x70
    [] sysfs_add_func+0x5c/0x70 (*) grab slub_lock
    [] run_workqueue+0x172/0x200
    [] run_workqueue+0x10f/0x200
    [] worker_thread+0x0/0xf0
    [] worker_thread+0x9c/0xf0
    [] autoremove_wake_function+0x0/0x50
    [] worker_thread+0x0/0xf0
    [] kthread+0x42/0x70
    [] kthread+0x0/0x70
    [] kernel_thread_helper+0x7/0x1c
    [] 0xffffffff

    -> #1 (slub_lock){----}:
    [] check_noncircular+0xd/0x110
    [] slab_cpuup_callback+0x11f/0x1d0
    [] validate_chain+0xb11/0x1070
    [] slab_cpuup_callback+0x11f/0x1d0
    [] mark_lock+0x35d/0xd00
    [] __lock_acquire+0x263/0xa10
    [] lock_acquire+0x7c/0xb0
    [] slab_cpuup_callback+0x11f/0x1d0
    [] down_read+0x43/0x80
    [] slab_cpuup_callback+0x11f/0x1d0 (*) grab slub_lock
    [] slab_cpuup_callback+0x11f/0x1d0
    [] notifier_call_chain+0x3c/0x70
    [] _cpu_up+0x84/0x110
    [] cpu_up+0x4b/0x70 (*) grab cpu_hotplug.lock
    [] kernel_init+0x0/0x170
    [] kernel_init+0xb5/0x170
    [] kernel_init+0x0/0x170
    [] kernel_thread_helper+0x7/0x1c
    [] 0xffffffff

    -> #0 (&cpu_hotplug.lock){--..}:
    [] validate_chain+0x5af/0x1070
    [] dev_status+0x0/0x50
    [] __lock_acquire+0x263/0xa10
    [] lock_acquire+0x7c/0xb0
    [] get_online_cpus+0x29/0x50
    [] mutex_lock_nested+0xa5/0x2f0
    [] get_online_cpus+0x29/0x50
    [] get_online_cpus+0x29/0x50
    [] lru_add_drain_per_cpu+0x0/0x10
    [] get_online_cpus+0x29/0x50 (*) grab cpu_hotplug.lock
    [] schedule_on_each_cpu+0x32/0xe0
    [] __mlock_vma_pages_range+0x85/0x2c0
    [] __lock_acquire+0x285/0xa10
    [] vma_merge+0xa9/0x1d0
    [] mlock_fixup+0x180/0x200
    [] do_mlockall+0x78/0x90 (*) grab mmap_sem
    [] sys_mlockall+0x81/0xb0
    [] syscall_call+0x7/0xb
    [] 0xffffffff

    other info that might help us debug this:

    1 lock held by lvm/1103:
    #0: (&mm->mmap_sem){----}, at: [] sys_mlockall+0x4e/0xb0

    stack backtrace:
    Pid: 1103, comm: lvm Not tainted 2.6.28-rc2-mm1 #2
    Call Trace:
    [] print_circular_bug_tail+0x7c/0xd0
    [] validate_chain+0x5af/0x1070
    [] dev_status+0x0/0x50
    [] __lock_acquire+0x263/0xa10
    [] lock_acquire+0x7c/0xb0
    [] get_online_cpus+0x29/0x50
    [] mutex_lock_nested+0xa5/0x2f0
    [] get_online_cpus+0x29/0x50
    [] get_online_cpus+0x29/0x50
    [] lru_add_drain_per_cpu+0x0/0x10
    [] get_online_cpus+0x29/0x50
    [] schedule_on_each_cpu+0x32/0xe0
    [] __mlock_vma_pages_range+0x85/0x2c0
    [] __lock_acquire+0x285/0xa10
    [] vma_merge+0xa9/0x1d0
    [] mlock_fixup+0x180/0x200
    [] do_mlockall+0x78/0x90
    [] sys_mlockall+0x81/0xb0
    [] syscall_call+0x7/0xb

    Signed-off-by: KOSAKI Motohiro
    Tested-by: Kamalesh Babulal
    Cc: Lee Schermerhorn
    Cc: Christoph Lameter
    Cc: Heiko Carstens
    Cc: Nick Piggin
    Cc: Hugh Dickins
    Cc: Rik van Riel
    Cc: Peter Zijlstra
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    KOSAKI Motohiro
     
  • If all allowable memory is unreclaimable, it is possible to loop forever
    in the page allocator for allocations without __GFP_NORETRY.

    During this time, it is also possible for a task's cpuset to expand its
    set of allowable nodes so that it now includes free memory. The cached
    copy of this set, current->mems_allowed, is stale, however, since there
    has not been a subsequent call to cpuset_update_task_memory_state().

    The cached copy of the set of allowable nodes is now updated in the page
    allocator's slow path so the additional memory is available to
    get_page_from_freelist().
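
    A minimal sketch of where the refresh lands (a fragment of the
    allocator's slow path, simplified):

        /* The fast path failed; before retrying, re-read the possibly
         * expanded cpuset into current->mems_allowed. */
        cpuset_update_task_memory_state();

        /* ... wake kswapd and retry get_page_from_freelist() with the
         * fresh nodemask ... */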

    [akpm@linux-foundation.org: add comment]
    Signed-off-by: David Rientjes
    Cc: Paul Menage
    Cc: Christoph Lameter
    Cc: Mel Gorman
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    David Rientjes
     
  • Oops. Part of the hugetlb private reservation code was not fully
    converted to use hstates.

    When a huge page must be unmapped from VMAs due to a failed COW,
    HPAGE_SIZE is used in the call to unmap_hugepage_range() regardless of
    the page size being used. This works if the VMA is using the default
    huge page size. Otherwise we might unmap too much, too little, or
    trigger a BUG_ON. Rare but serious -- fix it.
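
    A minimal sketch of the fix, deriving the range from the VMA's hstate
    rather than the compile-time HPAGE_SIZE:

        struct hstate *h = hstate_vma(vma);

        unmap_hugepage_range(vma, address & huge_page_mask(h),
                             (address & huge_page_mask(h)) + huge_page_size(h),
                             page);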

    Signed-off-by: Adam Litke
    Cc: Jon Tollefson
    Cc: Mel Gorman
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Adam Litke
     
  • The STACK_GROWSUP case of stack expansion was missing a test for 'prev',
    which got removed by commit cb8f488c33539f096580e202f5438a809195008f
    ("mmap.c: deinline a few functions") by mistake.

    I found my original email in my "sent" folder. The patch in that mail
    does NOT remove !prev; that change was added by someone else.

    OK, I think we are not much interested in who did it; let's
    fix it for good.

    [ "It looks like this was caused by me fixing rejects. That was the
    fancy include-lots-of-context-so-it-wont-apply patch." - akpm ]
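
    A minimal sketch of the restored test in the STACK_GROWSUP variant of
    find_extend_vma():

        vma = find_vma_prev(mm, addr, &prev);
        if (vma && (vma->vm_start <= addr))
                return vma;
        /* prev may be NULL when nothing is mapped below addr: test it */
        if (!prev || expand_stack(prev, addr))
                return NULL;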

    Reported-and-bisected-by: Helge Deller
    Signed-off-by: Denys Vlasenko
    Cc: Andrew Morton
    Cc: Jiri Kosina
    Signed-off-by: Linus Torvalds

    Denys Vlasenko
     

07 Nov, 2008

10 commits

  • Xen can end up calling vm_unmap_aliases() before vmalloc_init() has
    been called. In this case it's safe to make it a simple no-op.
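
    A minimal sketch of the early-out (the vmap_initialized flag is set at
    the end of vmalloc_init()):

        void vm_unmap_aliases(void)
        {
                if (unlikely(!vmap_initialized))
                        return;         /* nothing mapped yet: safe no-op */

                /* ... normal flush of lazily unmapped aliases ... */
        }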

    Signed-off-by: Jeremy Fitzhardinge
    Cc: Linux Memory Management List
    Cc: Nick Piggin
    Signed-off-by: Ingo Molnar

    Jeremy Fitzhardinge
     
  • * master.kernel.org:/home/rmk/linux-2.6-arm:
    [ARM] xsc3: fix xsc3_l2_inv_range
    [ARM] mm: fix page table initialization
    [ARM] fix naming of MODULE_START / MODULE_END
    ARM: OMAP: Fix define for twl4030 irqs
    ARM: OMAP: Fix get_irqnr_and_base to clear spurious interrupt bits
    ARM: OMAP: Fix debugfs_create_*'s error checking method for arm/plat-omap
    ARM: OMAP: Fix compiler warnings in gpmc.c
    [ARM] fix VFP+softfloat binaries

    Linus Torvalds
     
  • My last bugfix here (adding zone->lock) introduced a new problem: Using
    page_zone(pfn_to_page(pfn)) to get the zone after the for() loop is wrong.
    pfn will then be >= end_pfn, which may be in a different zone or not
    present at all. This may lead to an addressing exception in page_zone()
    or spin_lock_irqsave().

    Now I use __first_valid_page() again after the loop to find a valid page
    for page_zone().
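
    A minimal sketch of the corrected lookup (a fragment, with error
    handling simplified):

        /* pfn == end_pfn after the loop, so pfn_to_page(pfn) may point
         * into a different zone, or at nothing at all. Use a page known
         * to be valid within the range instead. */
        page = __first_valid_page(start_pfn, end_pfn - start_pfn);
        if (!page)
                return -EBUSY;
        zone = page_zone(page);
        spin_lock_irqsave(&zone->lock, flags);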

    Signed-off-by: Gerald Schaefer
    Acked-by: Nathan Fontenot
    Reviewed-by: KAMEZAWA Hiroyuki
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Gerald Schaefer
     
  • Parameter @mem was removed in v2.6.26; now delete its comment.

    Signed-off-by: Qinghuang Feng
    Acked-by: Randy Dunlap
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Qinghuang Feng
     
  • It's insufficient to simply compare node ids when warning about offnode
    page_structs since it's possible to still have local affinity.

    Acked-by: Christoph Lameter
    Signed-off-by: David Rientjes
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    David Rientjes
     
  • Move the migrate_prep() call outside the mmap_sem for the following
    system calls:

    1. sys_move_pages
    2. sys_migrate_pages
    3. sys_mbind()

    It really does not matter when we flush the lru. The system is free to
    add pages onto the lru even during migration which will make the page
    migration either skip the page (mbind, migrate_pages) or return a busy
    state (move_pages).

    Fixes this lockdep warning (and potential deadlock):

    Some VM place has
    mmap_sem -> kevent_wq via lru_add_drain_all()

    net/core/dev.c::dev_ioctl() has
    rtnl_lock -> mmap_sem (*) the ioctl has copy_from_user() and it can do page fault.

    linkwatch_event has
    kevent_wq -> rtnl_lock
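
    A minimal sketch of the reordering, as it would look in one of these
    paths (simplified):

        migrate_prep();                 /* calls lru_add_drain_all() */

        down_read(&mm->mmap_sem);
        /* ... gather pages and migrate; pages added to the LRU meanwhile
         * are simply skipped or reported busy ... */
        up_read(&mm->mmap_sem);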

    Signed-off-by: Christoph Lameter
    Cc: KOSAKI Motohiro
    Reported-by: Heiko Carstens
    Cc: Nick Piggin
    Cc: Hugh Dickins
    Cc: Rik van Riel
    Cc: Lee Schermerhorn
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Christoph Lameter
     
  • When /proc/sys/vm/oom_dump_tasks is enabled, it's only necessary to dump
    task state information for thread group leaders. The kernel log gets
    quickly overwhelmed on machines with a massive number of threads by
    dumping non-thread group leaders.
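
    A minimal sketch of the tightened loop (simplified; the printed fields
    are elided):

        struct task_struct *p;

        /* for_each_process() visits one task per thread group, unlike
         * do_each_thread(), which visits every thread. */
        for_each_process(p) {
                if (!p->mm)             /* skip kernel threads */
                        continue;
                /* print one line of state for this thread group leader */
        }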

    Reviewed-by: Christoph Lameter
    Signed-off-by: David Rientjes
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    David Rientjes
     
  • As we can determine exactly when a gigantic page is in use we can optimise
    the common regular page cases by pulling out gigantic page initialisation
    into its own function. As gigantic pages are never released to buddy we
    do not need a destructor. This effectively reverts the previous change to
    the main buddy allocator. It also adds a paranoid check to ensure we
    never release gigantic pages from hugetlbfs to the main buddy.
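
    A minimal sketch of the dispatch helper added by this patch (shape per
    the patch description):

        static void prep_compound_huge_page(struct page *page, int order)
        {
                if (unlikely(order > (MAX_ORDER - 1)))
                        prep_compound_gigantic_page(page, order);
                else
                        prep_compound_page(page, order);
        }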

    Signed-off-by: Andy Whitcroft
    Cc: Jon Tollefson
    Cc: Mel Gorman
    Cc: Nick Piggin
    Cc: Christoph Lameter
    Cc: [2.6.27.x]
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Andy Whitcroft
     
  • When working with hugepages, hugetlbfs assumes that those hugepages are
    smaller than MAX_ORDER. Specifically it assumes that the mem_map is
    contiguous and uses that to optimise access to the elements of the mem_map
    that represent the hugepage. Gigantic pages (such as 16GB pages on
    powerpc) by definition are of greater order than MAX_ORDER (larger than
    MAX_ORDER_NR_PAGES in size). This means that we can no longer make use of
    the buddy allocator guarantees for the contiguity of the mem_map, which
    ensures that the mem_map is at least contiguous for maximally aligned
    areas of MAX_ORDER_NR_PAGES pages.

    This patch adds new mem_map accessors and iterator helpers which handle
    any discontiguity at MAX_ORDER_NR_PAGES boundaries. It then uses these
    to implement gigantic page versions of copy_huge_page and
    clear_huge_page, and to allow follow_hugetlb_page to handle gigantic
    pages.
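
    A minimal sketch of one such iterator helper: step by pointer
    arithmetic within a MAX_ORDER block, but re-validate the pfn at every
    MAX_ORDER_NR_PAGES boundary, where the mem_map may be discontiguous:

        static inline struct page *mem_map_next(struct page *iter,
                                                struct page *base, int offset)
        {
                if (unlikely((offset & (MAX_ORDER_NR_PAGES - 1)) == 0)) {
                        unsigned long pfn = page_to_pfn(base) + offset;

                        if (!pfn_valid(pfn))
                                return NULL;
                        return pfn_to_page(pfn);
                }
                return iter + 1;
        }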

    Signed-off-by: Andy Whitcroft
    Cc: Jon Tollefson
    Cc: Mel Gorman
    Cc: Nick Piggin
    Cc: Christoph Lameter
    Cc: [2.6.27.x]
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Andy Whitcroft
     
  • As of 73bdf0a60e607f4b8ecc5aec597105976565a84f, the kernel needs
    to know where modules are located in the virtual address space.
    On ARM, we located this region between MODULE_START and MODULE_END.
    Unfortunately, everyone else calls it MODULES_VADDR and MODULES_END.
    Update ARM to use the same naming, so is_vmalloc_or_module_addr()
    can work properly. Also update the comment on mm/vmalloc.c to
    reflect that ARM also places modules in a separate region from the
    vmalloc space.

    Signed-off-by: Russell King

    Russell King
     

31 Oct, 2008

3 commits

  • Junjiro R. Okajima reported a problem where knfsd crashes if you are
    using it to export shmemfs objects and run strict overcommit. In this
    situation the current->mm based modifier to the overcommit goes through a
    NULL pointer.

    We could simply check for NULL and skip the modifier but we've caught
    other real bugs in the past from mm being NULL here - cases where we did
    need a valid mm set up (eg the exec bug about a year ago).

    To preserve the checks and get the logic we want, shuffle the checking
    around and add a new helper to the vm_ security wrappers.

    Also fix a current->mm reference in nommu that should use the passed mm.
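
    A minimal sketch of the reshuffled modifier (a fragment; the
    surrounding overcommit policy computation is elided):

        /* __vm_enough_memory() now takes the mm explicitly; kernel-internal
         * callers like knfsd legitimately pass a NULL mm. */
        if (mm)
                allowed -= mm->total_vm / 32;   /* per-process modifier */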

    [akpm@linux-foundation.org: coding-style fixes]
    [akpm@linux-foundation.org: fix build]
    Reported-by: Junjiro R. Okajima
    Acked-by: James Morris
    Signed-off-by: Alan Cox
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Alan Cox
     
  • Delete excess kernel-doc notation in mm/ subdirectory.
    Actually this is a kernel-doc notation fix.

    Warning(/var/linsrc/linux-2.6.27-git10//mm/vmalloc.c:902): Excess function parameter or struct member 'returns' description in 'vm_map_ram'

    Signed-off-by: Randy Dunlap
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Randy Dunlap
     
  • Nothing uses prepare_write or commit_write. Remove them from the tree
    completely.

    [akpm@linux-foundation.org: schedule simple_prepare_write() for unexporting]
    Signed-off-by: Nick Piggin
    Cc: Christoph Hellwig
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Nick Piggin
     

24 Oct, 2008

1 commit

  • * 'proc' of git://git.kernel.org/pub/scm/linux/kernel/git/adobriyan/proc: (35 commits)
    proc: remove fs/proc/proc_misc.c
    proc: move /proc/vmcore creation to fs/proc/vmcore.c
    proc: move pagecount stuff to fs/proc/page.c
    proc: move all /proc/kcore stuff to fs/proc/kcore.c
    proc: move /proc/schedstat boilerplate to kernel/sched_stats.h
    proc: move /proc/modules boilerplate to kernel/module.c
    proc: move /proc/diskstats boilerplate to block/genhd.c
    proc: move /proc/zoneinfo boilerplate to mm/vmstat.c
    proc: move /proc/vmstat boilerplate to mm/vmstat.c
    proc: move /proc/pagetypeinfo boilerplate to mm/vmstat.c
    proc: move /proc/buddyinfo boilerplate to mm/vmstat.c
    proc: move /proc/vmallocinfo to mm/vmalloc.c
    proc: move /proc/slabinfo boilerplate to mm/slub.c, mm/slab.c
    proc: move /proc/slab_allocators boilerplate to mm/slab.c
    proc: move /proc/interrupts boilerplate code to fs/proc/interrupts.c
    proc: move /proc/stat to fs/proc/stat.c
    proc: move rest of /proc/partitions code to block/genhd.c
    proc: move /proc/cpuinfo code to fs/proc/cpuinfo.c
    proc: move /proc/devices code to fs/proc/devices.c
    proc: move rest of /proc/locks to fs/locks.c
    ...

    Linus Torvalds
     

23 Oct, 2008

1 commit

  • page_cgroup_init() is called from mem_cgroup_init(). But at that point,
    we cannot call alloc_bootmem()
    (and this caused a panic at boot).

    This patch moves page_cgroup_init() to init/main.c.

    The timetable is as follows:
    ==
    parse_args(). # we can trust mem_cgroup_subsys.disabled bit after this.
    ....
    cgroup_init_early() # "early" init of cgroup.
    ....
    setup_arch() # memmap is allocated.
    ...
    page_cgroup_init();
    mem_init(); # we cannot call alloc_bootmem after this.
    ....
    cgroup_init() # mem_cgroup is initialized.
    ==

    Before page_cgroup_init(), mem_map must be initialized. So,
    I added page_cgroup_init() to init/main.c directly.

    (*) Maybe this is not very clean, but:
    - cgroup_init_early() is too early
    - in cgroup_init(), we would have to use vmalloc instead of
      alloc_bootmem(); the vmalloc area on x86-32 is precious, and very
      large vmalloc() calls should be avoided there.
    So, we want to use alloc_bootmem(), and add page_cgroup_init() directly
    to init/main.c.

    [akpm@linux-foundation.org: remove unneeded/bad mem_cgroup_subsys declaration]
    [akpm@linux-foundation.org: fix build]
    Acked-by: Balbir Singh
    Tested-by: Balbir Singh
    Signed-off-by: KAMEZAWA Hiroyuki
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    KAMEZAWA Hiroyuki