29 May, 2009

4 commits

  • Fix build warning, "mem_cgroup_is_obsolete defined but not used" when
    CONFIG_DEBUG_VM is not set. Also avoid checking for !mem again and again.

    Signed-off-by: Nikanth Karthikesan
    Acked-by: Pekka Enberg
    Acked-by: KAMEZAWA Hiroyuki
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Nikanth Karthikesan
     
  • Addresses http://bugzilla.kernel.org/show_bug.cgi?id=13302

    hugetlbfs reserves huge pages at mmap() time, without faulting them in,
    to ensure that future faults succeed. The reservation behaviour differs
    depending on whether the mapping was mapped MAP_SHARED or MAP_PRIVATE.
    For MAP_SHARED mappings, hugepages are reserved when mmap() is first
    called and are tracked based on information associated with the inode.
    Other processes mapping MAP_SHARED use the same reservation. MAP_PRIVATE
    mappings track their reservations based on the VMA created as part of
    the mmap() operation, and each process mapping MAP_PRIVATE must make its
    own reservation.

    hugetlbfs currently checks if a VMA is MAP_SHARED with the VM_SHARED flag
    and not VM_MAYSHARE. For file-backed mappings, such as hugetlbfs,
    VM_SHARED is set only if the mapping is MAP_SHARED and the file was opened
    read-write. If a shared memory mapping was mapped shared-read-write to
    populate data and mapped shared-read-only by other processes, then
    hugetlbfs would account for the mapping as if it were MAP_PRIVATE. This
    causes processes to fail to map the file MAP_SHARED even though the
    mapping should succeed because the reservation is there.

    This patch alters mm/hugetlb.c and replaces VM_SHARED with VM_MAYSHARE
    when the intent of the code was to check whether the VMA was mapped
    MAP_SHARED or MAP_PRIVATE.

    Signed-off-by: Mel Gorman
    Cc: Hugh Dickins
    Cc: Ingo Molnar
    Cc: Lee Schermerhorn
    Cc: KOSAKI Motohiro
    Cc: Eric B Munson
    Cc: Adam Litke
    Cc: Andy Whitcroft
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Mel Gorman
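
    The scenario above can be reproduced from user space with a small sketch
    along the lines of the program below (illustrative only: the hugetlbfs
    mount point, file name and 2MB huge page size are assumptions). The
    second, read-only MAP_SHARED mapping is the case for which only
    VM_MAYSHARE, not VM_SHARED, is set, and which the old checks therefore
    mis-accounted.

        #include <fcntl.h>
        #include <stdio.h>
        #include <sys/mman.h>
        #include <unistd.h>

        #define HUGEPAGE_SZ (2UL * 1024 * 1024)     /* assumes 2MB huge pages */

        int main(void)
        {
                const char *path = "/mnt/hugetlbfs/repro";  /* assumed mount */
                size_t len = HUGEPAGE_SZ;

                /* Writer: shared read-write mapping, makes the reservation. */
                int fd_rw = open(path, O_CREAT | O_RDWR, 0644);
                if (fd_rw < 0) { perror("open rw"); return 1; }
                void *rw = mmap(NULL, len, PROT_READ | PROT_WRITE,
                                MAP_SHARED, fd_rw, 0);
                if (rw == MAP_FAILED) { perror("mmap rw"); return 1; }

                /* Reader: the file is opened read-only and mapped MAP_SHARED.
                   Such a VMA has VM_MAYSHARE but not VM_SHARED set. */
                int fd_ro = open(path, O_RDONLY);
                if (fd_ro < 0) { perror("open ro"); return 1; }
                void *ro = mmap(NULL, len, PROT_READ, MAP_SHARED, fd_ro, 0);
                if (ro == MAP_FAILED) { perror("mmap ro"); return 1; }

                printf("rw mapping %p, ro mapping %p\n", rw, ro);
                munmap(ro, len);
                munmap(rw, len);
                close(fd_ro);
                close(fd_rw);
                return 0;
        }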
     
  • mapping->tree_lock can be acquired from interrupt context. The
    following deadlock can then occur.

    Assume "A" as a page.

    CPU0:
    lock_page_cgroup(A)
    interrupted
    -> take mapping->tree_lock.
    CPU1:
    take mapping->tree_lock
    -> lock_page_cgroup(A)

    This patch tries to fix the above deadlock by moving memcg's hook out of
    mapping->tree_lock; charge/uncharge of pagecache/swapcache is protected
    by the page lock, not by tree_lock.

    After this patch, lock_page_cgroup() is not called under mapping->tree_lock.

    Signed-off-by: KAMEZAWA Hiroyuki
    Signed-off-by: Daisuke Nishimura
    Cc: Balbir Singh
    Cc: Daisuke Nishimura
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Daisuke Nishimura
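
    The AB-BA inversion can be illustrated in user space with two pthread
    mutexes standing in for lock_page_cgroup(A) and mapping->tree_lock
    (purely an analogy, not kernel code; trylock is used so the demo reports
    the inversion instead of actually hanging). Build with -pthread.

        #include <pthread.h>
        #include <stdio.h>
        #include <unistd.h>

        /* Stand-ins for lock_page_cgroup(A) and mapping->tree_lock. */
        static pthread_mutex_t page_cgroup_lock = PTHREAD_MUTEX_INITIALIZER;
        static pthread_mutex_t tree_lock = PTHREAD_MUTEX_INITIALIZER;

        static void *cpu0(void *arg)    /* lock_page_cgroup, then tree_lock */
        {
                (void)arg;
                pthread_mutex_lock(&page_cgroup_lock);
                usleep(100 * 1000);     /* window for "CPU1" to take tree_lock */
                if (pthread_mutex_trylock(&tree_lock))
                        printf("cpu0: tree_lock held elsewhere -> AB-BA deadlock\n");
                else
                        pthread_mutex_unlock(&tree_lock);
                pthread_mutex_unlock(&page_cgroup_lock);
                return NULL;
        }

        static void *cpu1(void *arg)    /* tree_lock, then lock_page_cgroup */
        {
                (void)arg;
                pthread_mutex_lock(&tree_lock);
                usleep(100 * 1000);
                if (pthread_mutex_trylock(&page_cgroup_lock))
                        printf("cpu1: page_cgroup lock held elsewhere -> AB-BA deadlock\n");
                else
                        pthread_mutex_unlock(&page_cgroup_lock);
                pthread_mutex_unlock(&tree_lock);
                return NULL;
        }

        int main(void)
        {
                pthread_t t0, t1;

                pthread_create(&t0, NULL, cpu0, NULL);
                pthread_create(&t1, NULL, cpu1, NULL);
                pthread_join(t0, NULL);
                pthread_join(t1, NULL);
                return 0;
        }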
     
  • When /proc/sys/vm/oom_dump_tasks is enabled, it is possible to get a NULL
    pointer for tasks that have detached mm's since task_lock() is not held
    during the tasklist scan. Add the task_lock().

    Acked-by: Nick Piggin
    Acked-by: Mel Gorman
    Cc: Rik van Riel
    Signed-off-by: David Rientjes
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    David Rientjes
     

22 May, 2009

1 commit

  • My old address will shut down in a few days' time: remove it from the tree,
    and add a tmpfs (shmem filesystem) maintainer entry with the new address.

    Signed-off-by: Hugh Dickins
    Signed-off-by: Linus Torvalds

    Hugh Dickins
     

21 May, 2009

1 commit

  • * master.kernel.org:/home/rmk/linux-2.6-arm: (25 commits)
    [ARM] 5519/1: amba probe: pass "struct amba_id *" instead of void *
    [ARM] 5517/1: integrator: don't put clock lookups in __initdata
    [ARM] 5518/1: versatile: don't put clock lookups in __initdata
    [ARM] mach-l7200: fix spelling of SYS_CLOCK_OFF
    [ARM] Double check memmap is actually valid with a memmap has unexpected holes V2
    [ARM] realview: fix broadcast tick support
    [ARM] realview: remove useless smp_cross_call_done()
    [ARM] smp: fix cpumask usage in ARM SMP code
    [ARM] 5513/1: Eurotech VIPER SBC: fix compilation error
    [ARM] 5509/1: ep93xx: clkdev enable UARTS
    ARM: OMAP2/3: Change omapfb to use clkdev for dispc and rfbi, v2
    ARM: OMAP3: Fix HW SAVEANDRESTORE shift define
    ARM: OMAP3: Fix number of GPIO lines for 34xx
    [ARM] S3C: Do not set clk->owner field if unset
    [ARM] S3C2410: mach-bast.c registering i2c data too early
    [ARM] S3C24XX: Fix unused code warning in arch/arm/plat-s3c24xx/dma.c
    [ARM] S3C64XX: fix GPIO debug
    [ARM] S3C64XX: GPIO include cleanup
    [ARM] nwfpe: fix 'floatx80_is_nan' sparse warning
    [ARM] nwfpe: Add decleration for ExtendedCPDO
    ...

    Linus Torvalds
     

18 May, 2009

3 commits

  • pfn_valid() is meant to be able to tell if a given PFN has valid memmap
    associated with it or not. In FLATMEM, it is expected that holes always
    have valid memmap as long as there are valid PFNs on either side of the
    hole.
    In SPARSEMEM, it is assumed that a valid section has a memmap for the
    entire section.

    However, ARM and maybe other embedded architectures in the future free
    memmap backing holes to save memory on the assumption the memmap is never
    used. The page_zone linkages are then broken even though pfn_valid()
    returns true. A walker of the full memmap must then do this additional
    check to ensure the memmap they are looking at is sane by making sure the
    zone and PFN linkages are still valid. This is expensive, but walkers of
    the full memmap are extremely rare.

    This was caught before for FLATMEM and hacked around but it hits again for
    SPARSEMEM because the page_zone linkages can look ok where the PFN linkages
    are totally screwed. This looks like a hatchet job but the reality is that
    any clean solution would end up consuming all the memory saved by punching
    these unexpected holes in the memmap. For example, we tried marking the
    memmap within the section invalid but the section size exceeds the size of
    the hole in most cases so pfn_valid() starts returning false where valid
    memmap exists. Shrinking the size of the section would increase memory
    consumption offsetting the gains.

    This patch identifies when an architecture is punching unexpected holes
    in the memmap that the memory model cannot automatically detect and sets
    ARCH_HAS_HOLES_MEMORYMODEL. At the moment, this is restricted to EP93xx,
    which is the sub-architecture this has been reported on, but the list may
    expand later. When set, walkers of the full memmap must call
    memmap_valid_within() for each PFN, passing in what they expect the page
    and zone to be for that PFN. If the linkages turn out to be broken, the
    memmap is assumed to be invalid for that PFN.

    Signed-off-by: Mel Gorman
    Signed-off-by: Russell King

    Mel Gorman
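
    A hedged kernel-context sketch (not a standalone program; the loop
    bounds are illustrative) of how a full-memmap walker is expected to use
    the new check:

        /* struct zone *zone is the zone being walked. */
        unsigned long pfn, end = zone->zone_start_pfn + zone->spanned_pages;

        for (pfn = zone->zone_start_pfn; pfn < end; pfn++) {
                struct page *page;

                if (!pfn_valid(pfn))
                        continue;
                page = pfn_to_page(pfn);

                /* Catch holes the memory model cannot see: the page must
                   still link back to this PFN and this zone. */
                if (!memmap_valid_within(pfn, page, zone))
                        continue;

                /* ... examine the page ... */
        }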
     
  • The wb_kupdate() function has a bug in linux-2.6.30-rc5. This bug causes
    generic_sync_sb_inodes() to start writing inodes back much earlier than
    expected because oldest_jif is miscalculated in wb_kupdate().

    This bug was introduced in 704503d836042d4a4c7685b7036e7de0418fbc0f
    ('mm: fix proc_dointvec_userhz_jiffies "breakage"').

    Signed-off-by: Toshiyuki Okajima
    Cc: Alexey Dobriyan
    Cc: Peter Zijlstra
    Cc: Nick Piggin
    Cc: Al Viro
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Toshiyuki Okajima
     
  • * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/penberg/slab-2.6:
    mm: SLOB fix reclaim_state
    mm: SLUB fix reclaim_state
    slub: add Documentation/ABI/testing/sysfs-kernel-slab
    slub: enforce MAX_ORDER

    Linus Torvalds
     

15 May, 2009

2 commits

  • * 'for-linus' of git://git.kernel.dk/linux-2.6-block:
    Revert "mm: add /proc controls for pdflush threads"
    viocd: needs to depend on BLOCK
    block: fix the bio_vec array index out-of-bounds test

    Linus Torvalds
     
  • This reverts commit fafd688e4c0c34da0f3de909881117d374e4c7af.

    Work is progressing to switch away from pdflush as the process backing
    for flushing out dirty data. So it seems pointless to add more knobs
    to control pdflush threads. The original author of the patch did not
    have any specific use cases for adding the knobs, so we can easily
    revert this before 2.6.30 to avoid having to maintain this API
    forever.

    Signed-off-by: Jens Axboe

    Jens Axboe
     

13 May, 2009

1 commit


08 May, 2009

1 commit


07 May, 2009

5 commits

  • NOMMU mmap() has an option controlled by a sysctl variable that determines
    whether the allocations made by do_mmap_private() should have the excess
    space trimmed off and returned to the allocator. Make the initial setting
    of this variable a Kconfig configuration option.

    The reason there can be excess space is that the allocator only allocates
    in power-of-2 size chunks, but mmap() requests can be made in sizes that
    aren't a power of 2.

    There are two alternatives:

    (1) Keep the excess as dead space. The dead space then remains unused for the
    lifetime of the mapping. Mappings of shared objects such as libc, ld.so
    or busybox's text segment may retain their dead space forever.

    (2) Return the excess to the allocator. This means that the dead space is
    limited to less than a page per mapping, but it means that for a transient
    process, there's more chance of fragmentation as the excess space may be
    reused fairly quickly.

    During the boot process, a lot of transient processes are created, and
    this can cause a lot of fragmentation as the pagecache and various slabs
    grow greatly during this time.

    By turning off the trimming of excess space during boot and disabling
    batching of frees, Coldfire can manage to boot.

    A better way of doing things might be to have /sbin/init turn this option
    off. By that point libc, ld.so and init - which are all long-duration
    processes - have all been loaded and trimmed.

    Reported-by: Lanttor Guo
    Signed-off-by: David Howells
    Tested-by: Lanttor Guo
    Cc: Greg Ungerer
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    David Howells
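
    The excess being talked about is just the gap between the requested
    length and the next power of 2; a quick user-space illustration (the
    100kB figure is an arbitrary example):

        #include <stdio.h>

        /* Round len up to the next power of 2 (assumes 0 < len < 2^31). */
        static unsigned long next_pow2(unsigned long len)
        {
                unsigned long p = 1;

                while (p < len)
                        p <<= 1;
                return p;
        }

        int main(void)
        {
                unsigned long len = 100 * 1024;   /* a 100kB private mapping */
                unsigned long alloc = next_pow2(len);

                printf("request %lukB -> allocation %lukB, excess %lukB\n",
                       len >> 10, alloc >> 10, (alloc - len) >> 10);
                return 0;
        }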
     
  • Clamp zone_batchsize() to 0 under NOMMU conditions to stop
    free_hot_cold_page() from queueing and batching frees.

    The problem is that under NOMMU conditions it is really important to be
    able to allocate large contiguous chunks of memory, but when munmap() or
    exit_mmap() releases big stretches of memory, return of these to the buddy
    allocator can be deferred, and when it does finally happen, it can be in
    small chunks.

    Whilst the fragmentation this incurs isn't so much of a problem under MMU
    conditions as userspace VM is glued together from individual pages with
    the aid of the MMU, it is a real problem if there isn't an MMU.

    By clamping the page freeing queue size to 0, pages are returned to the
    allocator immediately, and the buddy detector is more likely to be able to
    glue them together into large chunks immediately, and fragmentation is
    less likely to occur.

    By disabling batching of frees, and by turning off the trimming of excess
    space during boot, Coldfire can manage to boot.

    Reported-by: Lanttor Guo
    Signed-off-by: David Howells
    Tested-by: Lanttor Guo
    Cc: Greg Ungerer
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    David Howells
     
  • Use rounddown_pow_of_two(N) in zone_batchsize() rather than (1 <<
    (fls(N)-1)), as they are equivalent, and with the former it is easier to
    see what is going on.

    Signed-off-by: David Howells
    Tested-by: Lanttor Guo
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    David Howells
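
    A quick user-space check of the claimed equivalence; fls_local() and
    highest_pow2_le() are local stand-ins for the kernel helpers, not the
    kernel code itself:

        #include <stdio.h>

        /* 1-based index of the most significant set bit, like fls(). */
        static unsigned int fls_local(unsigned int x)
        {
                unsigned int r = 0;

                while (x) {
                        r++;
                        x >>= 1;
                }
                return r;
        }

        /* Independent reference: largest power of 2 that is <= n. */
        static unsigned int highest_pow2_le(unsigned int n)
        {
                unsigned int p = 1;

                while (p <= n / 2)
                        p *= 2;
                return p;
        }

        int main(void)
        {
                unsigned int n;

                for (n = 1; n <= 1024; n++)
                        if ((1U << (fls_local(n) - 1)) != highest_pow2_le(n))
                                printf("mismatch at %u\n", n);

                printf("both give %u for n = 1000\n", highest_pow2_le(1000));
                return 0;
        }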
     
  • If alloc_vmap_area() fails the allocated struct vmap_area has to be freed.

    Signed-off-by: Ralph Wuerthner
    Reviewed-by: Christoph Lameter
    Reviewed-by: Minchan Kim
    Cc: Nick Piggin
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Ralph Wuerthner
     
  • When /proc/sys/vm/oom_kill_allocating_task is set for large systems that
    want to avoid the lengthy tasklist scan, it's possible to livelock if
    current is ineligible for oom kill. This normally happens when it is set
    to OOM_DISABLE, but is also possible if any threads are sharing the same
    ->mm with a different tgid.

    So change __out_of_memory() to fall back to the full task-list scan if it
    was unable to kill `current'.

    Cc: Nick Piggin
    Signed-off-by: David Rientjes
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    David Rientjes
     

06 May, 2009

4 commits


03 May, 2009

6 commits

  • Local variable `scan' can overflow on zones which are larger than

    (2G * 4k) / 100 = 80GB.

    Making it 64-bit on 64-bit platforms will fix that up.

    Cc: KOSAKI Motohiro
    Cc: Wu Fengguang
    Cc: Peter Zijlstra
    Cc: Rik van Riel
    Cc: Lee Schermerhorn
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Andrew Morton
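
    A back-of-the-envelope check of the quoted figure, assuming 4kB pages
    and a 32-bit quantity that saturates at 2G when multiplied by a percent
    value of up to 100:

        #include <stdint.h>
        #include <stdio.h>

        int main(void)
        {
                uint64_t page_size = 4096;
                uint64_t limit = 1ULL << 31;    /* "2G" */
                uint64_t max_zone_bytes = limit / 100 * page_size;

                /* Prints roughly 81 GB, i.e. the ~80GB quoted above. */
                printf("zone size at which `scan' overflows: ~%llu GB\n",
                       (unsigned long long)(max_zone_bytes >> 30));
                return 0;
        }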
     
  • The Committed_AS field can underflow in certain situations:

    > # while true; do cat /proc/meminfo | grep _AS; sleep 1; done | uniq -c
    > 1 Committed_AS: 18446744073709323392 kB
    > 11 Committed_AS: 18446744073709455488 kB
    > 6 Committed_AS: 35136 kB
    > 5 Committed_AS: 18446744073709454400 kB
    > 7 Committed_AS: 35904 kB
    > 3 Committed_AS: 18446744073709453248 kB
    > 2 Committed_AS: 34752 kB
    > 9 Committed_AS: 18446744073709453248 kB
    > 8 Committed_AS: 34752 kB
    > 3 Committed_AS: 18446744073709320960 kB
    > 7 Committed_AS: 18446744073709454080 kB
    > 3 Committed_AS: 18446744073709320960 kB
    > 5 Committed_AS: 18446744073709454080 kB
    > 6 Committed_AS: 18446744073709320960 kB

    This happens because NR_CPUS can be greater than 1000 and
    meminfo_proc_show() does not check for underflow.

    But a calculation proportional to NR_CPUS isn't a good one. In general,
    the likelihood of lock contention is proportional to the number of
    online CPUs, not to the theoretical maximum (NR_CPUS).

    The current kernel has generic percpu-counter infrastructure; using it
    is the right way. It simplifies the code, and
    percpu_counter_read_positive() does not have the underflow issue.

    Reported-by: Dave Hansen
    Signed-off-by: KOSAKI Motohiro
    Cc: Eric B Munson
    Cc: Mel Gorman
    Cc: Christoph Lameter
    Cc: [All kernel versions]
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    KOSAKI Motohiro
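
    A hedged user-space analogue of the symptom and of what
    percpu_counter_read_positive() guards against (the numbers are made up;
    this is not the kernel code): a transiently negative estimate printed as
    an unsigned kB value produces the huge figures above, while clamping the
    read at zero does not.

        #include <stdio.h>

        /* percpu_counter_read_positive()-style clamp. */
        static long read_positive(long sum)
        {
                return sum < 0 ? 0 : sum;
        }

        int main(void)
        {
                long committed = -57;   /* transiently negative, in 4kB pages */

                /* pages -> kB; on a 64-bit box this prints 18446744073709... */
                printf("raw:     Committed_AS: %lu kB\n",
                       (unsigned long)committed << 2);
                printf("clamped: Committed_AS: %lu kB\n",
                       (unsigned long)read_positive(committed) << 2);
                return 0;
        }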
     
  • Current mem_cgroup_shrink_usage() has two problems.

    1. It doesn't call mem_cgroup_out_of_memory and doesn't update
    last_oom_jiffies, so pagefault_out_of_memory invokes global OOM.

    2. Considering hierarchy, shrinking has to be done from the
    mem_over_limit, not from the memcg which the page would be charged to.

    mem_cgroup_try_charge_swapin() does all of these things properly, so we
    use it and call cancel_charge_swapin() when it succeeds.

    The name of "shrink_usage" is not appropriate for this behavior, so we
    change it too.

    Signed-off-by: Daisuke Nishimura
    Acked-by: KAMEZAWA Hiroyuki
    Cc: Li Zefan
    Cc: Paul Menage
    Cc: Dhaval Giani
    Cc: Daisuke Nishimura
    Cc: YAMAMOTO Takashi
    Cc: KOSAKI Motohiro
    Cc: David Rientjes
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Daisuke Nishimura
     
  • Change page_mkwrite to allow implementations to return with the page
    locked, and also change its callers (in page fault paths) to hold the
    lock until the page is marked dirty. This allows the filesystem to have
    full control of page dirtying events coming from the VM.

    Rather than simply hold the page locked over the page_mkwrite call, we
    call page_mkwrite with the page unlocked and allow callers to return with
    it locked, so filesystems can avoid LOR conditions with page lock.

    The problem with the current scheme is this: a filesystem that wants to
    associate some metadata with a page as long as the page is dirty, will
    perform this manipulation in its ->page_mkwrite. It currently then must
    return with the page unlocked and may not hold any other locks (according
    to existing page_mkwrite convention).

    In this window, the VM could write out the page, clearing page-dirty. The
    filesystem has no good way to detect that a dirty pte is about to be
    attached, so it will happily write out the page, at which point, the
    filesystem may manipulate the metadata to reflect that the page is no
    longer dirty.

    It is not always possible to perform the required metadata manipulation in
    ->set_page_dirty, because that function cannot block or fail. The
    filesystem may need to allocate some data structure, for example.

    And the VM cannot mark the pte dirty before page_mkwrite, because
    page_mkwrite is allowed to fail, so we must not allow any window where the
    page could be written to if page_mkwrite does fail.

    This solution of holding the page locked over the 3 critical operations
    (page_mkwrite, setting the pte dirty, and finally setting the page dirty)
    closes out races nicely, preventing page cleaning for writeout being
    initiated in that window. This provides the filesystem with a strong
    synchronisation against the VM here.

    - Sage needs this race closed for ceph filesystem.
    - Trond for NFS (http://bugzilla.kernel.org/show_bug.cgi?id=12913).
    - I need it for fsblock.
    - I suspect other filesystems may need it too (eg. btrfs).
    - I have converted buffer.c to the new locking. Even simple block allocation
    under dirty pages might be susceptible to i_size changing under partial page
    at the end of file (we also have a buffer.c-side problem here, but it cannot
    be fixed properly without this patch).
    - Other filesystems (eg. NFS, maybe btrfs) will need to change their
    page_mkwrite functions themselves.

    [ This also moves page_mkwrite another step closer to fault, which should
    eventually allow page_mkwrite to be moved into ->fault, and thus avoiding a
    filesystem calldown and page lock/unlock cycle in __do_fault. ]

    [akpm@linux-foundation.org: fix derefs of NULL ->mapping]
    Cc: Sage Weil
    Cc: Trond Myklebust
    Signed-off-by: Nick Piggin
    Cc: Valdis Kletnieks
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Nick Piggin
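
    A hedged kernel-context sketch of what a filesystem ->page_mkwrite can
    look like under the new convention (the filesystem name and the metadata
    step are illustrative, not taken from this patch): the callback locks
    the page itself, may block, and returns VM_FAULT_LOCKED so the VM keeps
    the page locked until the pte and the page have been marked dirty.

        static int examplefs_page_mkwrite(struct vm_area_struct *vma,
                                          struct vm_fault *vmf)
        {
                struct page *page = vmf->page;

                lock_page(page);
                if (page->mapping != vma->vm_file->f_mapping) {
                        /* Raced with truncate: nothing to dirty here. */
                        unlock_page(page);
                        return VM_FAULT_NOPAGE;
                }

                /* Allocate/attach per-page dirty metadata; blocking is OK. */

                return VM_FAULT_LOCKED;  /* page stays locked for the caller */
        }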
     
  • This is a bugfix for commit 3c776e64660028236313f0e54f3a9945764422df
    ("memcg: charge swapcache to proper memcg").

    The "used" bit of a swapcache page is stable under the page lock, but,
    considering move_account, pc->mem_cgroup is not.

    We need lock_page_cgroup() anyway.

    Signed-off-by: Daisuke Nishimura
    Acked-by: KAMEZAWA Hiroyuki
    Cc: Balbir Singh
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Daisuke Nishimura
     
  • By the time the memory cgroup code is notified about a swapin we
    already hold a reference on the fault page.

    If the cgroup callback fails, make sure to unlock AND release the page
    reference which was taken by lookup_swap_cache(), or we leak the reference.

    Signed-off-by: Johannes Weiner
    Cc: Balbir Singh
    Reviewed-by: Minchan Kim
    Acked-by: KAMEZAWA Hiroyuki
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Johannes Weiner
     

23 Apr, 2009

1 commit

  • slub_max_order may not be equal to or greater than MAX_ORDER.

    Additionally, if a single object cannot be placed in a slab of
    slub_max_order, it still must allocate slabs below MAX_ORDER.

    Acked-by: Christoph Lameter
    Signed-off-by: David Rientjes
    Signed-off-by: Pekka Enberg

    David Rientjes
     

22 Apr, 2009

1 commit

  • Commit a6dc60f8975ad96d162915e07703a4439c80dcf0 ("vmscan: rename
    sc.may_swap to may_unmap") removed the may_swap flag, but memcg had used
    it as a flag for "do we need to use swap?", as the name indicates.

    And in the current implementation, memcg cannot reclaim mapped file
    caches when mem+swap hits the limit.

    Re-introduce the may_swap flag and handle it in get_scan_ratio(). This
    patch doesn't influence any scan_control users other than memcg.

    Signed-off-by: KOSAKI Motohiro
    Signed-off-by: Daisuke Nishimura
    Acked-by: Johannes Weiner
    Cc: Balbir Singh
    Cc: KAMEZAWA Hiroyuki
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    KOSAKI Motohiro
     

19 Apr, 2009

1 commit

  • Commit d979677c4c0 ("mm: shrink_all_memory(): use sc.nr_reclaimed")
    broke the memory shrinking used by hibernation, because it did not update
    shrink_all_zones() in accordance with the other changes it made.

    Fix this by making shrink_all_zones() update sc->nr_reclaimed instead of
    overwriting its value.

    This fixes http://bugzilla.kernel.org/show_bug.cgi?id=13058

    Reported-and-tested-by: Alan Jenkins
    Signed-off-by: Rafael J. Wysocki
    Signed-off-by: Linus Torvalds

    Rafael J. Wysocki
     

17 Apr, 2009

1 commit

  • Tetsuo Handa reports seeing the WARN_ON(current->mm == NULL) in
    security_vm_enough_memory(), when do_execve() is touching the
    target mm's stack, to set up its args and environment.

    Yes, a UMH_NO_WAIT or UMH_WAIT_PROC call_usermodehelper() spawns
    an mm-less kernel thread to do the exec. And in any case, that
    vm_enough_memory check when growing stack ought to be done on the
    target mm, not on the execer's mm (though apart from the warning,
    it only makes a slight tweak to OVERCOMMIT_NEVER behaviour).

    Reported-by: Tetsuo Handa
    Signed-off-by: Hugh Dickins
    Cc: stable@kernel.org
    Signed-off-by: Linus Torvalds

    Hugh Dickins
     

16 Apr, 2009

1 commit


14 Apr, 2009

6 commits

  • SHMEM_MAX_BYTES was derived from the maximum size of its triple-indirect
    swap vector, forgetting to take the MAX_LFS_FILESIZE limit into account.
    Never mind 256kB pages, even 8kB pages on 32-bit kernels allowed files to
    grow slightly bigger than that supposed maximum.

    Fix this by using the min of both (at build time not run time). And it
    happens that this calculation is good as far as 8MB pages on 32-bit or
    16MB pages on 64-bit: though SHMSWP_MAX_INDEX gets truncated before that,
    it's truncated to such large numbers that we don't need to care.

    [akpm@linux-foundation.org: it needs pagemap.h]
    [akpm@linux-foundation.org: fix sparc64 min() warnings]
    Signed-off-by: Hugh Dickins
    Cc: Yuri Tikhonov
    Cc: Paul Mackerras
    Cc: Randy Dunlap
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Hugh Dickins
     
  • Fix a division by zero which we have in shmem_truncate_range() and
    shmem_unuse_inode() when using big PAGE_SIZE values (e.g. 256kB on
    ppc44x).

    With 256kB PAGE_SIZE, the ENTRIES_PER_PAGEPAGE constant becomes too large
    (0x1.0000.0000) on a 32-bit kernel, so this patch just changes its type
    from 'unsigned long' to 'unsigned long long'.

    Hugh: reverted its unsigned long longs in shmem_truncate_range() and
    shmem_getpage(): the pagecache index cannot be more than an unsigned long,
    so the divisions by zero occurred in unreached code. It's a pity we need
    any ULL arithmetic here, but I found no pretty way to avoid it.

    Signed-off-by: Yuri Tikhonov
    Signed-off-by: Hugh Dickins
    Cc: Paul Mackerras
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Yuri Tikhonov
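
    The overflow is easy to check in user space (assuming, as the text says,
    a 32-bit kernel where the swap entries are 4-byte unsigned longs and
    PAGE_SIZE is 256kB):

        #include <stdint.h>
        #include <stdio.h>

        int main(void)
        {
                uint32_t page_size = 256 * 1024;
                uint32_t entries_per_page = page_size / 4;  /* 4-byte entries */
                uint32_t per_pagepage_32 = entries_per_page * entries_per_page;
                uint64_t per_pagepage_64 =
                        (uint64_t)entries_per_page * entries_per_page;

                printf("ENTRIES_PER_PAGEPAGE in 32 bits: %u (wrapped to zero)\n",
                       per_pagepage_32);
                printf("ENTRIES_PER_PAGEPAGE in 64 bits: %llu\n",
                       (unsigned long long)per_pagepage_64);
                return 0;
        }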
     
  • mm/memcontrol.c:318: warning: `mem_cgroup_is_obsolete' defined but not used

    [akpm@linux-foundation.org: simplify as suggested by Balbir]
    Signed-off-by: KAMEZAWA Hiroyuki
    Reviewed-by: Balbir Singh
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    KAMEZAWA Hiroyuki
     
  • While better than get_user_pages(), the usage of get_user_pages_fast()
    (gupf), especially its return values and the fact that it can potentially
    only partially pin the range, warranted some documentation.

    Signed-off-by: Andy Grover
    Cc: Ingo Molnar
    Cc: Nick Piggin
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Andy Grover
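
    A hedged kernel-context fragment (not a standalone program, and not
    taken from the documentation added here) of the calling pattern the note
    is about: the function can pin fewer pages than requested, so the caller
    checks the count and drops any partial pin.

        /* start, nr_pages and pages[] are set up by the surrounding code. */
        int nr = get_user_pages_fast(start, nr_pages, 1 /* write */, pages);

        if (nr < 0)
                return nr;                      /* nothing was pinned */
        if (nr < nr_pages) {
                while (nr--)
                        put_page(pages[nr]);    /* drop the partial pin */
                return -EFAULT;                 /* or fall back / retry */
        }
        /* all nr_pages pages are pinned; put_page() each one when done */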
     
  • Point the UNEVICTABLE_LRU config option at the documentation describing
    the option.

    Signed-off-by: David Howells
    Cc: Lee Schermerhorn
    Cc: Rik van Riel
    Cc: KOSAKI Motohiro
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    David Howells
     
  • Fix filemap.c kernel-doc warnings:

    Warning(mm/filemap.c:575): No description found for parameter 'page'
    Warning(mm/filemap.c:575): No description found for parameter 'waiter'

    Signed-off-by: Randy Dunlap
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Randy Dunlap
     

07 Apr, 2009

1 commit

  • Add /proc entries to give the admin the ability to control the minimum and
    maximum number of pdflush threads. This allows finer control of pdflush
    on both large and small machines.

    The rationale is simply that one size does not fit all. Admins on large and/or
    small systems may want to tune the min/max pdflush thread count to best
    suit their needs. Right now the min/max is hardcoded to 2/8. While
    probably a fair estimate for smaller machines, large machines with large
    numbers of CPUs and large numbers of filesystems/block devices may benefit
    from larger numbers of threads working on different block devices.

    Even if the background flushing algorithm is radically changed, it is
    still likely that multiple threads will be involved and admins would still
    desire finer control on the min/max other than to have to recompile the
    kernel.

    The patch adds '/proc/sys/vm/nr_pdflush_threads_min' and
    '/proc/sys/vm/nr_pdflush_threads_max' with r/w permissions.

    The minimum value for nr_pdflush_threads_min is 1 and the maximum value is
    the current value of nr_pdflush_threads_max. This minimum is required
    since additional thread creation is performed in a pdflush thread itself.

    The minimum value for nr_pdflush_threads_max is the current value of
    nr_pdflush_threads_min and the maximum value can be 1000.

    Documentation/sysctl/vm.txt is also updated.

    [akpm@linux-foundation.org: fix comment, fix whitespace, use __read_mostly]
    Signed-off-by: Peter W Morreale
    Reviewed-by: Rik van Riel
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Peter W Morreale
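
    A hedged example of driving the new knobs from a small C helper rather
    than from a shell (requires root and a kernel carrying this patch; note
    that the 15 May entry above records the knobs being reverted):

        #include <stdio.h>

        static int write_sysctl(const char *path, const char *val)
        {
                FILE *f = fopen(path, "w");

                if (!f) {
                        perror(path);
                        return -1;
                }
                fprintf(f, "%s\n", val);
                return fclose(f);
        }

        int main(void)
        {
                /* Raise the floor to 4 threads first (must stay <= the
                   current max), then lift the ceiling to 16. */
                if (write_sysctl("/proc/sys/vm/nr_pdflush_threads_min", "4"))
                        return 1;
                if (write_sysctl("/proc/sys/vm/nr_pdflush_threads_max", "16"))
                        return 1;
                return 0;
        }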