30 Jan, 2010

1 commit

  • After memory pressure has forced it to dip into the reserves, 2.6.32's
    5f8dcc21211a3d4e3a7a5ca366b469fb88117f61 "page-allocator: split per-cpu
    list into one-list-per-migrate-type" has been returning MIGRATE_RESERVE
    pages to the MIGRATE_MOVABLE free_list: in some sense depleting reserves.

    Fix that in the most straightforward way (which, considering the overheads
    of alternative approaches, is Mel's preference): the right migratetype is
    already in page_private(page), but free_pcppages_bulk() wasn't using it.

    How did this bug show up? As a 20% slowdown in my tmpfs loop kbuild
    swapping tests, on a PowerMac G5 with the SLUB allocator. Bisecting to
    that commit was easy, but explaining the magnitude of the slowdown was
    not.

    The same effect appears, but much less markedly, with SLAB, and even
    less markedly on other machines (the PowerMac divides into fewer zones
    than x86, I think that may be a factor). We guess that lumpy reclaim
    of short-lived high-order pages is implicated in some way, and probably
    this bug has been tickling a poor decision somewhere in page reclaim.

    But instrumentation hasn't told me much, I've run out of time and
    imagination to determine exactly what's going on, and shouldn't hold up
    the fix any longer: it's valid, and might even fix other misbehaviours.

    Signed-off-by: Hugh Dickins
    Acked-by: Mel Gorman
    Cc: stable@kernel.org
    Signed-off-by: Linus Torvalds

    Hugh Dickins
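The mechanics can be modelled in user space. This is an illustrative sketch with invented names (`free_count` stands in for the zone free lists, `private` for page_private(page)), not the kernel's actual data structures:

```c
#include <assert.h>

/* User-space model of the fix: each page records its migratetype in
 * page_private(page), and the bulk-free path must return the page to the
 * matching free list instead of assuming MIGRATE_MOVABLE. */
enum migratetype { MIGRATE_UNMOVABLE, MIGRATE_MOVABLE, MIGRATE_RESERVE, MIGRATE_TYPES };

struct page { int private; };          /* stands in for page_private(page) */

static int free_count[MIGRATE_TYPES];  /* stands in for the per-type free lists */

/* Buggy behaviour: every per-cpu page went back onto the MOVABLE list. */
static void free_page_buggy(struct page *page)
{
    (void)page;
    free_count[MIGRATE_MOVABLE]++;
}

/* Fixed behaviour: use the migratetype already stored in the page. */
static void free_page_fixed(struct page *page)
{
    free_count[page->private]++;
}
```

With the buggy variant, a MIGRATE_RESERVE page silently migrates to the MOVABLE list on free, which is exactly the reserve depletion described above.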
     

28 Jan, 2010

1 commit

  • It's a simplified 'read_cache_page()' which takes a page allocation
    flag, so that different paths can control how aggressive the memory
    allocations are that populate an address space.

    In particular, the intel GPU object mapping code wants to be able to do
    a certain amount of its own internal memory management by automatically
    shrinking the address space when memory starts getting tight. This
    allows it to dynamically use different memory allocation policies on a
    per-allocation basis, rather than depend on the (static) address space
    gfp policy.

    The actual new function is a one-liner, but re-organizing the helper
    functions to the point where you can do this with a single line of code
    is what most of the patch is all about.

    Tested-by: Chris Wilson
    Signed-off-by: Linus Torvalds

    Linus Torvalds
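The layering can be sketched in user space (a hedged model with invented types and fields, not the kernel API): the gfp-taking variant does the work, and the classic entry point becomes a one-liner that passes the mapping's static mask.

```c
#include <assert.h>

/* Model: read_cache_page_gfp takes a per-call allocation policy; the
 * classic entry point just forwards the mapping's default policy. */
struct mapping { unsigned default_gfp; unsigned last_gfp_used; };

static int read_cache_page_gfp_model(struct mapping *m, unsigned long index, unsigned gfp)
{
    (void)index;
    m->last_gfp_used = gfp;    /* a real implementation allocates with gfp here */
    return 0;
}

/* The classic entry point becomes a one-liner over the new helper. */
static int read_cache_page_model(struct mapping *m, unsigned long index)
{
    return read_cache_page_gfp_model(m, index, m->default_gfp);
}
```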
     

21 Jan, 2010

1 commit

  • In free_unmap_vmap_area_noflush(), va->flags is marked VM_LAZY_FREE
    first, and then vmap_lazy_nr is increased atomically.

    But in __purge_vmap_area_lazy(), while traversing vmap_area_list, nr is
    counted by checking whether VM_LAZY_FREE is set in va->flags. After
    counting nr, the kernel reads vmap_lazy_nr atomically and checks a
    BUG_ON condition, that nr must not be greater than vmap_lazy_nr, to
    prevent vmap_lazy_nr from going negative.

    The problem is that, if the task is interrupted right after marking
    VM_LAZY_FREE, the increment of vmap_lazy_nr can be delayed.
    Consequently, the BUG_ON condition can be met, because nr counts more
    areas than vmap_lazy_nr reports.

    This is highly probable when vmalloc/vfree are called frequently. The
    scenario has been verified by adding a delay between marking
    VM_LAZY_FREE and increasing vmap_lazy_nr in
    free_unmap_vmap_area_noflush().

    Even though vmap_lazy_nr is used to check a high watermark, it need
    never be a strict watermark. And although the BUG_ON condition is meant
    to prevent vmap_lazy_nr from going negative, vmap_lazy_nr is a signed
    variable, so it can legitimately go negative temporarily.

    Consequently, removing the BUG_ON condition is the proper fix.

    A possible BUG_ON report looks like this:

    kernel BUG at mm/vmalloc.c:517!
    invalid opcode: 0000 [#1] SMP
    EIP: 0060:[] EFLAGS: 00010297 CPU: 3
    EIP is at __purge_vmap_area_lazy+0x144/0x150
    EAX: ee8a8818 EBX: c08e77d4 ECX: e7c7ae40 EDX: c08e77ec
    ESI: 000081fe EDI: e7c7ae60 EBP: e7c7ae64 ESP: e7c7ae3c
    DS: 007b ES: 007b FS: 00d8 GS: 0033 SS: 0068
    Call Trace:
    [] free_unmap_vmap_area_noflush+0x69/0x70
    [] remove_vm_area+0x22/0x70
    [] __vunmap+0x45/0xe0
    [] vmalloc+0x2c/0x30
    Code: 8d 59 e0 eb 04 66 90 89 cb 89 d0 e8 87 fe ff ff 8b 43 20 89 da 8d 48 e0 8d 43 20 3b 04 24 75 e7 fe 05 a8 a5 a3 c0 e9 78 ff ff ff 0b eb fe 90 8d b4 26 00 00 00 00 56 89 c6 b8 ac a5 a3 c0 31
    EIP: [] __purge_vmap_area_lazy+0x144/0x150 SS:ESP 0068:e7c7ae3c

    [ See also http://marc.info/?l=linux-kernel&m=126335856228090&w=2 ]

    Signed-off-by: Yongseok Koh
    Reviewed-by: Minchan Kim
    Cc: Nick Piggin
    Cc: Andrew Morton
    Signed-off-by: Linus Torvalds

    Yongseok Koh
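The race window can be modelled in user space (an illustrative sketch, not kernel code; `vmap_lazy_nr` is atomic in the kernel but plain here):

```c
#include <assert.h>

#define VM_LAZY_FREE 0x1UL

/* Model: the freeing path sets VM_LAZY_FREE first and only afterwards
 * adds to vmap_lazy_nr, so a purge running in between counts more
 * lazily-freed areas than vmap_lazy_nr reports. */
struct vmap_area { unsigned long flags; };

static long vmap_lazy_nr;  /* atomic in the kernel; plain here */

static void mark_lazy_free(struct vmap_area *va) { va->flags |= VM_LAZY_FREE; }
static void account_lazy_free(void) { vmap_lazy_nr++; }

/* What the purge loop computes while walking the area list. */
static long count_lazy(const struct vmap_area *vas, int n)
{
    long nr = 0;
    for (int i = 0; i < n; i++)
        if (vas[i].flags & VM_LAZY_FREE)
            nr++;
    return nr;
}
```

Mark without accounting, and the purge-side count exceeds the counter — the transient state the removed BUG_ON mistook for a bug.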
     

17 Jan, 2010

8 commits

  • Commit f2260e6b (page allocator: update NR_FREE_PAGES only as necessary)
    introduced a minor regression: if __rmqueue() fails, the NR_FREE_PAGES
    statistic goes wrong. This patch fixes it.

    Signed-off-by: KOSAKI Motohiro
    Cc: Mel Gorman
    Reviewed-by: Minchan Kim
    Reported-by: Huang Shijie
    Reviewed-by: Christoph Lameter
    Cc:
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    KOSAKI Motohiro
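The shape of the fix can be modelled in user space (names are illustrative, not the kernel's code): adjust the statistic only after the allocation has actually succeeded.

```c
#include <assert.h>

/* Model: NR_FREE_PAGES is decremented only after __rmqueue() has handed
 * out a page; adjusting first and bailing on failure skews the statistic. */
static long nr_free_pages = 4;

static void *rmqueue_stub(int should_fail)
{
    static int dummy_page;
    return should_fail ? 0 : (void *)&dummy_page;
}

static void *alloc_fixed(int should_fail)
{
    void *page = rmqueue_stub(should_fail);
    if (!page)
        return 0;        /* failure path leaves the statistic untouched */
    nr_free_pages--;     /* account only for pages really removed */
    return page;
}
```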
     
  • Fix a problem in NOMMU mmap with ramfs whereby a shared mmap can happen
    over the end of a truncation. The problem is that
    ramfs_nommu_check_mappings() checks the reduced file size against the
    VMA tree, but not the vm_region tree.

    The following sequence of events can cause the problem:

    fd = open("/tmp/x", O_RDWR|O_TRUNC|O_CREAT, 0600);
    ftruncate(fd, 32 * 1024);
    a = mmap(NULL, 32 * 1024, PROT_READ|PROT_WRITE, MAP_SHARED, fd, 0);
    b = mmap(NULL, 16 * 1024, PROT_READ|PROT_WRITE, MAP_SHARED, fd, 0);
    munmap(a, 32 * 1024);
    ftruncate(fd, 16 * 1024);
    c = mmap(NULL, 32 * 1024, PROT_READ|PROT_WRITE, MAP_SHARED, fd, 0);

    Mapping 'a' creates a vm_region covering 32KB of the file. Mapping 'b'
    sees that the vm_region from 'a' is covering the region it wants and so
    shares it, pinning it in memory.

    Mapping 'a' then goes away and the file is truncated to the end of VMA
    'b'. However, the region allocated by 'a' is still in effect, and has
    _not_ been reduced.

    Mapping 'c' is then created, and because there's a vm_region covering the
    desired region, get_unmapped_area() is _not_ called to repeat the check,
    and the mapping is granted, even though the pages from the latter half of
    the mapping have been discarded.

    However:

    d = mmap(NULL, 16 * 1024, PROT_READ|PROT_WRITE, MAP_SHARED, fd, 0);

    Mapping 'd' should work, and should end up sharing the region allocated by
    'a'.

    To deal with this, we shrink the vm_region struct during the truncation,
    lest do_mmap_pgoff() take it as licence to share the full region
    automatically without calling the get_unmapped_area() file op again.

    Signed-off-by: David Howells
    Acked-by: Al Viro
    Cc: Greg Ungerer
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    David Howells
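The effect of the fix on the a/b/c/d sequence above can be modelled in user space (a hedged sketch; the struct is a stand-in, not the kernel's vm_region):

```c
#include <assert.h>

/* Model: truncation shrinks the shared vm_region itself, so a later
 * shared mmap of the old, larger size can no longer piggyback on it and
 * must redo the get_unmapped_area() check. */
struct vm_region_model { unsigned long size; };

static void truncate_region(struct vm_region_model *r, unsigned long new_size)
{
    if (new_size < r->size)
        r->size = new_size;
}

/* Can an existing region satisfy a new shared mapping of len bytes? */
static int region_can_share(const struct vm_region_model *r, unsigned long len)
{
    return len <= r->size;
}
```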
     
  • get_unmapped_area() is unnecessary for NOMMU as no-one calls it.

    Signed-off-by: David Howells
    Acked-by: Al Viro
    Cc: Greg Ungerer
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    David Howells
     
  • In split_vma(), there's no need to check if the VMA being split has a
    region that's in use by more than one VMA because:

    (1) The preceding test prohibits splitting of non-anonymous VMAs and regions
    (eg: file or chardev backed VMAs).

    (2) Anonymous regions can't be mapped multiple times because there's no handle
    by which to refer to the already existing region.

    (3) If a VMA has previously been split, then the region backing it has also
    been split into two regions, each of usage 1.

    Signed-off-by: David Howells
    Acked-by: Al Viro
    Cc: Greg Ungerer
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    David Howells
     
  • The vm_usage count field in struct vm_region does not need to be atomic
    as it's only ever modified whilst nommu_region_sem is write locked.

    Signed-off-by: David Howells
    Acked-by: Al Viro
    Cc: Greg Ungerer
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    David Howells
     
  • Current mem_cgroup_force_empty() only ensures mem->res.usage == 0 on
    success. But this doesn't guarantee the memcg's LRU is really empty,
    because there are some cases in which !PageCgroupUsed pages exist on the
    memcg's LRU.

    For example:
    - Pages can be uncharged by their owner process while they are on the LRU.
    - A race between mem_cgroup_add_lru_list() and __mem_cgroup_uncharge_common().

    So there can be a case in which the usage is zero but some of the LRUs are not empty.

    OTOH, mem_cgroup_del_lru_list(), which can be called asynchronously with
    rmdir, accesses the mem_cgroup, so this access can cause a problem if it
    races with rmdir because the mem_cgroup might have been freed by rmdir.

    Actually, I saw a bug which seems to be caused by this race.

    [1530745.949906] BUG: unable to handle kernel NULL pointer dereference at 0000000000000230
    [1530745.950651] IP: [] mem_cgroup_del_lru_list+0x30/0x80
    [1530745.950651] PGD 3863de067 PUD 3862c7067 PMD 0
    [1530745.950651] Oops: 0002 [#1] SMP
    [1530745.950651] last sysfs file: /sys/devices/system/cpu/cpu7/cache/index1/shared_cpu_map
    [1530745.950651] CPU 3
    [1530745.950651] Modules linked in: configs ipt_REJECT xt_tcpudp iptable_filter ip_tables x_tables bridge stp nfsd nfs_acl auth_rpcgss exportfs autofs4 hidp rfcomm l2cap crc16 bluetooth lockd sunrpc ib_iser rdma_cm ib_cm iw_cm ib_sa ib_mad ib_core ib_addr iscsi_tcp bnx2i cnic uio ipv6 cxgb3i cxgb3 mdio libiscsi_tcp libiscsi scsi_transport_iscsi dm_mirror dm_multipath scsi_dh video output sbs sbshc battery ac lp kvm_intel kvm sg ide_cd_mod cdrom serio_raw tpm_tis tpm tpm_bios acpi_memhotplug button parport_pc parport rtc_cmos rtc_core rtc_lib e1000 i2c_i801 i2c_core pcspkr dm_region_hash dm_log dm_mod ata_piix libata shpchp megaraid_mbox sd_mod scsi_mod megaraid_mm ext3 jbd uhci_hcd ohci_hcd ehci_hcd [last unloaded: freq_table]
    [1530745.950651] Pid: 19653, comm: shmem_test_02 Tainted: G M 2.6.32-mm1-00701-g2b04386 #3 Express5800/140Rd-4 [N8100-1065]
    [1530745.950651] RIP: 0010:[] [] mem_cgroup_del_lru_list+0x30/0x80
    [1530745.950651] RSP: 0018:ffff8803863ddcb8 EFLAGS: 00010002
    [1530745.950651] RAX: 00000000000001e0 RBX: ffff8803abc02238 RCX: 00000000000001e0
    [1530745.950651] RDX: 0000000000000000 RSI: ffff88038611a000 RDI: ffff8803abc02238
    [1530745.950651] RBP: ffff8803863ddcc8 R08: 0000000000000002 R09: ffff8803a04c8643
    [1530745.950651] R10: 0000000000000000 R11: ffffffff810c7333 R12: 0000000000000000
    [1530745.950651] R13: ffff880000017f00 R14: 0000000000000092 R15: ffff8800179d0310
    [1530745.950651] FS: 0000000000000000(0000) GS:ffff880017800000(0000) knlGS:0000000000000000
    [1530745.950651] CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
    [1530745.950651] CR2: 0000000000000230 CR3: 0000000379d87000 CR4: 00000000000006e0
    [1530745.950651] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
    [1530745.950651] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
    [1530745.950651] Process shmem_test_02 (pid: 19653, threadinfo ffff8803863dc000, task ffff88038612a8a0)
    [1530745.950651] Stack:
    [1530745.950651] ffffea00040c2fe8 0000000000000000 ffff8803863ddd98 ffffffff810c739a
    [1530745.950651] 00000000863ddd18 000000000000000c 0000000000000000 0000000000000000
    [1530745.950651] 0000000000000002 0000000000000000 ffff8803863ddd68 0000000000000046
    [1530745.950651] Call Trace:
    [1530745.950651] [] release_pages+0x142/0x1e7
    [1530745.950651] [] ? pagevec_move_tail+0x6e/0x112
    [1530745.950651] [] pagevec_move_tail+0xfd/0x112
    [1530745.950651] [] lru_add_drain+0x76/0x94
    [1530745.950651] [] exit_mmap+0x6e/0x145
    [1530745.950651] [] mmput+0x5e/0xcf
    [1530745.950651] [] exit_mm+0x11c/0x129
    [1530745.950651] [] ? audit_free+0x196/0x1c9
    [1530745.950651] [] do_exit+0x1f5/0x6b7
    [1530745.950651] [] ? up_read+0x2b/0x2f
    [1530745.950651] [] ? lockdep_sys_exit_thunk+0x35/0x67
    [1530745.950651] [] do_group_exit+0x83/0xb0
    [1530745.950651] [] sys_exit_group+0x17/0x1b
    [1530745.950651] [] system_call_fastpath+0x16/0x1b
    [1530745.950651] Code: 54 53 0f 1f 44 00 00 83 3d cc 29 7c 00 00 41 89 f4 75 63 eb 4e 48 83 7b 08 00 75 04 0f 0b eb fe 48 89 df e8 18 f3 ff ff 44 89 e2 ff 4c d0 50 48 8b 05 2b 2d 7c 00 48 39 43 08 74 39 48 8b 4b
    [1530745.950651] RIP [] mem_cgroup_del_lru_list+0x30/0x80
    [1530745.950651] RSP
    [1530745.950651] CR2: 0000000000000230
    [1530745.950651] ---[ end trace c3419c1bb8acc34f ]---
    [1530745.950651] Fixing recursive fault but reboot is needed!

    The problem here is that pages on the LRU may contain a pointer to a
    stale memcg. To make res->usage zero, all pages on the memcg must be
    uncharged or moved to another (parent) memcg. A moved page_cgroup has
    already been removed from the original LRU, but an uncharged page_cgroup
    still contains a pointer to the memcg, without the PCG_USED bit. (This
    asynchronous LRU work is for improving performance.) If the PCG_USED
    bit is not set, the page_cgroup will never be added to the memcg's LRU,
    so pages not on an LRU never access the stale pointer. What we have to
    take care of, then, are page_cgroups _on_ an LRU list. This patch fixes
    the problem by making mem_cgroup_force_empty() visit all LRUs before
    exiting its loop, guaranteeing there are no pages left on its LRUs.

    Signed-off-by: Daisuke Nishimura
    Acked-by: KAMEZAWA Hiroyuki
    Cc: Balbir Singh
    Cc:
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Daisuke Nishimura
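The loop shape can be sketched in user space (an illustrative model with invented counters, not the memcg code): the exit condition is "every LRU observed empty", not "usage reached zero".

```c
#include <assert.h>

#define NR_LRU_LISTS 4

/* Model: force_empty keeps draining until every LRU list is empty, so no
 * page_cgroup can be left pointing at the soon-to-be-freed memcg. */
static int lru_len[NR_LRU_LISTS];

static void force_empty_model(void)
{
    int busy;
    do {
        busy = 0;
        for (int lru = 0; lru < NR_LRU_LISTS; lru++) {
            if (lru_len[lru] > 0) {
                lru_len[lru]--;   /* stands in for draining one page */
                busy = 1;
            }
        }
    } while (busy);               /* only exit once all LRUs are empty */
}
```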
     
  • Commit f50de2d3 (vmscan: have kswapd sleep for a short interval and double
    check it should be asleep) can cause kswapd to enter an infinite loop if
    running on a single-CPU system. If all zones are unreclaimable,
    sleeping_prematurely() returns 1 and kswapd will call balance_pgdat()
    again. But that is totally meaningless: balance_pgdat() can't do
    anything about an unreclaimable zone!

    Signed-off-by: KOSAKI Motohiro
    Cc: Mel Gorman
    Reported-by: Will Newton
    Reviewed-by: Minchan Kim
    Reviewed-by: Rik van Riel
    Tested-by: Will Newton
    Reviewed-by: Wu Fengguang
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    KOSAKI Motohiro
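The fixed decision can be modelled in user space (a hedged sketch; the real sleeping_prematurely() checks zone watermarks, modelled here as a trivial return):

```c
#include <assert.h>

/* Model: when every zone is unreclaimable, reporting the sleep as
 * premature only sends kswapd back into balance_pgdat() with nothing to
 * do, so the check must allow it to sleep instead. */
static int sleeping_prematurely_model(const int *zone_unreclaimable, int nzones)
{
    int all_unreclaimable = 1;
    for (int i = 0; i < nzones; i++)
        if (!zone_unreclaimable[i])
            all_unreclaimable = 0;
    if (all_unreclaimable)
        return 0;   /* balance_pgdat() could not help; let kswapd sleep */
    return 1;       /* stand-in for the real watermark re-checks */
}
```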
     
  • The current check for 'backward merging' within add_active_range() does
    not seem correct. start_pfn must be compared against
    early_node_map[i].start_pfn (and NOT against .end_pfn) to find out whether
    the new region is backward-mergeable with the existing range.

    Signed-off-by: Kazuhisa Ichikawa
    Acked-by: David Rientjes
    Cc: KOSAKI Motohiro
    Cc: Mel Gorman
    Cc: Christoph Lameter
    Cc: Johannes Weiner
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Kazuhisa Ichikawa
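The corrected test can be sketched as follows (a user-space model; the exact comparators are illustrative, the point is that the comparison is against .start_pfn, not .end_pfn):

```c
#include <assert.h>

/* Model: a new range extends an existing one backwards when it starts
 * before the existing start_pfn and reaches it. */
struct pfn_range { unsigned long start_pfn, end_pfn; };

static int backward_mergeable(const struct pfn_range *r,
                              unsigned long start_pfn, unsigned long end_pfn)
{
    return start_pfn < r->start_pfn && end_pfn >= r->start_pfn;
}
```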
     

14 Jan, 2010

1 commit

  • If __block_prepare_write() fails in block_write_begin(), the allocated
    blocks can be outside of ->i_size.

    But the new truncate_pagecache() in vmtruncate() does nothing unless
    new < old. That means the above case is no longer handled.

    So this patch fixes it by removing the "new < old" check. More
    cleanup/change will be needed, but with -rc out and the truncate rework
    in progress, this makes the minimum fix for now.

    Acked-by: Nick Piggin
    Signed-off-by: OGAWA Hirofumi
    Signed-off-by: Linus Torvalds

    OGAWA Hirofumi
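The failure case can be modelled in user space (invented names; a hedged sketch of the behaviour, not the VFS code): the failed-write cleanup calls truncate with new == old, which the old guard skipped entirely.

```c
#include <assert.h>

/* Model: with the "new < old" guard, a truncate called at an unchanged
 * i_size does nothing, leaving blocks allocated past EOF; with the guard
 * removed, the trim always runs. */
static int past_eof_blocks;

static void truncate_model(long old_size, long new_size, int with_guard)
{
    if (with_guard && !(new_size < old_size))
        return;              /* old behaviour: nothing happens when new == old */
    past_eof_blocks = 0;     /* stands in for trimming past-EOF state */
}
```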
     

12 Jan, 2010

2 commits

  • sz is in bytes, MAX_ORDER_NR_PAGES is in pages.

    Signed-off-by: Andrea Arcangeli
    Acked-by: David Gibson
    Cc: Mel Gorman
    Cc: David Rientjes
    Cc: Lee Schermerhorn
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Andrea Arcangeli
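The unit mismatch is the whole bug, so a one-line model captures it (constants here are illustrative, assuming 4 KiB pages): one side must be scaled by PAGE_SHIFT before comparing.

```c
#include <assert.h>

/* Model: sz is in bytes while MAX_ORDER_NR_PAGES counts pages; convert
 * bytes to pages before the comparison. */
#define MODEL_PAGE_SHIFT         12
#define MODEL_MAX_ORDER_NR_PAGES 2048UL

static int fits_in_max_order(unsigned long sz_bytes)
{
    return (sz_bytes >> MODEL_PAGE_SHIFT) <= MODEL_MAX_ORDER_NR_PAGES;
}
```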
     
  • __pcpu_ptr_to_addr() can be overridden by the architecture and might not
    behave well if passed a NULL pointer. So avoid calling it until we have
    verified that its arg is not NULL.

    Cc: Rusty Russell
    Cc: Kamalesh Babulal
    Acked-by: Tejun Heo
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Andrew Morton
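The ordering constraint can be sketched in user space (the conversion stub is invented for illustration): the NULL check must come before any conversion, since the arch override may not tolerate NULL.

```c
#include <assert.h>

/* Model: the pointer-to-address conversion (which an arch may override)
 * runs only after the NULL check, never on NULL itself. */
static void *ptr_to_addr_stub(void *p)
{
    return (char *)p + 0x10;   /* illustrative conversion */
}

static void *lookup_addr_safe(void *pcpu_ptr)
{
    if (!pcpu_ptr)
        return 0;              /* bail out before any conversion */
    return ptr_to_addr_stub(pcpu_ptr);
}
```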
     

08 Jan, 2010

1 commit


07 Jan, 2010

2 commits

  • The MMU code uses the copy_*_user_page() variants in access_process_vm()
    rather than copy_*_user() as the former includes an icache flush. This
    is important when doing things like setting software breakpoints with
    gdb. So switch the NOMMU code over to do the same.

    This patch makes the reasonable assumption that copy_from_user_page()
    won't fail - which is probably fine, as we've checked the VMA from which
    we're copying is usable, and the copy is not allowed to cross VMAs. The
    one case where it might go wrong is if the VMA maps a device rather than
    RAM, and that device returns an error - in which case rubbish will be
    returned rather than EIO.

    Signed-off-by: Jie Zhang
    Signed-off-by: Mike Frysinger
    Signed-off-by: David Howells
    Acked-by: David McCullough
    Acked-by: Paul Mundt
    Acked-by: Greg Ungerer
    Signed-off-by: Linus Torvalds

    Jie Zhang
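The pairing can be modelled in user space (a hedged sketch with stub names, not the arch implementation): copy_to_user_page() is the plain copy plus an icache flush over the written range.

```c
#include <assert.h>
#include <string.h>

/* Model: the icache flush after the copy is what makes a freshly written
 * breakpoint visible to instruction fetch. */
static int icache_flushes;

static void flush_icache_stub(void *addr, unsigned long len)
{
    (void)addr; (void)len;
    icache_flushes++;
}

static void copy_to_user_page_model(void *dst, const void *src, unsigned long len)
{
    memcpy(dst, src, len);          /* the plain copy part */
    flush_icache_stub(dst, len);    /* the step the NOMMU path was missing */
}
```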
     
  • When working with FDPIC, there are many shared mappings of read-only
    code regions between applications (the C library, applet packages like
    busybox, etc.), but the current do_mmap_pgoff() function will issue an
    icache flush whenever a VMA is added to an MM instead of only doing it
    when the map is initially created.

    The flush can instead be done when a region is first mmapped PROT_EXEC.
    Note that we may not rely on the first mapping of a region being
    executable - it's possible for it to be PROT_READ only, so we have to
    remember whether we've flushed the region or not, and then flush the
    entire region when a bit of it is made executable.

    However, this also affects the brk area. That will no longer be
    executable. We can mprotect() it to PROT_EXEC on MPU-mode kernels, but
    for NOMMU mode kernels, when it increases the brk allocation, making
    sys_brk() flush the extra from the icache should suffice. The brk area
    probably isn't used by NOMMU programs since the brk area can only use up
    the leavings from the stack allocation, where the stack allocation is
    larger than requested.

    Signed-off-by: David Howells
    Signed-off-by: Mike Frysinger
    Signed-off-by: Linus Torvalds

    Mike Frysinger
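The flush-once policy can be sketched in user space (illustrative structure, not the kernel's vm_region): remember per-region whether the icache has been flushed, and flush the whole region the first time any part of it becomes executable.

```c
#include <assert.h>

/* Model: a region is flushed at most once, when first made executable,
 * instead of on every VMA added to an MM. */
struct region_model { int icache_flushed; int flush_calls; };

static void map_region(struct region_model *r, int prot_exec)
{
    if (prot_exec && !r->icache_flushed) {
        r->flush_calls++;       /* stands in for flushing the whole region */
        r->icache_flushed = 1;
    }
}
```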
     

31 Dec, 2009

2 commits


29 Dec, 2009

1 commit

  • Commit ce79ddc8e2376a9a93c7d42daf89bfcbb9187e62 ("SLAB: Fix lockdep annotations
    for CPU hotplug") broke init_node_lock_keys() off-slab logic which causes
    lockdep false positives.

    Fix that up by reverting the logic back to original while keeping CPU hotplug
    fixes intact.

    Reported-and-tested-by: Heiko Carstens
    Reported-and-tested-by: Andi Kleen
    Signed-off-by: Pekka Enberg

    Pekka Enberg
     

25 Dec, 2009

2 commits


24 Dec, 2009

1 commit


23 Dec, 2009

1 commit

  • * 'merge' of git://git.kernel.org/pub/scm/linux/kernel/git/benh/powerpc: (36 commits)
    powerpc/gc/wii: Remove get_irq_desc()
    powerpc/gc/wii: hlwd-pic: convert irq_desc.lock to raw_spinlock
    powerpc/gamecube/wii: Fix off-by-one error in ugecon/usbgecko_udbg
    powerpc/mpic: Fix problem that affinity is not updated
    powerpc/mm: Fix stupid bug in subpge protection handling
    powerpc/iseries: use DECLARE_COMPLETION_ONSTACK for non-constant completion
    powerpc: Fix MSI support on U4 bridge PCIe slot
    powerpc: Handle VSX alignment faults correctly in little-endian mode
    powerpc/mm: Fix typo of cpumask_clear_cpu()
    powerpc/mm: Fix hash_utils_64.c compile errors with DEBUG enabled.
    powerpc: Convert BUG() to use unreachable()
    powerpc/pseries: Make declarations of cpu_hotplug_driver_lock() ANSI compatible.
    powerpc/pseries: Don't panic when H_PROD fails during cpu-online.
    powerpc/mm: Fix a WARN_ON() with CONFIG_DEBUG_PAGEALLOC and CONFIG_DEBUG_VM
    powerpc/defconfigs: Set HZ=100 on pseries and ppc64 defconfigs
    powerpc/defconfigs: Disable token ring in powerpc defconfigs
    powerpc/defconfigs: Reduce 64bit vmlinux by making acenic and cramfs modules
    powerpc/pseries: Select XICS and PCI_MSI PSERIES
    powerpc/85xx: Wrong variable returned on error
    powerpc/iseries: Convert to proc_fops
    ...

    Linus Torvalds
     

22 Dec, 2009

1 commit

  • The injector filter requires stable_page_flags() which is supplied
    by procfs. So make it dependent on that.

    Also add ifdefs around the filter code in memory-failure.c so that
    when the filter is disabled due to missing dependencies the whole
    code still builds.

    Reported-by: Ingo Molnar
    Signed-off-by: Andi Kleen

    Andi Kleen
     

20 Dec, 2009

1 commit

  • …git/tip/linux-2.6-tip

    * 'x86-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
    x86, irq: Allow 0xff for /proc/irq/[n]/smp_affinity on an 8-cpu system
    Makefile: Unexport LC_ALL instead of clearing it
    x86: Fix objdump version check in arch/x86/tools/chkobjdump.awk
    x86: Reenable TSC sync check at boot, even with NONSTOP_TSC
    x86: Don't use POSIX character classes in gen-insn-attr-x86.awk
    Makefile: set LC_CTYPE, LC_COLLATE, LC_NUMERIC to C
    x86: Increase MAX_EARLY_RES; insufficient on 32-bit NUMA
    x86: Fix checking of SRAT when node 0 ram is not from 0
    x86, cpuid: Add "volatile" to asm in native_cpuid()
    x86, msr: msrs_alloc/free for CONFIG_SMP=n
    x86, amd: Get multi-node CPU info from NodeId MSR instead of PCI config space
    x86: Add IA32_TSC_AUX MSR and use it
    x86, msr/cpuid: Register enough minors for the MSR and CPUID drivers
    initramfs: add missing decompressor error check
    bzip2: Add missing checks for malloc returning NULL
    bzip2/lzma/gzip: pre-boot malloc doesn't return NULL on failure

    Linus Torvalds
     

18 Dec, 2009

5 commits

  • Memory balloon drivers can allocate a large amount of memory which is not
    movable but could be freed to accommodate memory hotplug remove.

    Prior to calling the memory hotplug notifier chain the memory in the
    pageblock is isolated. Currently, if the migrate type is not
    MIGRATE_MOVABLE the isolation will not proceed, causing the memory removal
    for that page range to fail.

    Rather than failing pageblock isolation if the migratetype is not
    MIGRATE_MOVABLE, this patch checks whether all of the pages in the
    pageblock that are not on the LRU are owned by a registered balloon
    driver (or other entity), using a notifier chain. If all of the non-movable pages are owned
    by a balloon, they can be freed later through the memory notifier chain
    and the range can still be isolated in set_migratetype_isolate().

    Signed-off-by: Robert Jennings
    Cc: Mel Gorman
    Cc: Ingo Molnar
    Cc: Brian King
    Cc: Paul Mackerras
    Cc: Martin Schwidefsky
    Cc: Gerald Schaefer
    Cc: KAMEZAWA Hiroyuki
    Cc: Benjamin Herrenschmidt
    Signed-off-by: Andrew Morton
    Signed-off-by: Benjamin Herrenschmidt

    Robert Jennings
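The relaxed isolation rule can be modelled in user space (page kinds are an invented abstraction standing in for the free/LRU/balloon-notifier checks):

```c
#include <assert.h>

/* Model: a non-MOVABLE pageblock may still be isolated when every page
 * that is neither free nor on the LRU is claimed by a balloon driver. */
enum page_kind { PAGE_FREE, PAGE_LRU, PAGE_BALLOON, PAGE_UNKNOWN };

static int can_isolate(const enum page_kind *pages, int n)
{
    for (int i = 0; i < n; i++)
        if (pages[i] == PAGE_UNKNOWN)   /* unclaimed page: refuse isolation */
            return 0;
    return 1;                           /* free/LRU/balloon pages are all fine */
}
```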
     
  • …/rusty/linux-2.6-for-linus

    * 'cpumask-cleanups' of git://git.kernel.org/pub/scm/linux/kernel/git/rusty/linux-2.6-for-linus:
    cpumask: rename tsk_cpumask to tsk_cpus_allowed
    cpumask: don't recommend set_cpus_allowed hack in Documentation/cpu-hotplug.txt
    cpumask: avoid dereferencing struct cpumask
    cpumask: convert drivers/idle/i7300_idle.c to cpumask_var_t
    cpumask: use modern cpumask style in drivers/scsi/fcoe/fcoe.c
    cpumask: avoid deprecated function in mm/slab.c
    cpumask: use cpu_online in kernel/perf_event.c

    Linus Torvalds
     
  • …s/security-testing-2.6

    * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/security-testing-2.6:
    Keys: KEYCTL_SESSION_TO_PARENT needs TIF_NOTIFY_RESUME architecture support
    NOMMU: Optimise away the {dac_,}mmap_min_addr tests
    security/min_addr.c: make init_mmap_min_addr() static
    keys: PTR_ERR return of wrong pointer in keyctl_get_security()

    Linus Torvalds
     
  • * 'kmemleak' of git://linux-arm.org/linux-2.6:
    kmemleak: fix kconfig for crc32 build error
    kmemleak: Reduce the false positives by checking for modified objects
    kmemleak: Show the age of an unreferenced object
    kmemleak: Release the object lock before calling put_object()
    kmemleak: Scan the _ftrace_events section in modules
    kmemleak: Simplify the kmemleak_scan_area() function prototype
    kmemleak: Do not use off-slab management with SLAB_NOLEAKTRACE

    Linus Torvalds
     
  • I added blk_run_backing_dev on page_cache_async_readahead so readahead I/O
    is unplugged to improve throughput, especially in RAID environments.

    The normal case is, if page N become uptodate at time T(N), then T(N)
    Acked-by: Wu Fengguang
    Cc: Jens Axboe
    Cc: KOSAKI Motohiro
    Tested-by: Ronald
    Cc: Bart Van Assche
    Cc: Vladislav Bolkhovitin
    Cc: Randy Dunlap
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Hisashi Hifumi
     

17 Dec, 2009

9 commits

  • These days we use cpumask_empty() which takes a pointer.

    Signed-off-by: Rusty Russell
    Acked-by: Christoph Lameter

    Rusty Russell
     
  • Replacing
    error = 0;
    if (error)
    op
    with nothing is not quite an equivalent transformation ;-)

    Signed-off-by: Al Viro

    Al Viro
     
  • Found one system that boots from socket1 instead of socket0, and SRAT
    gets rejected...

    [ 0.000000] SRAT: Node 1 PXM 0 0-a0000
    [ 0.000000] SRAT: Node 1 PXM 0 100000-80000000
    [ 0.000000] SRAT: Node 1 PXM 0 100000000-2080000000
    [ 0.000000] SRAT: Node 0 PXM 1 2080000000-4080000000
    [ 0.000000] SRAT: Node 2 PXM 2 4080000000-6080000000
    [ 0.000000] SRAT: Node 3 PXM 3 6080000000-8080000000
    [ 0.000000] SRAT: Node 4 PXM 4 8080000000-a080000000
    [ 0.000000] SRAT: Node 5 PXM 5 a080000000-c080000000
    [ 0.000000] SRAT: Node 6 PXM 6 c080000000-e080000000
    [ 0.000000] SRAT: Node 7 PXM 7 e080000000-10080000000
    ...
    [ 0.000000] NUMA: Allocated memnodemap from 500000 - 701040
    [ 0.000000] NUMA: Using 20 for the hash shift.
    [ 0.000000] Adding active range (0, 0x2080000, 0x4080000) 0 entries of 3200 used
    [ 0.000000] Adding active range (1, 0x0, 0x96) 1 entries of 3200 used
    [ 0.000000] Adding active range (1, 0x100, 0x7f750) 2 entries of 3200 used
    [ 0.000000] Adding active range (1, 0x100000, 0x2080000) 3 entries of 3200 used
    [ 0.000000] Adding active range (2, 0x4080000, 0x6080000) 4 entries of 3200 used
    [ 0.000000] Adding active range (3, 0x6080000, 0x8080000) 5 entries of 3200 used
    [ 0.000000] Adding active range (4, 0x8080000, 0xa080000) 6 entries of 3200 used
    [ 0.000000] Adding active range (5, 0xa080000, 0xc080000) 7 entries of 3200 used
    [ 0.000000] Adding active range (6, 0xc080000, 0xe080000) 8 entries of 3200 used
    [ 0.000000] Adding active range (7, 0xe080000, 0x10080000) 9 entries of 3200 used
    [ 0.000000] SRAT: PXMs only cover 917504MB of your 1048566MB e820 RAM. Not used.
    [ 0.000000] SRAT: SRAT not used.

    The early_node_map is not sorted, because node0 with a non-zero start
    comes first.

    So try to sort it right away after all regions are registered.

    This also fixes a regression from 8716273c (x86: Export srat physical
    topology).

    -v2: make it more robust, to handle cross-node cases like node0 [0,4g), [8,12g) and node1 [4g, 8g), [12g, 16g)
    -v3: update comments.

    Reported-and-tested-by: Jens Axboe
    Signed-off-by: Yinghai Lu
    LKML-Reference:
    Signed-off-by: H. Peter Anvin

    Yinghai Lu
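The idea can be sketched in user space, with qsort standing in for the kernel's sort() (struct and function names are invented for illustration):

```c
#include <assert.h>
#include <stdlib.h>

/* Model: sort the registered ranges by start_pfn once all of them are in,
 * so a node with a non-zero start registered first cannot leave the map
 * unsorted. */
struct active_range { unsigned long start_pfn, end_pfn; };

static int cmp_start_pfn(const void *a, const void *b)
{
    const struct active_range *ra = a, *rb = b;
    return (ra->start_pfn > rb->start_pfn) - (ra->start_pfn < rb->start_pfn);
}

static void sort_node_map(struct active_range *map, int n)
{
    qsort(map, n, sizeof(*map), cmp_start_pfn);
}
```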
     
  • In NOMMU mode clamp dac_mmap_min_addr to zero to cause the tests on it to be
    skipped by the compiler. We do this as the minimum mmap address doesn't make
    any sense in NOMMU mode.

    mmap_min_addr and round_hint_to_min() can be discarded entirely in NOMMU mode.

    Signed-off-by: David Howells
    Acked-by: Eric Paris
    Signed-off-by: James Morris

    David Howells
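The optimisation can be sketched as follows (macro names are invented; the point is that a compile-time zero makes the range test constant-false, so the compiler discards it):

```c
#include <assert.h>

/* Model: under NOMMU the minimum mmap address is clamped to a
 * compile-time zero, so the check below can never fire. */
#ifdef MODEL_CONFIG_MMU
#define MODEL_MMAP_MIN_ADDR 4096UL
#else
#define MODEL_MMAP_MIN_ADDR 0UL   /* NOMMU: minimum mmap address is meaningless */
#endif

static int below_min_addr(unsigned long addr)
{
    return addr < MODEL_MMAP_MIN_ADDR;   /* always false when the constant is 0 */
}
```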
     
  • * 'hwpoison' of git://git.kernel.org/pub/scm/linux/kernel/git/ak/linux-mce-2.6: (34 commits)
    HWPOISON: Remove stray phrase in a comment
    HWPOISON: Try to allocate migration page on the same node
    HWPOISON: Don't do early filtering if filter is disabled
    HWPOISON: Add a madvise() injector for soft page offlining
    HWPOISON: Add soft page offline support
    HWPOISON: Undefine short-hand macros after use to avoid namespace conflict
    HWPOISON: Use new shake_page in memory_failure
    HWPOISON: Use correct name for MADV_HWPOISON in documentation
    HWPOISON: mention HWPoison in Kconfig entry
    HWPOISON: Use get_user_page_fast in hwpoison madvise
    HWPOISON: add an interface to switch off/on all the page filters
    HWPOISON: add memory cgroup filter
    memcg: add accessor to mem_cgroup.css
    memcg: rename and export try_get_mem_cgroup_from_page()
    HWPOISON: add page flags filter
    mm: export stable page flags
    HWPOISON: limit hwpoison injector to known page types
    HWPOISON: add fs/device filters
    HWPOISON: return 0 to indicate success reliably
    HWPOISON: make semantics of IGNORED/DELAYED clear
    ...

    Linus Torvalds
     
  • * 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs-2.6: (38 commits)
    direct I/O fallback sync simplification
    ocfs: stop using do_sync_mapping_range
    cleanup blockdev_direct_IO locking
    make generic_acl slightly more generic
    sanitize xattr handler prototypes
    libfs: move EXPORT_SYMBOL for d_alloc_name
    vfs: force reval of target when following LAST_BIND symlinks (try #7)
    ima: limit imbalance msg
    Untangling ima mess, part 3: kill dead code in ima
    Untangling ima mess, part 2: deal with counters
    Untangling ima mess, part 1: alloc_file()
    O_TRUNC open shouldn't fail after file truncation
    ima: call ima_inode_free ima_inode_free
    IMA: clean up the IMA counts updating code
    ima: only insert at inode creation time
    ima: valid return code from ima_inode_alloc
    fs: move get_empty_filp() deffinition to internal.h
    Sanitize exec_permission_lite()
    Kill cached_lookup() and real_lookup()
    Kill path_lookup_open()
    ...

    Trivial conflicts in fs/direct-io.c

    Linus Torvalds
     
  • In the case of direct I/O falling back to buffered I/O we sync data
    twice currently: once at the end of generic_file_buffered_write using
    filemap_write_and_wait_range and once a little later in
    __generic_file_aio_write using do_sync_mapping_range with all flags set.

    The wait before write of the do_sync_mapping_range call does not make
    any sense, so just keep the filemap_write_and_wait_range call and move
    it to the right spot.

    Signed-off-by: Christoph Hellwig
    Signed-off-by: Al Viro

    Christoph Hellwig
     
  • Now that we cache the ACL pointers in the generic inode all the generic_acl
    cruft can go away and generic_acl.c can directly implement xattr handlers
    dealing with the full Posix ACL semantics for in-memory filesystems.

    Signed-off-by: Christoph Hellwig
    Signed-off-by: Al Viro

    Christoph Hellwig
     
  • Add a flags argument to struct xattr_handler and pass it to all xattr
    handler methods. This allows using the same methods for multiple
    handlers, e.g. for the ACL methods which perform exactly the same action
    for the access and default ACLs, just using a different underlying
    attribute. With a little more groundwork it'll also allow sharing the
    methods for the regular user/trusted/secure handlers in extN, ocfs2 and
    jffs2 like it's already done for xfs in this patch.

    Also change the inode argument to the handlers to a dentry to allow
    using the handler mechanism for filesystems that require it later,
    e.g. cifs.

    [with GFS2 bits updated by Steven Whitehouse ]

    Signed-off-by: Christoph Hellwig
    Reviewed-by: James Morris
    Acked-by: Joel Becker
    Signed-off-by: Al Viro

    Christoph Hellwig
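The sharing this enables can be sketched in user space (types are heavily simplified and the names are invented, not the kernel's struct xattr_handler): one get method serves both ACL handlers, with the handler's flags field selecting the underlying attribute.

```c
#include <assert.h>

/* Model: handler->flags is passed through to the shared method, which
 * uses it to pick the access vs. default ACL attribute. */
enum { MODEL_ACL_ACCESS, MODEL_ACL_DEFAULT };

struct xattr_handler_model {
    const char *prefix;
    int flags;                       /* passed through to the methods */
    int (*get)(int flags, char *buf);
};

static int acl_get_model(int flags, char *buf)
{
    *buf = (flags == MODEL_ACL_DEFAULT) ? 'd' : 'a';  /* choose by flags */
    return 1;
}
```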