09 Dec, 2011

7 commits

  • Commit f5252e00 ("mm: avoid null pointer access in vm_struct via
    /proc/vmallocinfo") adds newly allocated vm_structs to the vmlist
    only after they are fully initialised. Unfortunately, it did not
    check that __vmalloc_area_node() successfully populated the area. In
    the event of allocation failure, the vmalloc area is freed but the
    pointer to the freed memory is inserted into the vmlist, leading to
    a crash later in get_vmalloc_info().

    This patch adds a check for __vmalloc_area_node() failure within
    __vmalloc_node_range(). It does not use "goto fail" as in the
    previous error path, because a warning was already displayed by
    __vmalloc_area_node() before it called vfree() in its failure path.
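
    In context, the fix amounts to something like the sketch below; the
    surrounding lines are paraphrased from the description above, not
    quoted from the patch:

        area = __get_vm_area_node(size, align, VM_ALLOC, start, end,
                                  node, gfp_mask, caller);
        if (!area)
                goto fail;

        addr = __vmalloc_area_node(area, gfp_mask, prot, node, caller);
        if (!addr)
                return NULL;    /* callee already warned and vfreed */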

    Credit goes to Luciano Chavez for doing all the real work of identifying
    exactly where the problem was.

    Signed-off-by: Mel Gorman
    Reported-by: Luciano Chavez
    Tested-by: Luciano Chavez
    Reviewed-by: Rik van Riel
    Acked-by: David Rientjes
    Cc: [3.1.x+]
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Mel Gorman
     
  • setup_zone_migrate_reserve() expects zone->start_pfn to start at a
    pageblock_nr_pages-aligned pfn; otherwise we could access beyond an
    existing memblock, resulting in the following panic if
    CONFIG_HOLES_IN_ZONE is not configured and we do not check pfn_valid:

    IP: [] setup_zone_migrate_reserve+0xcd/0x180
    *pdpt = 0000000000000000 *pde = f000ff53f000ff53
    Oops: 0000 [#1] SMP
    Pid: 1, comm: swapper Not tainted 3.0.7-0.7-pae #1 VMware, Inc. VMware Virtual Platform/440BX Desktop Reference Platform
    EIP: 0060:[] EFLAGS: 00010006 CPU: 0
    EIP is at setup_zone_migrate_reserve+0xcd/0x180
    EAX: 000c0000 EBX: f5801fc0 ECX: 000c0000 EDX: 00000000
    ESI: 000c01fe EDI: 000c01fe EBP: 00140000 ESP: f2475f58
    DS: 007b ES: 007b FS: 00d8 GS: 0000 SS: 0068
    Process swapper (pid: 1, ti=f2474000 task=f2472cd0 task.ti=f2474000)
    Call Trace:
    [] __setup_per_zone_wmarks+0xec/0x160
    [] setup_per_zone_wmarks+0xf/0x20
    [] init_per_zone_wmark_min+0x27/0x86
    [] do_one_initcall+0x2b/0x160
    [] kernel_init+0xbe/0x157
    [] kernel_thread_helper+0x6/0xd
    Code: a5 39 f5 89 f7 0f 46 fd 39 cf 76 40 8b 03 f6 c4 08 74 32 eb 91 90 89 c8 c1 e8 0e 0f be 80 80 2f 86 c0 8b 14 85 60 2f 86 c0 89 c8 82 b4 12 00 00 c1 e0 05 03 82 ac 12 00 00 8b 00 f6 c4 08 0f
    EIP: [] setup_zone_migrate_reserve+0xcd/0x180 SS:ESP 0068:f2475f58
    CR2: 00000000000012b4

    We crashed in pageblock_is_reserved() when accessing pfn 0xc0000 because
    highstart_pfn = 0x36ffe.

    The issue was introduced in 3.0-rc1 by 6d3163ce ("mm: check if any page
    in a pageblock is reserved before marking it MIGRATE_RESERVE").

    Make sure that start_pfn is always aligned to pageblock_nr_pages so
    that pfn_valid() is always called at the start of each pageblock.
    Architectures with holes in pageblocks will be correctly handled by
    pfn_valid_within() in pageblock_is_reserved().
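
    A minimal sketch of the alignment fix in setup_zone_migrate_reserve(),
    with the surrounding loop paraphrased from the description:

        start_pfn = zone->zone_start_pfn;
        end_pfn = start_pfn + zone->spanned_pages;
        /* walk only pageblock-aligned pfns, so that pfn_valid() below
           is called once at the start of each pageblock */
        start_pfn = roundup(start_pfn, pageblock_nr_pages);

        for (pfn = start_pfn; pfn < end_pfn; pfn += pageblock_nr_pages) {
                if (!pfn_valid(pfn))
                        continue;
                /* ... reserve or unreserve the pageblock ... */
        }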

    Signed-off-by: Michal Hocko
    Signed-off-by: Mel Gorman
    Tested-by: Dang Bo
    Reviewed-by: KAMEZAWA Hiroyuki
    Cc: Andrea Arcangeli
    Cc: David Rientjes
    Cc: Arve Hjønnevåg
    Cc: KOSAKI Motohiro
    Cc: John Stultz
    Cc: Dave Hansen
    Cc: [3.0+]
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Michal Hocko
     
  • Avoid unlocking an unlocked page if we failed to lock it.
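
    A minimal sketch of the pattern, with a migration-style function,
    labels, and return code assumed purely for illustration:

        int rc = -EAGAIN;

        if (!trylock_page(page)) {
                if (!force)
                        goto out;       /* never locked: don't unlock */
                lock_page(page);
        }

        /* ... do the work, set rc ... */

        unlock_page(page);              /* pairs with the lock above */
    out:
        return rc;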

    Signed-off-by: Hillf Danton
    Cc: Naoya Horiguchi
    Cc: Andrea Arcangeli
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Hillf Danton
     
  • Commit 70b50f94f1644 ("mm: thp: tail page refcounting fix") keeps all
    page_tail->_count zero at all times. But the current kernel does not
    set page_tail->_count to zero if a 1GB page is utilized. So when an
    IOMMU 1GB page is used by KVM, it will result in a kernel oops
    because a tail page's _count does not equal zero.

    kernel BUG at include/linux/mm.h:386!
    invalid opcode: 0000 [#1] SMP
    Call Trace:
    gup_pud_range+0xb8/0x19d
    get_user_pages_fast+0xcb/0x192
    ? trace_hardirqs_off+0xd/0xf
    hva_to_pfn+0x119/0x2f2
    gfn_to_pfn_memslot+0x2c/0x2e
    kvm_iommu_map_pages+0xfd/0x1c1
    kvm_iommu_map_memslots+0x7c/0xbd
    kvm_iommu_map_guest+0xaa/0xbf
    kvm_vm_ioctl_assigned_device+0x2ef/0xa47
    kvm_vm_ioctl+0x36c/0x3a2
    do_vfs_ioctl+0x49e/0x4e4
    sys_ioctl+0x5a/0x7c
    system_call_fastpath+0x16/0x1b
    RIP gup_huge_pud+0xf2/0x159
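
    A sketch of the kind of fix implied, assuming it lands where gigantic
    compound pages are prepared (helper names from kernels of that era):

        /* give every tail page the zero _count that the 70b50f94f1644
           refcounting scheme expects */
        for (i = 1; i < nr_pages; i++) {
                struct page *p = page + i;

                __SetPageTail(p);
                set_page_count(p, 0);
                p->first_page = page;
        }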

    Signed-off-by: Youquan Song
    Reviewed-by: Andrea Arcangeli
    Cc:
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Youquan Song
     
  • khugepaged can sometimes cause suspend to fail, requiring that the user
    retry the suspend operation.

    Use wait_event_freezable_timeout() instead of
    schedule_timeout_interruptible() to avoid missing freezer wakeups. A
    try_to_freeze() would also have been needed in the
    khugepaged_alloc_hugepage tight loop in case the allocation fails
    repeatedly, and wait_event_freezable_timeout() provides that too.

    khugepaged would still freeze just fine by trying again the next
    minute, but it's better if it freezes immediately.
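
    A sketch of the substitution, with the contents of the sleep helper
    assumed:

        - schedule_timeout_interruptible(
        -         msecs_to_jiffies(khugepaged_alloc_sleep_millisecs));
        + wait_event_freezable_timeout(khugepaged_wait, false,
        +         msecs_to_jiffies(khugepaged_alloc_sleep_millisecs));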

    Reported-by: Jiri Slaby
    Signed-off-by: Andrea Arcangeli
    Tested-by: Jiri Slaby
    Cc: Tejun Heo
    Cc: Oleg Nesterov
    Cc: "Srivatsa S. Bhat"
    Cc: "Rafael J. Wysocki"
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Andrea Arcangeli
     
  • Use atomic-long operations instead of looping around cmpxchg().
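
    A sketch of the conversion, with the shrinker field name assumed:

        /* before: open-coded xchg via a cmpxchg loop */
        do {
                nr = shrinker->nr;
        } while (cmpxchg(&shrinker->nr, nr, 0) != nr);

        /* after: the field becomes an atomic_long_t */
        nr = atomic_long_xchg(&shrinker->nr_in_batch, 0);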

    [akpm@linux-foundation.org: massage atomic.h inclusions]
    Signed-off-by: Konstantin Khlebnikov
    Cc: Dave Chinner
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Konstantin Khlebnikov
     
  • A shrinker function can return -1, meaning that it cannot do anything
    without a risk of deadlock. For example, prune_super() does this if
    it cannot grab a superblock reference, even if nr_to_scan=0.
    Currently we interpret this -1 as a ULONG_MAX-sized shrinker and
    evaluate `total_scan' accordingly, so the next time around this
    shrinker can cause really big pressure. Let's skip such shrinkers
    instead.

    Also make total_scan signed, otherwise the check (total_scan < 0) below
    never works.
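
    A sketch of the resulting logic in shrink_slab(), per the description:

        long total_scan;   /* signed, so the (total_scan < 0) check works */
        long max_pass;

        max_pass = do_shrinker_shrink(shrinker, shrink, 0);
        if (max_pass <= 0)
                continue;  /* shrinker returned -1: skip it this round */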

    Signed-off-by: Konstantin Khlebnikov
    Cc: Dave Chinner
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Konstantin Khlebnikov
     

05 Dec, 2011

1 commit

  • Commit 30765b92 ("slab, lockdep: Annotate the locks before using
    them") moves the init_lock_keys() call from after g_cpucache_up =
    FULL to before it, and overlooks the fact that init_node_lock_keys()
    tests for it and ignores everything !FULL.

    Introduce a LATE stage and change the lockdep test to be <LATE.
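
    A sketch of the staging, assuming the g_cpucache_up enum in mm/slab.c:

        static enum {
                NONE, PARTIAL_AC, PARTIAL_L3, EARLY, LATE, FULL
        } g_cpucache_up;

        static void init_node_lock_keys(int q)
        {
                if (g_cpucache_up < LATE)   /* was: != FULL */
                        return;
                /* ... annotate the node's list_lock classes ... */
        }
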
    Cc: Pekka Enberg
    Cc: stable@kernel.org
    Signed-off-by: Peter Zijlstra
    Signed-off-by: Ingo Molnar

    Peter Zijlstra
     

24 Nov, 2011

3 commits

  • show_slab_objects() can trigger NULL dereferences or memory corruption.

    Another cpu can change its c->page to NULL or c->node to NUMA_NO_NODE
    while we use them.

    Use ACCESS_ONCE(c->page) and ACCESS_ONCE(c->node) to make sure this
    cannot happen.
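
    A sketch of the pattern in show_slab_objects(), per the description:

        struct page *page = ACCESS_ONCE(c->page);
        int node = ACCESS_ONCE(c->node);    /* may be NUMA_NO_NODE */

        if (!page || node < 0)
                continue;
        x = page->inuse;    /* safe: we use our own stable snapshot */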

    Acked-by: Christoph Lameter
    Acked-by: David Rientjes
    Signed-off-by: Eric Dumazet
    Signed-off-by: Pekka Enberg

    Eric Dumazet
     
  • The cmpxchg must be irq-safe. The fallback for this_cpu_cmpxchg only
    disables preemption, which results in the per-cpu partial page
    operation potentially failing on non-x86 platforms.
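
    A sketch of the change in put_cpu_partial(), per the description
    (irqsafe_cpu_cmpxchg existed at the time):

        - } while (this_cpu_cmpxchg(s->cpu_slab->partial, oldpage, page)
        -                != oldpage);
        + } while (irqsafe_cpu_cmpxchg(s->cpu_slab->partial, oldpage, page)
        +                != oldpage);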

    This patch fixes the following problem reported by Christian Kujau:

    I seem to hit it with heavy disk & cpu IO is in progress on this
    PowerBook G4. Full dmesg & .config:
    http://nerdbynature.de/bits/3.2.0-rc1/oops/

    I've enabled some debug options and now it really points to slub.c:2166

    http://nerdbynature.de/bits/3.2.0-rc1/oops/oops4m.jpg

    With debug options enabled I'm currently in the xmon debugger, not sure
    what to make of it yet, I'll try to get something useful out of it :)

    Reported-by: Christian Kujau
    Tested-by: Christian Kujau
    Acked-by: Eric Dumazet
    Acked-by: David Rientjes
    Signed-off-by: Christoph Lameter
    Signed-off-by: Pekka Enberg

    Christoph Lameter
     
  • Add comments about current per_cpu_ptr_to_phys implementation to
    explain why the logic is more complicated than necessary.

    -tj: relocated comment into kerneldoc comment

    Signed-off-by: Dave Young
    Signed-off-by: Tejun Heo

    Dave Young
     

23 Nov, 2011

3 commits

  • * 'writeback-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/wfg/linux:
    writeback: remove vm_dirties and task->dirties
    writeback: hard throttle 1000+ dd on a slow USB stick
    mm: Make task in balance_dirty_pages() killable

    Linus Torvalds
     
  • Percpu allocator recorded the cpus which map to the first and last
    units in pcpu_first/last_unit_cpu respectively and used them to
    determine the address range of a chunk - e.g. it assumed that the
    first unit has the lowest address in a chunk while the last unit has
    the highest address.

    This simply isn't true. Groups in a chunk can have arbitrary positive
    or negative offsets from the previous one and there is no guarantee
    that the first unit occupies the lowest offset while the last one the
    highest.

    Fix it by actually comparing unit offsets to determine the cpus
    occupying the lowest and highest offsets. Also, rename
    pcpu_first/last_unit_cpu to pcpu_low/high_unit_cpu to avoid confusion.
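
    A sketch of the offset comparison, with the setup-time variable names
    assumed:

        /* determine the cpus with the lowest and highest unit offsets */
        if (pcpu_low_unit_cpu == NR_CPUS ||
            unit_off[cpu] < unit_off[pcpu_low_unit_cpu])
                pcpu_low_unit_cpu = cpu;
        if (pcpu_high_unit_cpu == NR_CPUS ||
            unit_off[cpu] > unit_off[pcpu_high_unit_cpu])
                pcpu_high_unit_cpu = cpu;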

    The chunk address range is used to flush cache on vmalloc area
    map/unmap and decide whether a given address is in the first chunk by
    per_cpu_ptr_to_phys() and the bug was discovered by invalid
    per_cpu_ptr_to_phys() translation for crash_note.

    Kudos to Dave Young for tracking down the problem.

    Signed-off-by: Tejun Heo
    Reported-by: WANG Cong
    Reported-by: Dave Young
    Tested-by: Dave Young
    LKML-Reference:
    Cc: stable@kernel.org

    Tejun Heo
     
  • pcpu_mem_alloc() always returns zeroed memory, so rename it (to
    pcpu_mem_zalloc()) so that callers like pcpu_get_pages_and_bitmap()
    know they don't need to reinitialize it.

    Signed-off-by: Bob Liu
    Reviewed-by: Pekka Enberg
    Reviewed-by: Michal Hocko
    Signed-off-by: Tejun Heo

    Bob Liu
     

18 Nov, 2011

2 commits

  • …kernel/git/konrad/xen

    * 'stable/for-linus-fixes-3.2' of git://git.kernel.org/pub/scm/linux/kernel/git/konrad/xen:
    xen-gntalloc: signedness bug in add_grefs()
    xen-gntalloc: integer overflow in gntalloc_ioctl_alloc()
    xen-gntdev: integer overflow in gntdev_alloc_map()
    xen:pvhvm: enable PVHVM VCPU placement when using more than 32 CPUs.
    xen/balloon: Avoid OOM when requesting highmem
    xen: Remove hanging references to CONFIG_XEN_PLATFORM_PCI
    xen: map foreign pages for shared rings by updating the PTEs directly

    Linus Torvalds
     
  • * 'for-linus' of git://git.kernel.dk/linux-block:
    block: add missed trace_block_plug
    paride: fix potential information leak in pg_read()
    bio: change some signed vars to unsigned
    block: avoid unnecessary plug list flush
    cciss: auto engage SCSI mid layer at driver load time
    loop: cleanup set_status interface
    include/linux/bio.h: use a static inline function for bio_integrity_clone()
    loop: prevent information leak after failed read
    block: Always check length of all iov entries in blk_rq_map_user_iov()
    The Windows driver .inf disables ASPM on all cciss devices. Do the same.
    backing-dev: ensure wakeup_timer is deleted
    block: Revert "[SCSI] genhd: add a new attribute "alias" in gendisk"

    Linus Torvalds
     

17 Nov, 2011

3 commits

  • They are not used any more.

    Signed-off-by: Wu Fengguang

    Wu Fengguang
     
  • The sleep-based balance_dirty_pages() can pause at most MAX_PAUSE=200ms
    on every 4KB page, which means it cannot throttle a task below
    4KB/200ms=20KB/s. So when more than 512 dd tasks are writing to a
    10MB/s USB stick, its bdi dirty pages can grow out of control.

    Even if we could increase MAX_PAUSE, the minimal rate limit
    (task_ratelimit = 1) still means a limit of 4KB/s.

    They can eventually be safeguarded by the global limit check
    (nr_dirty < dirty_thresh). However if someone is also writing to an
    HDD at the same time, it'll get poor HDD write performance.

    We at least want to maintain good write performance for other devices
    when one device is attacked by some "massive parallel" workload,
    suffers from slow write bandwidth, or somehow gets stalled due to some
    error condition (e.g. an NFS server not responding).

    For a stalled device, we need to completely block its dirtiers, too,
    before its bdi dirty pages grow all the way up to the global limit and
    leave no space for the other functional devices.

    So change the loop exit condition to

        /*
         * Always enforce global dirty limit; also enforce bdi dirty limit
         * if the normal max_pause sleeps cannot keep things under control.
         */
        if (nr_dirty < dirty_thresh &&
            (bdi_dirty < bdi_thresh || bdi->dirty_ratelimit > 1))
                break;

    which can be further simplified to

        if (task_ratelimit)
                break;

    Signed-off-by: Wu Fengguang

    Wu Fengguang
     
  • When mapping a foreign page with xenbus_map_ring_valloc() with the
    GNTTABOP_map_grant_ref hypercall, set the GNTMAP_contains_pte flag and
    pass a pointer to the PTE (in init_mm).

    After the page is mapped, the usual fault mechanism can be used to
    update additional MMs. This allows the vmalloc_sync_all() to be
    removed from alloc_vm_area().

    Signed-off-by: David Vrabel
    Acked-by: Andrew Morton
    [v1: Squashed fix by Michal for no-mmu case]
    Signed-off-by: Konrad Rzeszutek Wilk
    Signed-off-by: Michal Simek

    David Vrabel
     

16 Nov, 2011

5 commits

  • There is no reason why a task in balance_dirty_pages() shouldn't be
    killable, and making it so helps in recovering from some error
    conditions (like when a filesystem goes into an error state and
    cannot accept writeback anymore, but we still want to kill the
    processes using it so that we can unmount it).

    There will be follow-up patches to further abort generic_perform_write()
    and other filesystem write loops, to avoid a large write + SIGKILL
    combination exceeding the dirty limit and possibly causing a strange
    OOM.
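
    A sketch of the change in the balance_dirty_pages() pause loop, per
    the description:

        __set_current_state(TASK_KILLABLE);   /* was uninterruptible */
        io_schedule_timeout(pause);

        if (fatal_signal_pending(current))
                break;   /* let a SIGKILLed task leave the throttle loop */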

    Reported-by: Kazuya Mio
    Tested-by: Kazuya Mio
    Reviewed-by: Neil Brown
    Reviewed-by: KOSAKI Motohiro
    Signed-off-by: Jan Kara
    Signed-off-by: Wu Fengguang

    Jan Kara
     
  • If we fail to prepare an anon_vma, the {new, old}_page should be released,
    or they will leak.
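
    A minimal sketch of the error path, with the function context, page
    names and return value assumed for illustration:

        if (unlikely(anon_vma_prepare(vma))) {
                page_cache_release(new_page);
                page_cache_release(old_page);
                return VM_FAULT_OOM;
        }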

    Signed-off-by: Hillf Danton
    Reviewed-by: Andrea Arcangeli
    Cc: Hugh Dickins
    Cc: Johannes Weiner
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Hillf Danton
     
  • Commit c9f01245 ("oom: remove oom_disable_count") removed the
    oom_disable_count counter, which had been used to break out of
    oom_badness() early so that a task with oom_score_adj set to
    OOM_SCORE_ADJ_MIN (oom disabled) could never be selected.

    Now that the counter is gone we always go through the heuristics
    calculation and always return a nonzero positive value. This means
    that we can end up killing a task with OOM disabled because it is
    indistinguishable from regular tasks using 1% of memory (resp.
    CAP_SYS_ADMIN tasks using 3%) or from tasks with oom_score_adj set
    but OOM enabled.

    Let's break out early if the task should have OOM disabled.
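
    A sketch of the early break in oom_badness(), per the description
    (locking details elided):

        if (p->signal->oom_score_adj == OOM_SCORE_ADJ_MIN)
                return 0;   /* never pick an OOM-disabled task */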

    Signed-off-by: Michal Hocko
    Acked-by: David Rientjes
    Acked-by: KOSAKI Motohiro
    Cc: Oleg Nesterov
    Cc: Ying Han
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Michal Hocko
     
  • Lockdep reports a potential deadlock on the slub node list_lock:
    discard_slab() is called with the lock held in unfreeze_partials(),
    and it can trigger a slab allocation, which can take the lock again.

    discard_slab() doesn't actually need to hold the lock if the slab
    has already been removed from the partial list.
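
    A sketch of the reordering in unfreeze_partials(), assuming the empty
    slabs are first detached to a local discard list under the lock:

        spin_unlock(&n->list_lock);

        while (discard_page) {
                struct page *page = discard_page;

                discard_page = discard_page->next;
                discard_slab(s, page);  /* may allocate: lock is dropped */
        }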

    Acked-by: Christoph Lameter
    Reported-and-tested-by: Yong Zhang
    Reported-and-tested-by: Julie Sullivan
    Signed-off-by: Shaohua Li
    Signed-off-by: Pekka Enberg

    Shaohua Li
     
  • unfreeze_partials() needs to add the page to the partial list tail,
    since such a page doesn't have many free objects. The tail is meant
    to be requested explicitly with DEACTIVATE_TO_TAIL, but the code
    passed 1, and DEACTIVATE_TO_TAIL != 1. Without the fix below this
    causes a performance regression (e.g. more lock contention on
    node->list_lock).
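
    A sketch of the fix, with the call site in unfreeze_partials() assumed:

        - add_partial(n, page, 1);
        + add_partial(n, page, DEACTIVATE_TO_TAIL);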

    Signed-off-by: Shaohua Li
    Acked-by: Christoph Lameter
    Acked-by: David Rientjes
    Signed-off-by: Pekka Enberg

    Shaohua Li
     

11 Nov, 2011

1 commit

  • bdi_prune_sb() in bdi_unregister() attempts to remove the bdi links
    from all super_blocks and then del_timer_sync() the writeback timer.

    However, this can race with __mark_inode_dirty(), leading to
    bdi_wakeup_thread_delayed() rearming the writeback timer on the bdi
    we're unregistering, after we've called del_timer_sync().

    This can end up with the bdi being freed with an active timer inside
    it, as in the case of the following dump after the removal of an SD
    card.

    Fix this by redoing the del_timer_sync() in bdi_destroy().
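
    A sketch of the fix, per the description:

        void bdi_destroy(struct backing_dev_info *bdi)
        {
                /* ... */
                /* the timer may have been rearmed by __mark_inode_dirty()
                   racing with bdi_prune_sb() after bdi_unregister()'s
                   del_timer_sync(), so kill it once more here */
                del_timer_sync(&bdi->wb.wakeup_timer);
                /* ... */
        }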

    ------------[ cut here ]------------
    WARNING: at /home/rabin/kernel/arm/lib/debugobjects.c:262 debug_print_object+0x9c/0xc8()
    ODEBUG: free active (active state 0) object type: timer_list hint: wakeup_timer_fn+0x0/0x180
    Modules linked in:
    Backtrace:
    [] (dump_backtrace+0x0/0x110) from [] (dump_stack+0x18/0x1c)
    r6:c02bc638 r5:00000106 r4:c79f5d18 r3:00000000
    [] (dump_stack+0x0/0x1c) from [] (warn_slowpath_common+0x54/0x6c)
    [] (warn_slowpath_common+0x0/0x6c) from [] (warn_slowpath_fmt+0x38/0x40)
    r8:20000013 r7:c780c6f0 r6:c031613c r5:c780c6f0 r4:c02b1b29
    r3:00000009
    [] (warn_slowpath_fmt+0x0/0x40) from [] (debug_print_object+0x9c/0xc8)
    r3:c02b1b29 r2:c02bc662
    [] (debug_print_object+0x0/0xc8) from [] (debug_check_no_obj_freed+0xac/0x1dc)
    r6:c7964000 r5:00000001 r4:c7964000
    [] (debug_check_no_obj_freed+0x0/0x1dc) from [] (kmem_cache_free+0x88/0x1f8)
    [] (kmem_cache_free+0x0/0x1f8) from [] (blk_release_queue+0x70/0x78)
    [] (blk_release_queue+0x0/0x78) from [] (kobject_release+0x70/0x84)
    r5:c79641f0 r4:c796420c
    [] (kobject_release+0x0/0x84) from [] (kref_put+0x68/0x80)
    r7:00000083 r6:c74083d0 r5:c015289c r4:c796420c
    [] (kref_put+0x0/0x80) from [] (kobject_put+0x48/0x5c)
    r5:c79643b4 r4:c79641f0
    [] (kobject_put+0x0/0x5c) from [] (blk_cleanup_queue+0x68/0x74)
    r4:c7964000
    [] (blk_cleanup_queue+0x0/0x74) from [] (mmc_blk_put+0x78/0xe8)
    r5:00000000 r4:c794c400
    [] (mmc_blk_put+0x0/0xe8) from [] (mmc_blk_release+0x24/0x38)
    r5:c794c400 r4:c0322824
    [] (mmc_blk_release+0x0/0x38) from [] (__blkdev_put+0xe8/0x170)
    r5:c78d5e00 r4:c74083c0
    [] (__blkdev_put+0x0/0x170) from [] (blkdev_put+0x11c/0x12c)
    r8:c79f5f70 r7:00000001 r6:c74083d0 r5:00000083 r4:c74083c0
    r3:00000000
    [] (blkdev_put+0x0/0x12c) from [] (kill_block_super+0x60/0x6c)
    r7:c7942300 r6:c79f4000 r5:00000083 r4:c74083c0
    [] (kill_block_super+0x0/0x6c) from [] (deactivate_locked_super+0x44/0x70)
    r6:c79f4000 r5:c031af64 r4:c794dc00 r3:c00b06c4
    [] (deactivate_locked_super+0x0/0x70) from [] (deactivate_super+0x6c/0x70)
    r5:c794dc00 r4:c794dc00
    [] (deactivate_super+0x0/0x70) from [] (mntput_no_expire+0x188/0x194)
    r5:c794dc00 r4:c7942300
    [] (mntput_no_expire+0x0/0x194) from [] (sys_umount+0x2e4/0x310)
    r6:c7942300 r5:00000000 r4:00000000 r3:00000000
    [] (sys_umount+0x0/0x310) from [] (ret_fast_syscall+0x0/0x30)
    ---[ end trace e5c83c92ada51c76 ]---

    Cc: stable@kernel.org
    Signed-off-by: Rabin Vincent
    Signed-off-by: Linus Walleij
    Signed-off-by: Jens Axboe

    Rabin Vincent
     

07 Nov, 2011

3 commits

  • In balance_dirty_pages(), task_ratelimit may be uninitialized (its
    initialization is skipped by "goto pause") and then used when calling
    the tracing hook.

    Fix it by moving the task_ratelimit assignment before the goto pause.

    Reported-by: Witold Baryluk
    Signed-off-by: Wu Fengguang

    Wu Fengguang
     
  • * 'modsplit-Oct31_2011' of git://git.kernel.org/pub/scm/linux/kernel/git/paulg/linux: (230 commits)
    Revert "tracing: Include module.h in define_trace.h"
    irq: don't put module.h into irq.h for tracking irqgen modules.
    bluetooth: macroize two small inlines to avoid module.h
    ip_vs.h: fix implicit use of module_get/module_put from module.h
    nf_conntrack.h: fix up fallout from implicit moduleparam.h presence
    include: replace linux/module.h with "struct module" wherever possible
    include: convert various register fcns to macros to avoid include chaining
    crypto.h: remove unused crypto_tfm_alg_modname() inline
    uwb.h: fix implicit use of asm/page.h for PAGE_SIZE
    pm_runtime.h: explicitly requires notifier.h
    linux/dmaengine.h: fix implicit use of bitmap.h and asm/page.h
    miscdevice.h: fix up implicit use of lists and types
    stop_machine.h: fix implicit use of smp.h for smp_processor_id
    of: fix implicit use of errno.h in include/linux/of.h
    of_platform.h: delete needless include
    acpi: remove module.h include from platform/aclinux.h
    miscdevice.h: delete unnecessary inclusion of module.h
    device_cgroup.h: delete needless include
    net: sch_generic remove redundant use of <linux/module.h>
    net: inet_timewait_sock doesnt need <linux/module.h>
    ...

    Fix up trivial conflicts (other header files, and removal of the ab3550 mfd driver) in
    - drivers/media/dvb/frontends/dibx000_common.c
    - drivers/media/video/{mt9m111.c,ov6650.c}
    - drivers/mfd/ab3550-core.c
    - include/linux/dmaengine.h

    Linus Torvalds
     
  • * 'writeback-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/wfg/linux:
    writeback: Add a 'reason' to wb_writeback_work
    writeback: send work item to queue_io, move_expired_inodes
    writeback: trace event balance_dirty_pages
    writeback: trace event bdi_dirty_ratelimit
    writeback: fix ppc compile warnings on do_div(long long, unsigned long)
    writeback: per-bdi background threshold
    writeback: dirty position control - bdi reserve area
    writeback: control dirty pause time
    writeback: limit max dirty pause time
    writeback: IO-less balance_dirty_pages()
    writeback: per task dirty rate limit
    writeback: stabilize bdi->dirty_ratelimit
    writeback: dirty rate control
    writeback: add bg_threshold parameter to __bdi_update_bandwidth()
    writeback: dirty position control
    writeback: account per-bdi accumulated dirtied pages

    Linus Torvalds
     

05 Nov, 2011

1 commit

  • * 'for-3.2/core' of git://git.kernel.dk/linux-block: (29 commits)
    block: don't call blk_drain_queue() if elevator is not up
    blk-throttle: use queue_is_locked() instead of lockdep_is_held()
    blk-throttle: Take blkcg->lock while traversing blkcg->policy_list
    blk-throttle: Free up policy node associated with deleted rule
    block: warn if tag is greater than real_max_depth.
    block: make gendisk hold a reference to its queue
    blk-flush: move the queue kick into
    blk-flush: fix invalid BUG_ON in blk_insert_flush
    block: Remove the control of complete cpu from bio.
    block: fix a typo in the blk-cgroup.h file
    block: initialize the bounce pool if high memory may be added later
    block: fix request_queue lifetime handling by making blk_queue_cleanup() properly shutdown
    block: drop @tsk from attempt_plug_merge() and explain sync rules
    block: make get_request[_wait]() fail if queue is dead
    block: reorganize throtl_get_tg() and blk_throtl_bio()
    block: reorganize queue draining
    block: drop unnecessary blk_get/put_queue() in scsi_cmd_ioctl() and blk_get_tg()
    block: pass around REQ_* flags instead of broken down booleans during request alloc/free
    block: move blk_throtl prototypes to block/blk.h
    block: fix genhd refcounting in blkio_policy_parse_and_set()
    ...

    Fix up trivial conflicts due to "mddev_t" -> "struct mddev" conversion
    and making the request functions be of type "void" instead of "int" in
    - drivers/md/{faulty.c,linear.c,md.c,md.h,multipath.c,raid0.c,raid1.c,raid10.c,raid5.c}
    - drivers/staging/zram/zram_drv.c

    Linus Torvalds
     

03 Nov, 2011

9 commits

  • Says Andrew:

    "60 patches. That's good enough for -rc1 I guess. I have quite a lot
    of detritus to be rechecked, work through maintainers, etc.

    - most of the remains of MM
    - rtc
    - various misc
    - cgroups
    - memcg
    - cpusets
    - procfs
    - ipc
    - rapidio
    - sysctl
    - pps
    - w1
    - drivers/misc
    - aio"

    * akpm: (60 commits)
    memcg: replace ss->id_lock with a rwlock
    aio: allocate kiocbs in batches
    drivers/misc/vmw_balloon.c: fix typo in code comment
    drivers/misc/vmw_balloon.c: determine page allocation flag can_sleep outside loop
    w1: disable irqs in critical section
    drivers/w1/w1_int.c: multiple masters used same init_name
    drivers/power/ds2780_battery.c: fix deadlock upon insertion and removal
    drivers/power/ds2780_battery.c: add a nolock function to w1 interface
    drivers/power/ds2780_battery.c: create central point for calling w1 interface
    w1: ds2760 and ds2780, use ida for id and ida_simple_get() to get it
    pps gpio client: add missing dependency
    pps: new client driver using GPIO
    pps: default echo function
    include/linux/dma-mapping.h: add dma_zalloc_coherent()
    sysctl: make CONFIG_SYSCTL_SYSCALL default to n
    sysctl: add support for poll()
    RapidIO: documentation update
    drivers/net/rionet.c: fix ethernet address macros for LE platforms
    RapidIO: fix potential null deref in rio_setup_device()
    RapidIO: add mport driver for Tsi721 bridge
    ...

    Linus Torvalds
     
  • warning: symbol 'swap_cgroup_ctrl' was not declared. Should it be static?

    Signed-off-by: H Hartley Sweeten
    Cc: Paul Menage
    Cc: Li Zefan
    Acked-by: Balbir Singh
    Cc: Daisuke Nishimura
    Acked-by: KAMEZAWA Hiroyuki
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    H Hartley Sweeten
     
  • Various code in memcontrol.c performs calculations that read two
    different percpu variables via this_cpu_read(), or does an open-coded
    read-modify-write on a single percpu variable.

    Disable preemption throughout these operations so that the writes go
    to the correct places.
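
    A minimal sketch of the pattern, with stat indices assumed for
    illustration; the point is that both accesses hit the same cpu:

        preempt_disable();
        val = __this_cpu_read(memcg->stat->events[MEM_CGROUP_EVENTS_COUNT]) -
              __this_cpu_read(memcg->stat->targets[MEM_CGROUP_TARGET_THRESH]);
        preempt_enable();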

    [hannes@cmpxchg.org: added this_cpu to __this_cpu conversion]
    Signed-off-by: Johannes Weiner
    Signed-off-by: Steven Rostedt
    Cc: Greg Thelen
    Cc: KAMEZAWA Hiroyuki
    Cc: Balbir Singh
    Cc: Daisuke Nishimura
    Cc: Thomas Gleixner
    Cc: Peter Zijlstra
    Cc: Christoph Lameter
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Steven Rostedt
     
  • There is a potential race between a thread charging a page and another
    thread putting it back to the LRU list:

        charge:                          putback:
        SetPageCgroupUsed                SetPageLRU
        PageLRU && add to memcg LRU      PageCgroupUsed && add to memcg LRU

    The order of setting one flag and checking the other is crucial, otherwise
    the charge may observe !PageLRU while the putback observes !PageCgroupUsed
    and the page is not linked to the memcg LRU at all.

    Global memory pressure may fix this by trying to isolate and putback the
    page for reclaim, where that putback would link it to the memcg LRU again.
    Without that, the memory cgroup is undeletable due to a charge whose
    physical page can not be found and moved out.

    Signed-off-by: Johannes Weiner
    Cc: Ying Han
    Acked-by: KAMEZAWA Hiroyuki
    Cc: Daisuke Nishimura
    Cc: Balbir Singh
    Cc: Michal Hocko
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Johannes Weiner
     
  • Reclaim decides to skip scanning an active list when the corresponding
    inactive list is above a certain size relative to it, so as to leave
    the assumed working set alone while there are still enough reclaim
    candidates around.

    The memcg implementation of comparing those lists instead reports whether
    the whole memcg is low on the requested type of inactive pages,
    considering all nodes and zones.

    This can lead to an oversized active list not being scanned because of the
    state of the other lists in the memcg, as well as an active list being
    scanned while its corresponding inactive list has enough pages.

    Not only is this wrong, it's also a scalability hazard, because the global
    memory state over all nodes and zones has to be gathered for each memcg
    and zone scanned.

    Make these calculations purely based on the size of the two LRU lists
    that are actually affected by the outcome of the decision.
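
    A sketch of the intended per-list comparison, with the memcg helper
    names assumed:

        /* compare only the two lists the decision is about, in the zone
           being scanned, instead of memcg-global state */
        inactive = mem_cgroup_zone_nr_lru_pages(memcg, nid, zid,
                                                BIT(LRU_INACTIVE_ANON));
        active = mem_cgroup_zone_nr_lru_pages(memcg, nid, zid,
                                              BIT(LRU_ACTIVE_ANON));
        return inactive * inactive_ratio < active;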

    Signed-off-by: Johannes Weiner
    Reviewed-by: Rik van Riel
    Cc: KOSAKI Motohiro
    Acked-by: KAMEZAWA Hiroyuki
    Cc: Daisuke Nishimura
    Cc: Balbir Singh
    Reviewed-by: Minchan Kim
    Reviewed-by: Ying Han
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Johannes Weiner
     
  • If somebody touches the data too early, it might be easier to diagnose
    the problem when dereferencing NULL at mem->info.nodeinfo[node] than
    trying to understand why mem_cgroup_per_zone is [un|partly]initialized.

    Signed-off-by: Igor Mammedov
    Acked-by: Michal Hocko
    Cc: KAMEZAWA Hiroyuki
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Igor Mammedov
     
  • Before calling schedule_timeout(), task state should be changed.
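
    The canonical pattern being enforced:

        __set_current_state(TASK_INTERRUPTIBLE);
        schedule_timeout(msecs_to_jiffies(10));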

    Signed-off-by: KAMEZAWA Hiroyuki
    Acked-by: Michal Hocko
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    KAMEZAWA Hiroyuki
     
  • The memcg code sometimes uses "struct mem_cgroup *mem" and sometimes uses
    "struct mem_cgroup *memcg". Rename all mem variables to memcg in source
    file.

    Signed-off-by: Raghavendra K T
    Acked-by: KAMEZAWA Hiroyuki
    Acked-by: Michal Hocko
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Raghavendra K T
     
  • When the cgroup base was allocated with kmalloc, it was necessary to
    annotate the variable with kmemleak_not_leak(). But it has recently
    been changed to be allocated with alloc_page() (which kmemleak does
    not track), so the stale annotation now causes a warning on boot up.

    I was triggering this output:

    allocated 8388608 bytes of page_cgroup
    please try 'cgroup_disable=memory' option if you don't want memory cgroups
    kmemleak: Trying to color unknown object at 0xf5840000 as Grey
    Pid: 0, comm: swapper Not tainted 3.0.0-test #12
    Call Trace:
    [] ? printk+0x1d/0x1f^M
    [] paint_ptr+0x4f/0x78
    [] kmemleak_not_leak+0x58/0x7d
    [] ? __rcu_read_unlock+0x9/0x7d
    [] kmemleak_init+0x19d/0x1e9
    [] start_kernel+0x346/0x3ec
    [] ? loglevel+0x18/0x18
    [] i386_start_kernel+0xaa/0xb0

    After a bit of debugging I tracked the object 0xf5840000 (and others)
    down to the cgroup code. The change from allocating base with kmalloc
    to alloc_page() has the base not calling kmemleak_alloc(), which adds
    the pointer to the object_tree_root, but kmemleak_not_leak() adds it
    to the crt_early_log[] table. On kmemleak_init(), the entry is found
    in the early_log[] but not the object_tree_root, and this error
    message is displayed.

    If alloc_page() fails then it defaults back to vmalloc() which still uses
    the kmemleak_alloc() which makes us still need the kmemleak_not_leak()
    call. The solution is to call the kmemleak_alloc() directly if the
    alloc_page() succeeds.
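
    A sketch of the fix, with the allocation context assumed from
    mm/page_cgroup.c of that era:

        addr = alloc_pages_exact_nid(nid, size, flags);
        if (addr) {
                /* alloc_page()-backed memory is invisible to kmemleak,
                   so register it explicitly */
                kmemleak_alloc(addr, size, 1, flags);
                return addr;
        }
        /* the vmalloc fallback is already tracked via kmemleak_alloc() */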

    Reviewed-by: Michal Hocko
    Signed-off-by: Steven Rostedt
    Acked-by: Catalin Marinas
    Signed-off-by: Jonathan Nieder
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Steven Rostedt