Eric Lee / smarc-ti-linux-kernel | Embedian Git Server

04 Jul, 2012

1 commit

a3da2c691 Merge branch 'for-linus' of git://git.kernel.dk/linux-block ... Browse Code »

Pull block bits from Jens Axboe:
"As vacation is coming up, thought I'd better get rid of my pending
changes in my for-linus branch for this iteration. It contains:

- Two patches for mtip32xx. Killing a non-compliant sysfs interface
and moving it to debugfs, where it belongs.

- A few patches from Asias. Two legit bug fixes, and one killing an
interface that is no longer in use.

- A patch from Jan, making the annoying partition ioctl warning a bit
less annoying, by restricting it to !CAP_SYS_RAWIO only.

- Three bug fixes for drbd from Lars Ellenberg.

- A fix for an old regression for umem, it hasn't really worked since
the plugging scheme was changed in 3.0.

- A few fixes from Tejun.

- A splice fix from Eric Dumazet, fixing an issue with pipe
resizing."

* 'for-linus' of git://git.kernel.dk/linux-block:
scsi: Silence unnecessary warnings about ioctl to partition
block: Drop dead function blk_abort_queue()
block: Mitigate lock unbalance caused by lock switching
block: Avoid missed wakeup in request waitqueue
umem: fix up unplugging
splice: fix racy pipe->buffers uses
drbd: fix null pointer dereference with on-congestion policy when diskless
drbd: fix list corruption by failing but already aborted reads
drbd: fix access of unallocated pages and kernel panic
xen/blkfront: Add WARN to deal with misbehaving backends.
blkcg: drop local variable @q from blkg_destroy()
mtip32xx: Create debugfs entries for troubleshooting
mtip32xx: Remove 'registers' and 'flags' from sysfs
blkcg: fix blkg_alloc() failure path
block: blkcg_policy_cfq shouldn't be used if !CONFIG_CFQ_GROUP_IOSCHED
block: fix return value on cfq_init() failure
mtip32xx: Remove version.h header file inclusion
xen/blkback: Copy id field when doing BLKIF_DISCARD.

Linus Torvalds
2012-07-04 06:45:10 +0800

21 Jun, 2012

7 commits

c4c0e9e54 mm, mempolicy: fix mbind() to do synchronous migration ... Browse Code »

If the range passed to mbind() is not allocated on nodes set in the
nodemask, it migrates the pages to respect the constraint.

The final formal of migrate_pages() is a mode of type enum migrate_mode,
not a boolean. do_mbind() is currently passing "true" which is the
equivalent of MIGRATE_SYNC_LIGHT. This should instead be MIGRATE_SYNC
for synchronous page migration.

Signed-off-by: David Rientjes
Signed-off-by: Linus Torvalds

David Rientjes
2012-06-21 13:10:42 +0800
48c3b583b mm/memblock: fix overlapping allocation when doubling reserved array ... Browse Code »

__alloc_memory_core_early() asks memblock for a range of memory then try
to reserve it. If the reserved region array lacks space for the new
range, memblock_double_array() is called to allocate more space for the
array. If memblock is used to allocate memory for the new array it can
end up using a range that overlaps with the range originally allocated in
__alloc_memory_core_early(), leading to possible data corruption.

With this patch memblock_double_array() now calls memblock_find_in_range()
with a narrowed candidate range (in cases where the reserved.regions array
is being doubled) so any memory allocated will not overlap with the
original range that was being reserved. The range is narrowed by passing
in the starting address and size of the previously allocated range. Then
the range above the ending address is searched and if a candidate is not
found, the range below the starting address is searched.

Signed-off-by: Greg Pearson
Signed-off-by: Yinghai Lu
Acked-by: Tejun Heo
Cc: Benjamin Herrenschmidt
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Greg Pearson
2012-06-21 05:39:36 +0800
eb4546bbb mm/memory.c: fix kernel-doc warnings ... Browse Code »

Fix kernel-doc warnings in mm/memory.c:

Warning(mm/memory.c:1377): No description found for parameter 'start'
Warning(mm/memory.c:1377): Excess function parameter 'address' description in 'zap_page_range'

Signed-off-by: Randy Dunlap
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Randy Dunlap
2012-06-21 05:39:36 +0800
dad7557eb mm: fix kernel-doc warnings ... Browse Code »

Fix kernel-doc warnings such as

Warning(../mm/page_cgroup.c:432): No description found for parameter 'id'
Warning(../mm/page_cgroup.c:432): Excess function parameter 'mem' description in 'swap_cgroup_record'

Signed-off-by: Wanpeng Li
Cc: Randy Dunlap
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Wanpeng Li
2012-06-21 05:39:36 +0800
e0897d75f mm, thp: print useful information when mmap_sem is unlocked in zap_pmd_range ... Browse Code »
2

Andrea asked for addr, end, vma->vm_start, and vma->vm_end to be emitted
when !rwsem_is_locked(&tlb->mm->mmap_sem). Otherwise, debugging the
underlying issue is more difficult.

Suggested-by: Andrea Arcangeli
Signed-off-by: David Rientjes
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

David Rientjes
2012-06-21 05:39:35 +0800
3a981f482 memcg: fix use_hierarchy css_is_ancestor oops regression ... Browse Code »

If use_hierarchy is set, reclaim testing soon oopses in css_is_ancestor()
called from __mem_cgroup_same_or_subtree() called from page_referenced():
when processes are exiting, it's easy for mm_match_cgroup() to pass along
a NULL memcg coming from a NULL mm->owner.

Check for that in __mem_cgroup_same_or_subtree(). Return true or false?
False because we cannot know if it was in the hierarchy, but also false
because it's better not to count a reference from an exiting process.

Signed-off-by: Hugh Dickins
Acked-by: Johannes Weiner
Acked-by: Konstantin Khlebnikov
Acked-by: KAMEZAWA Hiroyuki
Acked-by: Michal Hocko
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Hugh Dickins
2012-06-21 05:39:35 +0800
61eafb00d mm, oom: fix and cleanup oom score calculations ... Browse Code »

The divide in p->signal->oom_score_adj * totalpages / 1000 within
oom_badness() was causing an overflow of the signed long data type.

This adds both the root bias and p->signal->oom_score_adj before doing the
normalization which fixes the issue and also cleans up the calculation.

Tested-by: Dave Jones
Signed-off-by: David Rientjes
Cc: KOSAKI Motohiro
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

David Rientjes
2012-06-21 05:39:35 +0800

16 Jun, 2012

2 commits

9b15b817f swap: fix shmem swapping when more than 8 areas ... Browse Code »

Minchan Kim reports that when a system has many swap areas, and tmpfs
swaps out to the ninth or more, shmem_getpage_gfp()'s attempts to read
back the page cannot locate it, and the read fails with -ENOMEM.

Whoops. Yes, I blindly followed read_swap_header()'s pte_to_swp_entry(
swp_entry_to_pte()) technique for determining maximum usable swap
offset, without stopping to realize that that actually depends upon the
pte swap encoding shifting swap offset to the higher bits and truncating
it there. Whereas our radix_tree swap encoding leaves offset in the
lower bits: it's swap "type" (that is, index of swap area) that was
truncated.

Fix it by reducing the SWP_TYPE_SHIFT() in swapops.h, and removing the
broken radix_to_swp_entry(swp_to_radix_entry()) from read_swap_header().

This does not reduce the usable size of a swap area any further, it
leaves it as claimed when making the original commit: no change from 3.0
on x86_64, nor on i386 without PAE; but 3.0's 512GB is reduced to 128GB
per swapfile on i386 with PAE. It's not a change I would have risked
five years ago, but with x86_64 supported for ten years, I believe it's
appropriate now.

Hmm, and what if some architecture implements its swap pte with offset
encoded below type? That would equally break the maximum usable swap
offset check. Happily, they all follow the same tradition of encoding
offset above type, but I'll prepare a check on that for next.

Reported-and-Reviewed-and-Tested-by: Minchan Kim
Signed-off-by: Hugh Dickins
Cc: stable@vger.kernel.org [3.1, 3.2, 3.3, 3.4]
Signed-off-by: Linus Torvalds

Hugh Dickins
2012-06-16 12:48:14 +0800
a95f9b6e0 Merge branch 'core-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip ... Browse Code »

Pull core updates (RCU and locking) from Ingo Molnar:
"Most of the diffstat comes from the RCU slow boot regression fixes,
but there's also a debuggability improvements/fixes."

* 'core-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
memblock: Document memblock_is_region_{memory,reserved}()
rcu: Precompute RCU_FAST_NO_HZ timer offsets
rcu: Move RCU_FAST_NO_HZ per-CPU variables to rcu_dynticks structure
rcu: Update RCU_FAST_NO_HZ tracing for lazy callbacks
rcu: RCU_FAST_NO_HZ detection of callback adoption
spinlock: Indicate that a lockup is only suspected
kdump: Execute kmsg_dump(KMSG_DUMP_PANIC) after smp_send_stop()
panic: Make panic_on_oops configurable

Linus Torvalds
2012-06-16 07:52:35 +0800

14 Jun, 2012

1 commit

047fe3605 splice: fix racy pipe->buffers uses ... Browse Code »

Dave Jones reported a kernel BUG at mm/slub.c:3474! triggered
by splice_shrink_spd() called from vmsplice_to_pipe()

commit 35f3d14dbbc5 (pipe: add support for shrinking and growing pipes)
added capability to adjust pipe->buffers.

Problem is some paths don't hold pipe mutex and assume pipe->buffers
doesn't change for their duration.

Fix this by adding nr_pages_max field in struct splice_pipe_desc, and
use it in place of pipe->buffers where appropriate.

splice_shrink_spd() loses its struct pipe_inode_info argument.

Reported-by: Dave Jones
Signed-off-by: Eric Dumazet
Cc: Jens Axboe
Cc: Alexander Viro
Cc: Tom Herbert
Cc: stable # 2.6.35
Tested-by: Dave Jones
Signed-off-by: Jens Axboe

Eric Dumazet
2012-06-14 03:16:42 +0800

09 Jun, 2012

1 commit

1e11ad8dc mm, oom: fix badness score underflow ... Browse Code »

If the privileges given to root threads (3% of allowable memory) or a
negative value of /proc/pid/oom_score_adj happen to exceed the amount of
rss of a thread, its badness score overflows as a result of commit
a7f638f999ff ("mm, oom: normalize oom scores to oom_score_adj scale only
for userspace").

Fix this by making the type signed and return 1, meaning the thread is
still eligible for kill, if the value is negative.

Reported-by: Dave Jones
Acked-by: Oleg Nesterov
Signed-off-by: David Rientjes
Signed-off-by: Linus Torvalds

David Rientjes
2012-06-09 06:07:35 +0800

08 Jun, 2012

2 commits

eab309494 memblock: Document memblock_is_region_{memory,reserved}() ... Browse Code »

At first glance one would think that memblock_is_region_memory()
and memblock_is_region_reserved() would be implemented in the
same way. Unfortunately they aren't and the former returns
whether the region specified is a subset of a memory bank while
the latter returns whether the region specified intersects with
reserved memory.

Document the two functions so that users aren't tempted to
make the implementation the same between them and to clarify the
purpose of the functions.

Signed-off-by: Stephen Boyd
Cc: Tejun Heo
Link: http://lkml.kernel.org/r/1337845521-32755-1-git-send-email-sboyd@codeaurora.org
Signed-off-by: Ingo Molnar

Stephen Boyd
2012-06-08 17:59:46 +0800
0142ef6cd shmem: replace_page must flush_dcache and others ... Browse Code »

Commit bde05d1ccd51 ("shmem: replace page if mapping excludes its zone")
is not at all likely to break for anyone, but it was an earlier version
from before review feedback was incorporated. Fix that up now.

* shmem_replace_page must flush_dcache_page after copy_highpage [akpm]
* Expand comment on why shmem_unuse_inode needs page_swapcount [akpm]
* Remove excess of VM_BUG_ONs from shmem_replace_page [wangcong]
* Check page_private matches swap before calling shmem_replace_page [hughd]
* shmem_replace_page allow for unexpected race in radix_tree lookup [hughd]

Signed-off-by: Hugh Dickins
Cc: Cong Wang
Cc: Christoph Hellwig
Cc: KAMEZAWA Hiroyuki
Cc: Alan Cox
Cc: Stephane Marchesin
Cc: Andi Kleen
Cc: Dave Airlie
Cc: Daniel Vetter
Cc: Rob Clark
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Hugh Dickins
2012-06-08 05:43:54 +0800

05 Jun, 2012

3 commits

99becf132 Pull 'for-linus' branches of git://git.kernel.org/pub/scm/linux/kernel/git/viro/{signal,vfs} ... Browse Code »

Pull signal and vfs compile breakage fixes from Al Viro.

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/signal:
fixups for signal breakage

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
nommu: fix compilation of nommu.c

Linus Torvalds
2012-06-05 06:09:08 +0800
ad1ed2937 nommu: fix compilation of nommu.c ... Browse Code »

Compiling 3.5-rc1 for nommu targets gives:

CC mm/nommu.o
mm/nommu.c: In function ‘sys_mmap_pgoff’:
mm/nommu.c:1489:2: error: ‘ret’ undeclared (first use in this function)
mm/nommu.c:1489:2: note: each undeclared identifier is reported only once for each function it appears in

It is trivially fixed by replacing 'ret' with the local variable that is
already defined for the return value 'retval'.

Signed-off-by: Greg Ungerer
Signed-off-by: Al Viro

Greg Ungerer
2012-06-05 05:17:31 +0800
a3fe778c7 Merge tag 'stable/frontswap.v16-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/konrad/mm ... Browse Code »

Pull frontswap feature from Konrad Rzeszutek Wilk:
"Frontswap provides a "transcendent memory" interface for swap pages.
In some environments, dramatic performance savings may be obtained
because swapped pages are saved in RAM (or a RAM-like device) instead
of a swap disk. This tag provides the basic infrastructure along with
some changes to the existing backends."

Fix up trivial conflict in mm/Makefile due to removal of swap token code
changing a line next to the new frontswap entry.

This pull request came in before the merge window even opened, it got
delayed to after the merge window by me just wanting to make sure it had
actual users. Apparently IBM is using this on their embedded side, and
Jan Beulich says that it's already made available for SLES and OpenSUSE
users.

Also acked by Rik van Riel, and Konrad points to other people liking it
too. So in it goes.

By Dan Magenheimer (4) and Konrad Rzeszutek Wilk (2)
via Konrad Rzeszutek Wilk
* tag 'stable/frontswap.v16-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/konrad/mm:
frontswap: s/put_page/store/g s/get_page/load
MAINTAINER: Add myself for the frontswap API
mm: frontswap: config and doc files
mm: frontswap: core frontswap functionality
mm: frontswap: core swap subsystem hooks and headers
mm: frontswap: add frontswap header file

Linus Torvalds
2012-06-05 03:28:45 +0800

04 Jun, 2012

2 commits

68e3e9262 Revert "mm: compaction: handle incorrect MIGRATE_UNMOVABLE type pageblocks" ... Browse Code »

This reverts commit 5ceb9ce6fe9462a298bb2cd5c9f1ca6cb80a0199.

That commit seems to be the cause of the mm compation list corruption
issues that Dave Jones reported. The locking (or rather, absense
there-of) is dubious, as is the use of the 'page' variable once it has
been found to be outside the pageblock range.

So revert it for now, we can re-visit this for 3.6. If we even need to:
as Minchan Kim says, "The patch wasn't a bug fix and even test workload
was very theoretical".

Reported-and-tested-by: Dave Jones
Acked-by: Hugh Dickins
Acked-by: KOSAKI Motohiro
Acked-by: Minchan Kim
Cc: Bartlomiej Zolnierkiewicz
Cc: Kyungmin Park
Cc: Andrew Morton
Signed-off-by: Linus Torvalds

Linus Torvalds
2012-06-04 11:05:57 +0800
752dc185d mm: fix warning in __set_page_dirty_nobuffers ... Browse Code »

New tmpfs use of !PageUptodate pages for fallocate() is triggering the
WARNING: at mm/page-writeback.c:1990 when __set_page_dirty_nobuffers()
is called from migrate_page_copy() for compaction.

It is anomalous that migration should use __set_page_dirty_nobuffers()
on an address_space that does not participate in dirty and writeback
accounting; and this has also been observed to insert surprising dirty
tags into a tmpfs radix_tree, despite tmpfs not using tags at all.

We should probably give migrate_page_copy() a better way to preserve the
tag and migrate accounting info, when mapping_cap_account_dirty(). But
that needs some more work: so in the interim, avoid the warning by using
a simple SetPageDirty on PageSwapBacked pages.

Reported-and-tested-by: Dave Jones
Signed-off-by: Hugh Dickins
Signed-off-by: Linus Torvalds

Hugh Dickins
2012-06-04 11:05:47 +0800

02 Jun, 2012

3 commits

af4f8ba31 Merge branch 'slab/for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/penberg/linux ... Browse Code »

Pull slab updates from Pekka Enberg:
"Mainly a bunch of SLUB fixes from Joonsoo Kim"

* 'slab/for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/penberg/linux:
slub: use __SetPageSlab function to set PG_slab flag
slub: fix a memory leak in get_partial_node()
slub: remove unused argument of init_kmem_cache_node()
slub: fix a possible memory leak
Documentations: Fix slabinfo.c directory in vm/slub.txt
slub: fix incorrect return type of get_any_partial()

Linus Torvalds
2012-06-02 07:50:23 +0800
1193755ac Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs ... Browse Code »

Pull vfs changes from Al Viro.
"A lot of misc stuff. The obvious groups:
* Miklos' atomic_open series; kills the damn abuse of
->d_revalidate() by NFS, which was the major stumbling block for
all work in that area.
* ripping security_file_mmap() and dealing with deadlocks in the
area; sanitizing the neighborhood of vm_mmap()/vm_munmap() in
general.
* ->encode_fh() switched to saner API; insane fake dentry in
mm/cleancache.c gone.
* assorted annotations in fs (endianness, __user)
* parts of Artem's ->s_dirty work (jff2 and reiserfs parts)
* ->update_time() work from Josef.
* other bits and pieces all over the place.

Normally it would've been in two or three pull requests, but
signal.git stuff had eaten a lot of time during this cycle ;-/"

Fix up trivial conflicts in Documentation/filesystems/vfs.txt (the
'truncate_range' inode method was removed by the VM changes, the VFS
update adds an 'update_time()' method), and in fs/btrfs/ulist.[ch] (due
to sparse fix added twice, with other changes nearby).

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (95 commits)
nfs: don't open in ->d_revalidate
vfs: retry last component if opening stale dentry
vfs: nameidata_to_filp(): don't throw away file on error
vfs: nameidata_to_filp(): inline __dentry_open()
vfs: do_dentry_open(): don't put filp
vfs: split __dentry_open()
vfs: do_last() common post lookup
vfs: do_last(): add audit_inode before open
vfs: do_last(): only return EISDIR for O_CREAT
vfs: do_last(): check LOOKUP_DIRECTORY
vfs: do_last(): make ENOENT exit RCU safe
vfs: make follow_link check RCU safe
vfs: do_last(): use inode variable
vfs: do_last(): inline walk_component()
vfs: do_last(): make exit RCU safe
vfs: split do_lookup()
Btrfs: move over to use ->update_time
fs: introduce inode operation ->update_time
reiserfs: get rid of resierfs_sync_super
reiserfs: mark the superblock as dirty a bit later
...

Linus Torvalds
2012-06-02 01:34:35 +0800
c3b2da314 fs: introduce inode operation ->update_time ... Browse Code »

Btrfs has to make sure we have space to allocate new blocks in order to modify
the inode, so updating time can fail. We've gotten around this by having our
own file_update_time but this is kind of a pain, and Christoph has indicated he
would like to make xfs do something different with atime updates. So introduce
->update_time, where we will deal with i_version an a/m/c time updates and
indicate which changes need to be made. The normal version just does what it
has always done, updates the time and marks the inode dirty, and then
filesystems can choose to do something different.

I've gone through all of the users of file_update_time and made them check for
errors with the exception of the fault code since it's complicated and I wasn't
quite sure what to do there, also Jan is going to be pushing the file time
updates into page_mkwrite for those who have it so that should satisfy btrfs and
make it not a big deal to check the file_update_time() return code in the
generic fault path. Thanks,

Signed-off-by: Josef Bacik

Josef Bacik
2012-06-02 00:07:25 +0800

01 Jun, 2012

9 commits

17d1587f5 unexport do_munmap() ... Browse Code »

Signed-off-by: Al Viro

Al Viro
2012-06-01 22:37:18 +0800
eb36c5873 new helper: vm_mmap_pgoff() ... Browse Code »

take it to mm/util.c, convert vm_mmap() to use of that one and
take it to mm/util.c as well, convert both sys_mmap_pgoff() to
use of vm_mmap_pgoff()

Signed-off-by: Al Viro

Al Viro
2012-06-01 22:37:18 +0800
dc982501d kill do_mmap() completely ... Browse Code »

just pull into vm_mmap()

Signed-off-by: Al Viro

Al Viro
2012-06-01 22:37:17 +0800
e3fc629d7 switch aio and shm to do_mmap_pgoff(), make do_mmap() static ... Browse Code »

after all, 0 bytes and 0 pages is the same thing...

Signed-off-by: Al Viro

Al Viro
2012-06-01 22:37:17 +0800
9ac4ed4bd move security_mmap_addr() to saner place ... Browse Code »

it really should be done by get_unmapped_area(); that cuts down on
the amount of callers considerably and it's the right place for
that stuff anyway.

Signed-off-by: Al Viro

Al Viro
2012-06-01 22:37:16 +0800
8b3ec6814 take security_mmap_file() outside of ->mmap_sem ... Browse Code »

Signed-off-by: Al Viro

Al Viro
2012-06-01 22:37:01 +0800
08615d7d8 Merge branch 'akpm' (Andrew's patch-bomb) ... Browse Code »

Merge misc patches from Andrew Morton:

- the "misc" tree - stuff from all over the map

- checkpatch updates

- fatfs

- kmod changes

- procfs

- cpumask

- UML

- kexec

- mqueue

- rapidio

- pidns

- some checkpoint-restore feature work. Reluctantly. Most of it
delayed a release. I'm still rather worried that we don't have a
clear roadmap to completion for this work.

* emailed from Andrew Morton : (78 patches)
kconfig: update compression algorithm info
c/r: prctl: add ability to set new mm_struct::exe_file
c/r: prctl: extend PR_SET_MM to set up more mm_struct entries
c/r: procfs: add arg_start/end, env_start/end and exit_code members to /proc/$pid/stat
syscalls, x86: add __NR_kcmp syscall
fs, proc: introduce /proc//task//children entry
sysctl: make kernel.ns_last_pid control dependent on CHECKPOINT_RESTORE
aio/vfs: cleanup of rw_copy_check_uvector() and compat_rw_copy_check_uvector()
eventfd: change int to __u64 in eventfd_signal()
fs/nls: add Apple NLS
pidns: make killed children autoreap
pidns: use task_active_pid_ns in do_notify_parent
rapidio/tsi721: add DMA engine support
rapidio: add DMA engine support for RIO data transfers
ipc/mqueue: add rbtree node caching support
tools/selftests: add mq_perf_tests
ipc/mqueue: strengthen checks on mqueue creation
ipc/mqueue: correct mq_attr_ok test
ipc/mqueue: improve performance of send/recv
selftests: add mq_open_tests
...

Linus Torvalds
2012-06-01 09:10:18 +0800
ac34ebb3a aio/vfs: cleanup of rw_copy_check_uvector() and compat_rw_copy_check_uvector() ... Browse Code »

A cleanup of rw_copy_check_uvector and compat_rw_copy_check_uvector after
changes made to support CMA in an earlier patch.

Rather than having an additional check_access parameter to these
functions, the first paramater type is overloaded to allow the caller to
specify CHECK_IOVEC_ONLY which means check that the contents of the iovec
are valid, but do not check the memory that they point to. This is used
by process_vm_readv/writev where we need to validate that a iovec passed
to the syscall is valid but do not want to check the memory that it points
to at this point because it refers to an address space in another process.

Signed-off-by: Chris Yeoh
Reviewed-by: Oleg Nesterov
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Christopher Yeoh
2012-06-01 08:49:32 +0800
e5467859f split ->file_mmap() into ->mmap_addr()/->mmap_file() ... Browse Code »

... i.e. file-dependent and address-dependent checks.

Signed-off-by: Al Viro

Al Viro
2012-06-01 01:11:54 +0800

31 May, 2012

3 commits

cf74d14c4 unexport do_mmap() ... Browse Code »

Signed-off-by: Al Viro

Al Viro
2012-05-31 09:04:57 +0800
63a81db13 merge do_mremap() into sys_mremap() ... Browse Code »

Signed-off-by: Al Viro

Al Viro
2012-05-31 09:04:57 +0800
3ed37648e fs: move file_remove_suid() to fs/inode.c ... Browse Code »

file_remove_suid() is a generic function operates on struct file,
it almost has no relations with file mapping, so move it to fs/inode.c.

Cc: Alexander Viro
Signed-off-by: Cong Wang
Signed-off-by: Al Viro

Cong Wang
2012-05-31 09:04:52 +0800

30 May, 2012

6 commits

4523e1458 mm: fix vma_resv_map() NULL pointer ... Browse Code »

hugetlb_reserve_pages() can be used for either normal file-backed
hugetlbfs mappings, or MAP_HUGETLB. In the MAP_HUGETLB, semi-anonymous
mode, there is not a VMA around. The new call to resv_map_put() assumed
that there was, and resulted in a NULL pointer dereference:

BUG: unable to handle kernel NULL pointer dereference at 0000000000000030
IP: vma_resv_map+0x9/0x30
PGD 141453067 PUD 1421e1067 PMD 0
Oops: 0000 [#1] PREEMPT SMP
...
Pid: 14006, comm: trinity-child6 Not tainted 3.4.0+ #36
RIP: vma_resv_map+0x9/0x30
...
Process trinity-child6 (pid: 14006, threadinfo ffff8801414e0000, task ffff8801414f26b0)
Call Trace:
resv_map_put+0xe/0x40
hugetlb_reserve_pages+0xa6/0x1d0
hugetlb_file_setup+0x102/0x2c0
newseg+0x115/0x360
ipcget+0x1ce/0x310
sys_shmget+0x5a/0x60
system_call_fastpath+0x16/0x1b

This was reported by Dave Jones, but was reproducible with the
libhugetlbfs test cases, so shame on me for not running them in the
first place.

With this, the oops is gone, and the output of libhugetlbfs's
run_tests.py is identical to plain 3.4 again.

[ Marked for stable, since this was introduced by commit c50ac050811d
("hugetlb: fix resv_map leak in error path") which was also marked for
stable ]

Reported-by: Dave Jones
Cc: Mel Gorman
Cc: KOSAKI Motohiro
Cc: Christoph Lameter
Cc: Andrea Arcangeli
Cc: Andrew Morton
Cc: [2.6.32+]
Signed-off-by: Linus Torvalds

Dave Hansen
2012-05-30 23:48:13 +0800
b0b0382bb ->encode_fh() API change ... Browse Code »

pass inode + parent's inode or NULL instead of dentry + bool saying
whether we want the parent or not.

NOTE: that needs ceph fix folded in.

Signed-off-by: Al Viro

Al Viro
2012-05-30 11:28:33 +0800
3f1346193 memcg: decrement static keys at real destroy time ... Browse Code »

We call the destroy function when a cgroup starts to be removed, such as
by a rmdir event.

However, because of our reference counters, some objects are still
inflight. Right now, we are decrementing the static_keys at destroy()
time, meaning that if we get rid of the last static_key reference, some
objects will still have charges, but the code to properly uncharge them
won't be run.

This becomes a problem specially if it is ever enabled again, because now
new charges will be added to the staled charges making keeping it pretty
much impossible.

We just need to be careful with the static branch activation: since there
is no particular preferred order of their activation, we need to make sure
that we only start using it after all call sites are active. This is
achieved by having a per-memcg flag that is only updated after
static_key_slow_inc() returns. At this time, we are sure all sites are
active.

This is made per-memcg, not global, for a reason: it also has the effect
of making socket accounting more consistent. The first memcg to be
limited will trigger static_key() activation, therefore, accounting. But
all the others will then be accounted no matter what. After this patch,
only limited memcgs will have its sockets accounted.

[akpm@linux-foundation.org: move enum sock_flag_bits into sock.h,
document enum sock_flag_bits,
convert memcg_proto_active() and memcg_proto_activated() to test_bit(),
redo tcp_update_limit() comment to 80 cols]
Signed-off-by: Glauber Costa
Cc: Tejun Heo
Cc: Li Zefan
Acked-by: KAMEZAWA Hiroyuki
Cc: Johannes Weiner
Cc: Michal Hocko
Acked-by: David Miller
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Glauber Costa
2012-05-30 07:22:28 +0800
3afe36b1f memcg: always free struct memcg through schedule_work() ... Browse Code »

Right now we free struct memcg with kfree right after a rcu grace period,
but defer it if we need to use vfree() to get rid of that memory area. We
do that by need, because we need vfree to be called in a process context.

This patch unifies this behavior, by ensuring that even kfree will happen
in a separate thread. The goal is to have a stable place to call the
upcoming jump label destruction function outside the realm of the
complicated and quite far-reaching cgroup lock (that can't be held when
holding either the cpu_hotplug.lock or jump_label_mutex)

[akpm@linux-foundation.org: tweak comment]
Signed-off-by: Glauber Costa
Cc: Tejun Heo
Cc: Li Zefan
Acked-by: KAMEZAWA Hiroyuki
Cc: Johannes Weiner
Acked-by: Michal Hocko
Cc: David Miller
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Glauber Costa
2012-05-30 07:22:28 +0800
fa9add641 mm/memcg: apply add/del_page to lruvec ... Browse Code »

Take lruvec further: pass it instead of zone to add_page_to_lru_list() and
del_page_from_lru_list(); and pagevec_lru_move_fn() pass lruvec down to
its target functions.

This cleanup eliminates a swathe of cruft in memcontrol.c, including
mem_cgroup_lru_add_list(), mem_cgroup_lru_del_list() and
mem_cgroup_lru_move_lists() - which never actually touched the lists.

In their place, mem_cgroup_page_lruvec() to decide the lruvec, previously
a side-effect of add, and mem_cgroup_update_lru_size() to maintain the
lru_size stats.

Whilst these are simplifications in their own right, the goal is to bring
the evaluation of lruvec next to the spin_locking of the lrus, in
preparation for a future patch.

Signed-off-by: Hugh Dickins
Cc: KOSAKI Motohiro
Acked-by: KAMEZAWA Hiroyuki
Acked-by: Michal Hocko
Acked-by: Konstantin Khlebnikov
Cc: Johannes Weiner
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Hugh Dickins
2012-05-30 07:22:28 +0800
75b00af77 mm: trivial cleanups in vmscan.c ... Browse Code »

Utter trivia in mm/vmscan.c, mostly just reducing the linecount slightly;
most exciting change being get_scan_count() calling vmscan_swappiness()
once instead of twice.

Signed-off-by: Hugh Dickins
Acked-by: KOSAKI Motohiro
Acked-by: KAMEZAWA Hiroyuki
Acked-by: Konstantin Khlebnikov
Reviewed-by: Michal Hocko
Cc: Johannes Weiner
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Hugh Dickins
2012-05-30 07:22:28 +0800