Eric Lee / smarc-fsl-linux-kernel

15 Mar, 2013

1 commit

a2362d247 mm/fremap.c: fix possible oops on error path

The vm_flags introduced in 6d7825b10dbe ("mm/fremap.c: fix oops on error
path") is supposed to avoid a compiler warning about uninitialized
vm_flags without changing the generated code.

However I am concerned that this is going to be very brittle, and fail
with some compiler versions. The failure could be either of:

- compiler could actually load vma->vm_flags before checking for the
!vma condition, thus reintroducing the oops

- compiler could optimize out the !vma check, since the pointer just got
dereferenced shortly before (so the compiler knows it can't be NULL!)

I propose reversing this part of the change and initializing vm_flags to 0
just to avoid the bogus uninitialized use warning.
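
The ordering argued for above can be illustrated with a small user-space sketch. It is not the kernel code: `struct vma_sketch` and `remap_sketch()` are hypothetical stand-ins, and only the shape of the control flow (initialize to 0, check the pointer, then load) is taken from the description.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the kernel's struct vm_area_struct. */
struct vma_sketch {
	unsigned long vm_flags;
};

/*
 * Returns 0 on success, -1 when vma is NULL. The flags value seen at the
 * common exit is written to *flags_out.
 */
static int remap_sketch(const struct vma_sketch *vma, unsigned long *flags_out)
{
	unsigned long vm_flags = 0;	/* initialized on every path, so no warning */
	int err = -1;

	if (!vma)
		goto out;		/* vma is never dereferenced on this path */

	vm_flags = vma->vm_flags;	/* load only after the NULL check */
	err = 0;
out:
	*flags_out = vm_flags;		/* the shared exit may read vm_flags safely */
	return err;
}
```

Because the only dereference sits after the check, the compiler has no earlier load from which to infer that `vma` is non-NULL, and so no grounds for dropping the test.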

Signed-off-by: Michel Lespinasse
Cc: Tommi Rantala
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Michel Lespinasse
2013-03-15 08:00:39 +0800

14 Mar, 2013

2 commits

6d7825b10 mm/fremap.c: fix oops on error path

If find_vma() fails, sys_remap_file_pages() will dereference `vma', which
contains NULL. Fix it by checking the pointer.

(We could alternatively check for err==0, but this seems more direct)

(The vm_flags change is to squish a bogus used-uninitialised warning
without adding extra code).

Reported-by: Tommi Rantala
Cc: Michel Lespinasse
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Andrew Morton
2013-03-14 06:21:47 +0800
f8749452a mm: remove_memory(): fix end_pfn setting

remove_memory() calls walk_memory_range() with [start_pfn, end_pfn), where
end_pfn is exclusive in this range. Therefore, end_pfn needs to be set to
the next page of the end address.
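
As a rough user-space illustration of the half-open range, the sketch below computes `[start_pfn, end_pfn)` for a byte range. The helper names and the 4 KiB page size are assumptions made for the example; the expression used here is one way to get "the page after the last byte" and is not necessarily the one the patch uses.

```c
#include <assert.h>
#include <stdint.h>

#define PAGE_SHIFT_SKETCH 12	/* assumption: 4 KiB pages */

struct pfn_range {
	uint64_t start_pfn;	/* first page frame in the range */
	uint64_t end_pfn;	/* exclusive: one past the last page frame */
};

/* Page frame number containing the byte at addr. */
static uint64_t pfn_of(uint64_t addr)
{
	return addr >> PAGE_SHIFT_SKETCH;
}

/* Build the half-open pfn range covering size bytes from start (size > 0). */
static struct pfn_range range_for(uint64_t start, uint64_t size)
{
	struct pfn_range r;

	r.start_pfn = pfn_of(start);
	/* Frame of the last byte, plus one, so the end is exclusive. */
	r.end_pfn = pfn_of(start + size - 1) + 1;
	return r;
}
```

With this convention `end_pfn - start_pfn` is the number of pages walked, which is what a `[start_pfn, end_pfn)` interface expects.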

Signed-off-by: Toshi Kani
Cc: Wen Congyang
Cc: Tang Chen
Cc: Kamezawa Hiroyuki
Cc: KOSAKI Motohiro
Cc: Jiang Liu
Cc: Jianguo Wu
Cc: Lai Jiangshan
Cc: Wu Jianguo
Cc: Yasuaki Ishimatsu
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Toshi Kani
2013-03-14 06:21:44 +0800

13 Mar, 2013

2 commits

4febd95a8 Select VIRT_TO_BUS directly where needed

In commit 887cbce0adea ("arch Kconfig: centralise ARCH_NO_VIRT_TO_BUS")
I introduced the config symbol HAVE_VIRT_TO_BUS and selected that where
needed. I am not sure what I was thinking. Instead, just directly
select VIRT_TO_BUS where it is needed.

Signed-off-by: Stephen Rothwell
Signed-off-by: Linus Torvalds

Stephen Rothwell
2013-03-13 02:16:40 +0800
8aec0f5d4 Fix: compat_rw_copy_check_uvector() misuse in aio, readv, writev, and security keys

Looking at mm/process_vm_access.c:process_vm_rw() and comparing it to
compat_process_vm_rw() shows that the compatibility code requires an
explicit "access_ok()" check before calling
compat_rw_copy_check_uvector(). The same difference seems to appear when
we compare fs/read_write.c:do_readv_writev() to
fs/compat.c:compat_do_readv_writev().

This subtle difference between the compat and non-compat requirements
should probably be debated, as it seems to be error-prone. In fact,
there are two other sites that use this function in the Linux kernel,
and they both seem to get it wrong:

Now shifting our attention to fs/aio.c, we see that aio_setup_iocb()
also ends up calling compat_rw_copy_check_uvector() through
aio_setup_vectored_rw(). Unfortunately, the access_ok() check appears to
be missing. Same situation for
security/keys/compat.c:compat_keyctl_instantiate_key_iov().

I propose that we add the access_ok() check directly into
compat_rw_copy_check_uvector(), so callers don't have to worry about it,
and it therefore makes the compat call code similar to its non-compat
counterpart. Place the access_ok() check in the same location where
copy_from_user() can trigger a -EFAULT error in the non-compat code, so
the ABI behaviors are alike on both compat and non-compat.

While we are here, fix compat_do_readv_writev() so it checks for
compat_rw_copy_check_uvector() negative return values.

And also, fix a memory leak in compat_keyctl_instantiate_key_iov() error
handling.

Acked-by: Linus Torvalds
Acked-by: Al Viro
Signed-off-by: Mathieu Desnoyers
Signed-off-by: Linus Torvalds

Mathieu Desnoyers
2013-03-13 02:05:45 +0800

09 Mar, 2013

4 commits

15cf17d26 memcg: initialize kmem-cache destroying work earlier

Fix a warning from lockdep caused by calling cancel_work_sync() for an
uninitialized struct work. This path is triggered by destroying a
kmem-cache hierarchy via its root kmem-cache.

cache ffff88003c072d80
obj ffff88003b410000 cache ffff88003c072d80
obj ffff88003b924000 cache ffff88003c20bd40
INFO: trying to register non-static key.
the code is fine but needs lockdep annotation.
turning off the locking correctness validator.
Pid: 2825, comm: insmod Tainted: G O 3.9.0-rc1-next-20130307+ #611
Call Trace:
__lock_acquire+0x16a2/0x1cb0
lock_acquire+0x8a/0x120
flush_work+0x38/0x2a0
__cancel_work_timer+0x89/0xf0
cancel_work_sync+0xb/0x10
kmem_cache_destroy_memcg_children+0x81/0xb0
kmem_cache_destroy+0xf/0xe0
init_module+0xcb/0x1000 [kmem_test]
do_one_initcall+0x11a/0x170
load_module+0x19b0/0x2320
SyS_init_module+0xc6/0xf0
system_call_fastpath+0x16/0x1b

Example module to demonstrate:

#include <linux/module.h>
#include <linux/slab.h>
#include <linux/mm.h>
#include <linux/workqueue.h>

int __init mod_init(void)
{
	int size = 256;
	struct kmem_cache *cache;
	void *obj;
	struct page *page;

	cache = kmem_cache_create("kmem_cache_test", size, size, 0, NULL);
	if (!cache)
		return -ENOMEM;

	printk("cache %p\n", cache);

	obj = kmem_cache_alloc(cache, GFP_KERNEL);
	if (obj) {
		page = virt_to_head_page(obj);
		printk("obj %p cache %p\n", obj, page->slab_cache);
		kmem_cache_free(cache, obj);
	}

	flush_scheduled_work();

	obj = kmem_cache_alloc(cache, GFP_KERNEL);
	if (obj) {
		page = virt_to_head_page(obj);
		printk("obj %p cache %p\n", obj, page->slab_cache);
		kmem_cache_free(cache, obj);
	}

	kmem_cache_destroy(cache);

	return -EBUSY;
}

module_init(mod_init);
MODULE_LICENSE("GPL");

Signed-off-by: Konstantin Khlebnikov
Cc: Glauber Costa
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Konstantin Khlebnikov
2013-03-09 07:05:34 +0800
d8fc16a82 ksm: fix m68k build: only NUMA needs pfn_to_nid

A CONFIG_DISCONTIGMEM=y m68k config gave

mm/ksm.c: In function `get_kpfn_nid':
mm/ksm.c:492: error: implicit declaration of function `pfn_to_nid'

linux/mmzone.h declares it for CONFIG_SPARSEMEM and CONFIG_FLATMEM, but
expects the arch's asm/mmzone.h to declare it for CONFIG_DISCONTIGMEM
(see arch/mips/include/asm/mmzone.h for example).

Or perhaps it is only expected when CONFIG_NUMA=y: too much of a maze,
and m68k got away without it so far, so fix the build in mm/ksm.c.

Signed-off-by: Hugh Dickins
Reported-by: Geert Uytterhoeven
Cc: Petr Holasek
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Hugh Dickins
2013-03-09 07:05:34 +0800
7880639c3 mm/mempolicy.c: fix sp_node_init() argument ordering

Currently, n_new is wrongly initialized: the start and end parameters are
inverted. Let's fix it.

Signed-off-by: KOSAKI Motohiro
Cc: Hillf Danton
Cc: Sasha Levin
Cc: Hugh Dickins
Cc: Mel Gorman
Cc: Dave Jones
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

KOSAKI Motohiro
2013-03-09 07:05:34 +0800
5ca395751 mm/mempolicy.c: fix wrong sp_node insertion

n->end is accessed in sp_insert(). Thus it should be updated
before calling sp_insert(). This mistake may cause a kernel panic.

Signed-off-by: Hillf Danton
Signed-off-by: KOSAKI Motohiro
Cc: Sasha Levin
Cc: Hugh Dickins
Cc: Mel Gorman
Cc: Dave Jones
Cc:
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Hillf Danton
2013-03-09 07:05:34 +0800

04 Mar, 2013

1 commit

56a79b7b0 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs

Pull more VFS bits from Al Viro:
"Unfortunately, it looks like xattr series will have to wait until the
next cycle ;-/

This pile contains 9p cleanups and fixes (races in v9fs_fid_add()
etc), fixup for nommu breakage in shmem.c, several cleanups and a bit
more file_inode() work"

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
constify path_get/path_put and fs_struct.c stuff
fix nommu breakage in shmem.c
cache the value of file_inode() in struct file
9p: if v9fs_fid_lookup() gets to asking server, it'd better have hashed dentry
9p: make sure ->lookup() adds fid to the right dentry
9p: untangle ->lookup() a bit
9p: double iput() in ->lookup() if d_materialise_unique() fails
9p: v9fs_fid_add() can't fail now
v9fs: get rid of v9fs_dentry
9p: turn fid->dlist into hlist
9p: don't bother with private lock in ->d_fsdata; dentry->d_lock will do just fine
more file_inode() open-coded instances
selinux: opened file can't have NULL or negative ->f_path.dentry

(In the meantime, the hlist traversal macros have changed, so this
required a semantic conflict fixup for the newly hlistified fid->dlist)

Linus Torvalds
2013-03-04 05:23:03 +0800

03 Mar, 2013

1 commit

20e6926dc x86, ACPI, mm: Revert movablemem_map support

Tim found:

WARNING: at arch/x86/kernel/smpboot.c:324 topology_sane.isra.2+0x6f/0x80()
Hardware name: S2600CP
sched: CPU #1's llc-sibling CPU #0 is not on the same node! [node: 1 != 0]. Ignoring dependency.
smpboot: Booting Node 1, Processors #1
Modules linked in:
Pid: 0, comm: swapper/1 Not tainted 3.9.0-0-generic #1
Call Trace:
set_cpu_sibling_map+0x279/0x449
start_secondary+0x11d/0x1e5

Don Morris reproduced on a HP z620 workstation, and bisected it to
commit e8d195525809 ("acpi, memory-hotplug: parse SRAT before memblock
is ready")

It turns out movable_map has some problems, and it breaks several things:

1. numa_init is called several times, NOT just for srat, so
nodes_clear(numa_nodes_parsed)
memset(&numa_meminfo, 0, sizeof(numa_meminfo))
cannot simply be removed. The sequence to consider is: numaq, srat, amd, dummy,
and the fallback path has to keep working.

2. simply split acpi_numa_init to early_parse_srat.
a. that early_parse_srat is NOT called for ia64, so you break ia64.
b. for (i = 0; i < MAX_LOCAL_APIC; i++)
set_apicid_to_node(i, NUMA_NO_NODE)
still left in numa_init. So it will just clear result from early_parse_srat.
it should be moved before that....
c. it breaks ACPI_TABLE_OVERIDE...as the acpi table scan is moved
early before override from INITRD is settled.

3. The patch TITLE is totally misleading: there is NO x86 in the title,
but it changes critical x86 code. As a result, the x86 people did not
pay attention and the problem was not found early. Those patches really
should be routed via tip/x86/mm.

4. After that commit, the following ranges cannot use movable RAM:
a. real_mode code.... well..funny, legacy Node0 [0,1M) could be hot-removed?
b. initrd... it will be freed after booting, so it could be on movable...
c. crashkernel for kdump...: looks like we can not put kdump kernel above 4G
anymore.
d. init_mem_mapping: can not put page table high anymore.
e. initmem_init: vmemmap can not be high local node anymore. That is
not good.

If a node is hotpluggable, memory-related ranges such as the page table and
vmemmap could be on that node without problems, and should be on that
node.

We have a workaround patch that could fix some of the problems, but some
cannot be fixed.

So just remove that offending commit and related ones including:

f7210e6c4ac7 ("mm/memblock.c: use CONFIG_HAVE_MEMBLOCK_NODE_MAP to
protect movablecore_map in memblock_overlaps_region().")

01a178a94e8e ("acpi, memory-hotplug: support getting hotplug info from
SRAT")

27168d38fa20 ("acpi, memory-hotplug: extend movablemem_map ranges to
the end of node")

e8d195525809 ("acpi, memory-hotplug: parse SRAT before memblock is
ready")

fb06bc8e5f42 ("page_alloc: bootmem limit with movablecore_map")

42f47e27e761 ("page_alloc: make movablemem_map have higher priority")

6981ec31146c ("page_alloc: introduce zone_movable_limit[] to keep
movable limit for nodes")

34b71f1e04fc ("page_alloc: add movable_memmap kernel parameter")

4d59a75125d5 ("x86: get pg_data_t's memory from other node")

Later we should have patches that make sure the kernel puts the page table
and vmemmap in local node RAM instead of pushing them down to node0. We also
need to find a way to put other kernel-used RAM in local node RAM.

Reported-by: Tim Gardner
Reported-by: Don Morris
Bisected-by: Don Morris
Tested-by: Don Morris
Signed-off-by: Yinghai Lu
Cc: Tony Luck
Cc: Thomas Renninger
Cc: Tejun Heo
Cc: Tang Chen
Cc: Yasuaki Ishimatsu
Signed-off-by: Linus Torvalds

Yinghai Lu
2013-03-03 01:34:39 +0800

02 Mar, 2013

1 commit

26567cdbb fix nommu breakage in shmem.c

Signed-off-by: Al Viro

Al Viro
2013-03-02 12:50:45 +0800

01 Mar, 2013

2 commits

de1a2262b Merge tag 'writeback-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/wfg/linux

Pull writeback fixes from Wu Fengguang:
"Two writeback fixes

- fix negative (setpoint - dirty) in 32bit archs

- use down_read_trylock() in writeback_inodes_sb(_nr)_if_idle()"

* tag 'writeback-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/wfg/linux:
Negative (setpoint-dirty) in bdi_position_ratio()
vfs: re-implement writeback_inodes_sb(_nr)_if_idle() and rename them
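
The first fix in this pull concerns a subtraction of two unsigned values that is meant to be signed. A minimal sketch of the pitfall follows, with `uint32_t` standing in for `unsigned long` on a 32-bit architecture (and assuming `int` is no wider than 32 bits); the function names are made up for the example and do not come from the patch.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Widening AFTER the subtraction: the difference is computed in 32-bit
 * unsigned arithmetic, wraps around when dirty > setpoint, and only then
 * becomes a (large, positive) 64-bit value.
 */
static int64_t diff_wrapped(uint32_t setpoint, uint32_t dirty)
{
	return (int64_t)(setpoint - dirty);
}

/*
 * Widening BEFORE the subtraction: both operands become signed 64-bit
 * values first, so the difference can be negative as intended.
 */
static int64_t diff_signed(uint32_t setpoint, uint32_t dirty)
{
	return (int64_t)setpoint - (int64_t)dirty;
}
```

On a 64-bit build of the kernel `unsigned long` is already 64 bits wide, so a cast of the wrapped difference happens to give the right signed value there; the 32-bit case is where the two forms diverge.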

Linus Torvalds
2013-03-01 05:21:44 +0800
ee89f8125 Merge branch 'for-3.9/core' of git://git.kernel.dk/linux-block

Pull block IO core bits from Jens Axboe:
"Below are the core block IO bits for 3.9. It was delayed a few days
since my workstation kept crashing every 2-8h after pulling it into
current -git, but turns out it is a bug in the new pstate code (divide
by zero, will report separately). In any case, it contains:

- The big cfq/blkcg update from Tejun and Vivek.

- Additional block and writeback tracepoints from Tejun.

- Improvement of the should sort (based on queues) logic in the plug
flushing.

- _io() variants of the wait_for_completion() interface, using
io_schedule() instead of schedule() to contribute to io wait
properly.

- Various little fixes.

You'll get two trivial merge conflicts, which should be easy enough to
fix up"

Fix up the trivial conflicts due to hlist traversal cleanups (commit
b67bfe0d42ca: "hlist: drop the node parameter from iterators").

* 'for-3.9/core' of git://git.kernel.dk/linux-block: (39 commits)
block: remove redundant check to bd_openers()
block: use i_size_write() in bd_set_size()
cfq: fix lock imbalance with failed allocations
drivers/block/swim3.c: fix null pointer dereference
block: don't select PERCPU_RWSEM
block: account iowait time when waiting for completion of IO request
sched: add wait_for_completion_io[_timeout]
writeback: add more tracepoints
block: add block_{touch|dirty}_buffer tracepoint
buffer: make touch_buffer() an exported function
block: add @req to bio_{front|back}_merge tracepoints
block: add missing block_bio_complete() tracepoint
block: Remove should_sort judgement when flush blk_plug
block,elevator: use new hashtable implementation
cfq-iosched: add hierarchical cfq_group statistics
cfq-iosched: collect stats from dead cfqgs
cfq-iosched: separate out cfqg_stats_reset() from cfq_pd_reset_stats()
blkcg: make blkcg_print_blkgs() grab q locks instead of blkcg lock
block: RCU free request_queue
blkcg: implement blkg_[rw]stat_recursive_sum() and blkg_[rw]stat_merge()
...

Linus Torvalds
2013-03-01 04:52:24 +0800

28 Feb, 2013

4 commits

b67bfe0d4 hlist: drop the node parameter from iterators

I'm not sure why, but the hlist for each entry iterators were conceived

list_for_each_entry(pos, head, member)

The hlist ones were greedy and wanted an extra parameter:

hlist_for_each_entry(tpos, pos, head, member)

Why did they need an extra pos parameter? I'm not quite sure. Not only do
they not really need it, it also prevents the iterator from looking
exactly like the list iterator, which is unfortunate.
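
To make the three-argument form concrete, here is a simplified user-space sketch of such an iterator. It is not the definition from linux/list.h: the node type omits the kernel's `pprev` field, the `_sketch` and `entry_*` names are invented for the example, and it relies on the GCC/Clang `__typeof__` extension.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified singly linked hash-list node and head. */
struct hlist_node { struct hlist_node *next; };
struct hlist_head { struct hlist_node *first; };

/* Recover the enclosing object from a pointer to its embedded node. */
#define entry_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

/* Same, but map a NULL node to a NULL entry so the loop can terminate. */
#define entry_or_null(ptr, type, member) \
	((ptr) ? entry_of(ptr, type, member) : NULL)

/* Three arguments, like list_for_each_entry(): no separate node cursor. */
#define hlist_for_each_entry_sketch(pos, head, member)                       \
	for (pos = entry_or_null((head)->first, __typeof__(*(pos)), member); \
	     pos;                                                             \
	     pos = entry_or_null((pos)->member.next, __typeof__(*(pos)), member))

struct item {
	int value;
	struct hlist_node node;
};

/* Sum the values of all items on the list. */
static int sum_items(struct hlist_head *head)
{
	struct item *it;
	int sum = 0;

	hlist_for_each_entry_sketch(it, head, node)
		sum += it->value;
	return sum;
}
```

The node cursor that the old four-argument form required is recomputed from `pos` on each step, which is why callers no longer have to declare it.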

Besides the semantic patch, there was some manual work required:

- Fix up the actual hlist iterators in linux/list.h
- Fix up the declaration of other iterators based on the hlist ones.
- A very small number of places were using the 'node' parameter, this
was modified to use 'obj->member' instead.
- Coccinelle didn't handle the hlist_for_each_entry_safe iterator
properly, so those had to be fixed up manually.

The semantic patch which is mostly the work of Peter Senna Tschudin is here:

@@
iterator name hlist_for_each_entry, hlist_for_each_entry_continue, hlist_for_each_entry_from, hlist_for_each_entry_rcu, hlist_for_each_entry_rcu_bh, hlist_for_each_entry_continue_rcu_bh, for_each_busy_worker, ax25_uid_for_each, ax25_for_each, inet_bind_bucket_for_each, sctp_for_each_hentry, sk_for_each, sk_for_each_rcu, sk_for_each_from, sk_for_each_safe, sk_for_each_bound, hlist_for_each_entry_safe, hlist_for_each_entry_continue_rcu, nr_neigh_for_each, nr_neigh_for_each_safe, nr_node_for_each, nr_node_for_each_safe, for_each_gfn_indirect_valid_sp, for_each_gfn_sp, for_each_host;

type T;
expression a,c,d,e;
identifier b;
statement S;
@@

-T b;

[akpm@linux-foundation.org: drop bogus change from net/ipv4/raw.c]
[akpm@linux-foundation.org: drop bogus hunk from net/ipv6/raw.c]
[akpm@linux-foundation.org: checkpatch fixes]
[akpm@linux-foundation.org: fix warnings]
[akpm@linux-foudnation.org: redo intrusive kvm changes]
Tested-by: Peter Senna Tschudin
Acked-by: Paul E. McKenney
Signed-off-by: Sasha Levin
Cc: Wu Fengguang
Cc: Marcelo Tosatti
Cc: Gleb Natapov
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Sasha Levin
2013-02-28 11:10:24 +0800
887cbce0a arch Kconfig: centralise CONFIG_ARCH_NO_VIRT_TO_BUS

Change it to CONFIG_HAVE_VIRT_TO_BUS and set it in all architectures
that already provide virt_to_bus().

Signed-off-by: Stephen Rothwell
Reviewed-by: James Hogan
Cc: Bjorn Helgaas
Cc: H Hartley Sweeten
Cc: Benjamin Herrenschmidt
Cc: Paul Mackerras
Cc: "David S. Miller"
Cc: Paul Mundt
Cc: Vineet Gupta
Cc: James Bottomley
Cc:
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Stephen Rothwell
2013-02-28 11:10:23 +0800
ff6a6da60 mm: accelerate munlock() treatment of THP pages

munlock_vma_pages_range() was always incrementing addresses by PAGE_SIZE
at a time. When munlocking THP pages (or the huge zero page), this
resulted in taking the mm->page_table_lock 512 times in a row.

We can do better by making use of the page_mask returned by
follow_page_mask (for the huge zero page case), or the size of the page
munlock_vma_page() operated on (for the true THP page case).
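
The effect of stepping by the reported page mask can be sketched in user space as below. The loop only counts iterations; `count_steps()` is a hypothetical helper, and the 4 KiB base page and 512-page huge page (mask 511) are assumptions chosen to match the "512 times" figure above.

```c
#include <assert.h>

#define PAGE_SHIFT_SKETCH 12	/* assumption: 4 KiB base pages */
#define PAGE_SIZE_SKETCH  (1UL << PAGE_SHIFT_SKETCH)

/*
 * Count the loop iterations needed to cover [start, end) when every step
 * reports page_mask: 0 for a base page, 511 for a 512-page huge page.
 */
static unsigned long count_steps(unsigned long start, unsigned long end,
				 unsigned long page_mask)
{
	unsigned long steps = 0;

	while (start < end) {
		/* Base pages from start up to the end of the page it lies in. */
		unsigned long page_increment =
			1 + (~(start >> PAGE_SHIFT_SKETCH) & page_mask);

		start += page_increment * PAGE_SIZE_SKETCH;
		steps++;
	}
	return steps;
}
```

With a mask of 0 the increment is always one page, which reproduces the old behaviour; with a mask of 511 a start address in the middle of a huge page advances only to that huge page's end, so the walk stays aligned.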

Signed-off-by: Michel Lespinasse
Cc: Andrea Arcangeli
Cc: Rik van Riel
Cc: Mel Gorman
Cc: Hugh Dickins
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Michel Lespinasse
2013-02-28 11:10:09 +0800
098849643 mm: do not grow the stack vma just because of an overrun on preceding vma

The stack vma is designed to grow automatically (marked with VM_GROWSUP
or VM_GROWSDOWN depending on architecture) when an access is made beyond
the existing boundary. However, particularly if you have not limited
your stack at all ("ulimit -s unlimited"), this can cause the stack to
grow even if the access was really just one past *another* segment.

And that's wrong, especially since we first grow the segment, but then
immediately later enforce the stack guard page on the last page of the
segment. So _despite_ first growing the stack segment as a result of
the access, the kernel will then make the access cause a SIGSEGV anyway!

So do the same logic as the guard page check does, and consider an
access to within one page of the next segment to be a bad access, rather
than growing the stack to abut the next segment.

Reported-and-tested-by: Heiko Carstens
Signed-off-by: Linus Torvalds

Linus Torvalds
2013-02-28 00:36:04 +0800

27 Feb, 2013

1 commit

d895cb1af Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs

Pull vfs pile (part one) from Al Viro:
"Assorted stuff - cleaning namei.c up a bit, fixing ->d_name/->d_parent
locking violations, etc.

The most visible changes here are death of FS_REVAL_DOT (replaced with
"has ->d_weak_revalidate()") and a new helper getting from struct file
to inode. Some bits of preparation to xattr method interface changes.

Misc patches by various people sent this cycle *and* ocfs2 fixes from
several cycles ago that should've been upstream right then.

PS: the next vfs pile will be xattr stuff."

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (46 commits)
saner proc_get_inode() calling conventions
proc: avoid extra pde_put() in proc_fill_super()
fs: change return values from -EACCES to -EPERM
fs/exec.c: make bprm_mm_init() static
ocfs2/dlm: use GFP_ATOMIC inside a spin_lock
ocfs2: fix possible use-after-free with AIO
ocfs2: Fix oops in ocfs2_fast_symlink_readpage() code path
get_empty_filp()/alloc_file() leave both ->f_pos and ->f_version zero
target: writev() on single-element vector is pointless
export kernel_write(), convert open-coded instances
fs: encode_fh: return FILEID_INVALID if invalid fid_type
kill f_vfsmnt
vfs: kill FS_REVAL_DOT by adding a d_weak_revalidate dentry op
nfsd: handle vfs_getattr errors in acl protocol
switch vfs_getattr() to struct path
default SET_PERSONALITY() in linux/elf.h
ceph: prepopulate inodes only when request is aborted
d_hash_and_lookup(): export, switch open-coded instances
9p: switch v9fs_set_create_acl() to inode+fid, do it before d_instantiate()
9p: split dropping the acls from v9fs_set_create_acl()
...

Linus Torvalds
2013-02-27 12:16:07 +0800

26 Feb, 2013

4 commits

94e07a759 fs: encode_fh: return FILEID_INVALID if invalid fid_type

This patch is a follow-up to the patch below:

[PATCH] exportfs: add FILEID_INVALID to indicate invalid fid_type
commit: 216b6cbdcbd86b1db0754d58886b466ae31f5a63

Signed-off-by: Namjae Jeon
Signed-off-by: Vivek Trivedi
Acked-by: Steven Whitehouse
Acked-by: Sage Weil
Signed-off-by: Al Viro

Namjae Jeon
2013-02-26 15:46:10 +0800
3451538a1 shmem_setup_file(): use d_alloc_pseudo() instead of d_alloc()

Note that provided ->d_dname() reproduces what we used to get for
those guys in e.g. /proc/self/maps; it might be a good idea to change
that to something less ugly, but for now let's keep the existing
user-visible behaviour

Signed-off-by: Al Viro

Al Viro
2013-02-26 15:43:22 +0800
94f2f1423 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace

Pull user namespace and namespace infrastructure changes from Eric W Biederman:
"This set of changes starts with a few small enhancements to the user
namespace. reboot support, allowing more arbitrary mappings, and
support for mounting devpts, ramfs, tmpfs, and mqueuefs as just the
user namespace root.

I do my best to document that if you care about limiting your
unprivileged users that when you have the user namespace support
enabled you will need to enable memory control groups.

There is a minor bug fix to prevent overflowing the stack if someone
creates way too many user namespaces.

The bulk of the changes are a continuation of the kuid/kgid push down
work through the filesystems. These changes make using uids and gids
typesafe which ensures that these filesystems are safe to use when
multiple user namespaces are in use. The filesystems converted for
3.9 are ceph, 9p, afs, ocfs2, gfs2, ncpfs, nfs, nfsd, and cifs. The
changes for these filesystems were a little more involved so I split
the changes into smaller hopefully obviously correct changes.

XFS is the only filesystem that remains. I was hoping I could get
that in this release so that user namespace support would be enabled
with an allyesconfig or an allmodconfig but it looks like the xfs
changes need another couple of days before they are ready."

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace: (93 commits)
cifs: Enable building with user namespaces enabled.
cifs: Convert struct cifs_ses to use a kuid_t and a kgid_t
cifs: Convert struct cifs_sb_info to use kuids and kgids
cifs: Modify struct smb_vol to use kuids and kgids
cifs: Convert struct cifsFileInfo to use a kuid
cifs: Convert struct cifs_fattr to use kuid and kgids
cifs: Convert struct tcon_link to use a kuid.
cifs: Modify struct cifs_unix_set_info_args to hold a kuid_t and a kgid_t
cifs: Convert from a kuid before printing current_fsuid
cifs: Use kuids and kgids SID to uid/gid mapping
cifs: Pass GLOBAL_ROOT_UID and GLOBAL_ROOT_GID to keyring_alloc
cifs: Use BUILD_BUG_ON to validate uids and gids are the same size
cifs: Override unmappable incoming uids and gids
nfsd: Enable building with user namespaces enabled.
nfsd: Properly compare and initialize kuids and kgids
nfsd: Store ex_anon_uid and ex_anon_gid as kuids and kgids
nfsd: Modify nfsd4_cb_sec to use kuids and kgids
nfsd: Handle kuids and kgids in the nfs4acl to posix_acl conversion
nfsd: Convert nfsxdr to use kuids and kgids
nfsd: Convert nfs3xdr to use kuids and kgids
...

Linus Torvalds
2013-02-26 08:00:49 +0800
9043a2650 Merge tag 'modules-next-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rusty/linux

Pull module update from Rusty Russell:
"The sweeping change is to make add_taint() explicitly indicate whether
to disable lockdep, but it's a mechanical change."

* tag 'modules-next-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rusty/linux:
MODSIGN: Add option to not sign modules during modules_install
MODSIGN: Add -s option to sign-file
MODSIGN: Specify the hash algorithm on sign-file command line
MODSIGN: Simplify Makefile with a Kconfig helper
module: clean up load_module a little more.
modpost: Ignore ARC specific non-alloc sections
module: constify within_module_*
taint: add explicit flag to show whether lock dep is still OK.
module: printk message when module signature fail taints kernel.

Linus Torvalds
2013-02-26 07:41:43 +0800

24 Feb, 2013

17 commits

ef53d16cd ksm: allocate roots when needed

It is a pity to have MAX_NUMNODES+MAX_NUMNODES tree roots statically
allocated, particularly when very few users will ever actually tune
merge_across_nodes 0 to use more than 1+1 of those trees. Not a big
deal (only 16kB wasted on each machine with CONFIG_MAXSMP), but a pity.

Start off with 1+1 statically allocated, then if merge_across_nodes is
ever tuned, allocate for nr_node_ids+nr_node_ids. Do not attempt to
free up the extra if it's tuned back, that would be a waste of effort.
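
The allocate-on-first-use scheme described above might look roughly like this in user space. It is a sketch under assumptions: `struct root_sketch`, the variable names and `enable_per_node_roots()` are invented here, `calloc()` stands in for a kernel allocator, and `nr_nodes` plays the role of nr_node_ids.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for a tree root. */
struct root_sketch { void *rb_node; };

/* The 1+1 roots that exist from the start. */
static struct root_sketch one_stable[1], one_unstable[1];
static struct root_sketch *stable_roots = one_stable;
static struct root_sketch *unstable_roots = one_unstable;
static int nr_roots = 1;

/*
 * Switch to one stable and one unstable root per node. The larger arrays
 * are allocated once, on first use, and are never freed if the setting
 * is later tuned back.
 */
static int enable_per_node_roots(int nr_nodes)
{
	struct root_sketch *buf;

	if (nr_nodes < 1)
		return -1;
	if (stable_roots != one_stable)
		return 0;	/* already allocated earlier */

	/* One zeroed buffer holds both arrays back to back. */
	buf = calloc((size_t)nr_nodes * 2, sizeof(*buf));
	if (!buf)
		return -1;

	stable_roots = buf;
	unstable_roots = buf + nr_nodes;
	nr_roots = nr_nodes;
	return 0;
}
```

Machines that never touch the tunable keep using the two static roots and pay nothing for the per-node arrays.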

Signed-off-by: Hugh Dickins
Cc: Mel Gorman
Cc: Petr Holasek
Cc: Andrea Arcangeli
Cc: Izik Eidus
Cc: Johannes Weiner
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Hugh Dickins
2013-02-24 09:50:24 +0800
56f31801c mm: cleanup "swapcache" in do_swap_page

I dislike the way in which "swapcache" gets used in do_swap_page():
there is always a page from swapcache there (even if maybe uncached by
the time we lock it), but tests are made according to "swapcache".
Rework that with "page != swapcache", as has been done in unuse_pte().

Signed-off-by: Hugh Dickins
Cc: Mel Gorman
Cc: Petr Holasek
Cc: Andrea Arcangeli
Cc: Izik Eidus
Cc: Johannes Weiner
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Hugh Dickins
2013-02-24 09:50:24 +0800
9e16b7fb1 mm,ksm: swapoff might need to copy

Before establishing that KSM page migration was the cause of my
WARN_ON_ONCE(page_mapped(page))s, I suspected that they came from the
lack of a ksm_might_need_to_copy() in swapoff's unuse_pte() - which in
many respects is equivalent to faulting in a page.

In fact I've never caught that as the cause: but in theory it does at
least need the KSM_RUN_UNMERGE check in ksm_might_need_to_copy(), to
avoid bringing a KSM page back in when it's not supposed to be.

I intended to copy how it's done in do_swap_page(), but have a strong
aversion to how "swapcache" ends up being used there: rework it with
"page != swapcache".

Signed-off-by: Hugh Dickins
Cc: Mel Gorman
Cc: Petr Holasek
Cc: Andrea Arcangeli
Cc: Izik Eidus
Acked-by: Johannes Weiner
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Hugh Dickins
2013-02-24 09:50:23 +0800
5117b3b83 mm,ksm: FOLL_MIGRATION do migration_entry_wait

In "ksm: remove old stable nodes more thoroughly" I said that I'd never
seen its WARN_ON_ONCE(page_mapped(page)). True at the time of writing,
but it soon appeared once I tried fuller tests on the whole series.

It turned out to be due to the KSM page migration itself: unmerge_and_
remove_all_rmap_items() failed to locate and replace all the KSM pages,
because of that hiatus in page migration when old pte has been replaced
by migration entry, but not yet by new pte. follow_page() finds no page
at that instant, but a KSM page reappears shortly after, without a
fault.

Add FOLL_MIGRATION flag, so follow_page() can do migration_entry_wait()
for KSM's break_cow(). I'd have preferred to avoid another flag, and do
it every time, in case someone else makes the same easy mistake; but did
not find another transgressor (the common get_user_pages() is of course
safe), and cannot be sure that every follow_page() caller is prepared to
sleep - ia64's xencomm_vtop()? Now, THP's wait_split_huge_page() can
already sleep there, since anon_vma locking was changed to mutex, but
maybe that's somehow excluded.

Signed-off-by: Hugh Dickins
Cc: Mel Gorman
Cc: Petr Holasek
Cc: Andrea Arcangeli
Cc: Izik Eidus
Cc: Johannes Weiner
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Hugh Dickins
2013-02-24 09:50:23 +0800
bc56620b4 ksm: shrink 32-bit rmap_item back to 32 bytes

Think of struct rmap_item as an extension of struct page (restricted to
MADV_MERGEABLE areas): there may be a lot of them, we need to keep them
small, especially on 32-bit architectures of limited lowmem.

Siting "int nid" after "unsigned int checksum" works nicely on 64-bit,
making no change to its 64-byte struct rmap_item; but bloats the 32-bit
struct rmap_item from (nicely cache-aligned) 32 bytes to 36 bytes, which
rounds up to 40 bytes once allocated from slab. We'd better avoid that.

Hey, I only just remembered that the anon_vma pointer in struct
rmap_item has no purpose until the rmap_item is hung from a stable tree
node (which has its own nid field); and rmap_item's nid field has no purpose
other than to say which tree root to tell rb_erase() when unlinking from an
unstable tree.

Double them up in a union. There's just one place where we set anon_vma
early (when we already hold mmap_sem): now we must remove tree_rmap_item
from its unstable tree there, before overwriting nid. No need to
spatter BUG()s around: we'd be seeing oopses if this were wrong.
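
The space saving from doubling the two fields up can be seen in a small layout sketch. The structs below only approximate the shape described above (pointer-sized fields, a checksum, three words of tree linkage); the field list is an assumption for the example and not the kernel's definition, and the anonymous union requires C11 or a compiler extension.

```c
#include <assert.h>
#include <stddef.h>

/* nid stored as its own field next to the checksum. */
struct rmap_item_separate {
	void *rmap_list;
	void *anon_vma;
	void *mm;
	unsigned long address;
	unsigned int oldchecksum;
	int nid;
	void *tree_links[3];	/* stand-in for the rb-tree / list linkage */
};

/* nid shares storage with anon_vma, since the two are never live together. */
struct rmap_item_union {
	void *rmap_list;
	union {
		void *anon_vma;	/* meaningful once on a stable tree */
		int nid;	/* meaningful while on an unstable tree */
	};
	void *mm;
	unsigned long address;
	unsigned int oldchecksum;
	void *tree_links[3];
};

/* Bytes saved per item by the union layout on the current target. */
static size_t bytes_saved_per_item(void)
{
	return sizeof(struct rmap_item_separate) - sizeof(struct rmap_item_union);
}
```

On a typical 64-bit target the separate `nid` fits into padding, so the saving is zero there; on a typical 32-bit target the union layout is one word smaller, which is the case the commit is about.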

Signed-off-by: Hugh Dickins
Cc: Mel Gorman
Cc: Petr Holasek
Cc: Andrea Arcangeli
Cc: Izik Eidus
Cc: Johannes Weiner
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Hugh Dickins
2013-02-24 09:50:23 +0800
b599cbdf1 ksm: treat unstable nid like in stable tree

An inconsistency emerged in reviewing the NUMA node changes to KSM: when
meeting a page from the wrong NUMA node in a stable tree, we say that
it's okay for comparisons, but not as a leaf for merging; whereas when
meeting a page from the wrong NUMA node in an unstable tree, we bail out
immediately.

Now, it might be that a wrong NUMA node in an unstable tree is more
likely to correlate with instability (different content, with rbnode
now misplaced) than page migration; but even so, we are accustomed to
instability in the unstable tree.

Without strong evidence for which strategy is generally better, I'd
rather be consistent with what's done in the stable tree: accept a page
from the wrong NUMA node for comparison, but not as a leaf for merging.

Signed-off-by: Hugh Dickins
Cc: Mel Gorman
Cc: Petr Holasek
Cc: Andrea Arcangeli
Cc: Izik Eidus
Cc: Johannes Weiner
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Hugh Dickins
2013-02-24 09:50:23 +0800
8fdb3dbf0 ksm: add some comments

Added slightly more detail to the Documentation of merge_across_nodes, a
few comments in areas indicated by review, and renamed get_ksm_page()'s
argument from "locked" to "lock_it". No functional change.

Signed-off-by: Hugh Dickins
Cc: Mel Gorman
Cc: Petr Holasek
Cc: Andrea Arcangeli
Cc: Izik Eidus
Cc: Johannes Weiner
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Hugh Dickins
2013-02-24 09:50:23 +0800
49cd0a5c2 tmpfs: fix mempolicy object leaks

Fix several mempolicy leaks in the tmpfs mount logic. These leaks are
slow - on the order of one object leaked per mount attempt.

Leak 1 (umount doesn't free mpol allocated in mount):
    while true; do
        mount -t tmpfs -o mpol=interleave,size=100M nodev /mnt
        umount /mnt
    done

Leak 2 (errors parsing remount options will leak mpol):
    mount -t tmpfs -o size=100M nodev /mnt
    while true; do
        mount -o remount,mpol=interleave,size=x /mnt 2> /dev/null
    done
    umount /mnt

Leak 3 (multiple mpol per mount leak mpol):
    while true; do
        mount -t tmpfs -o mpol=interleave,mpol=interleave,size=100M nodev /mnt
        umount /mnt
    done

This patch fixes all of the above. I could have broken the patch into
three pieces but it seemed easier to review as one.

[akpm@linux-foundation.org: fix handling of mpol_parse_str() errors, per Hugh]
Signed-off-by: Greg Thelen
Acked-by: Hugh Dickins
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Greg Thelen
2013-02-24 09:50:23 +0800
5f00110f7 tmpfs: fix use-after-free of mempolicy object

The tmpfs remount logic preserves filesystem mempolicy if the mpol=M
option is not specified in the remount request. A new policy can be
specified if mpol=M is given.

Before this patch remounting an mpol bound tmpfs without specifying
mpol= mount option in the remount request would set the filesystem's
mempolicy object to a freed mempolicy object.

To reproduce the problem boot a DEBUG_PAGEALLOC kernel and run:
# mkdir /tmp/x

# mount -t tmpfs -o size=100M,mpol=interleave nodev /tmp/x

# grep /tmp/x /proc/mounts
nodev /tmp/x tmpfs rw,relatime,size=102400k,mpol=interleave:0-3 0 0

# mount -o remount,size=200M nodev /tmp/x

# grep /tmp/x /proc/mounts
nodev /tmp/x tmpfs rw,relatime,size=204800k,mpol=??? 0 0
# note ? garbage in mpol=... output above

# dd if=/dev/zero of=/tmp/x/f count=1
# panic here

Panic:
BUG: unable to handle kernel NULL pointer dereference at (null)
IP: [< (null)>] (null)
[...]
Oops: 0010 [#1] SMP DEBUG_PAGEALLOC
Call Trace:
mpol_shared_policy_init+0xa5/0x160
shmem_get_inode+0x209/0x270
shmem_mknod+0x3e/0xf0
shmem_create+0x18/0x20
vfs_create+0xb5/0x130
do_last+0x9a1/0xea0
path_openat+0xb3/0x4d0
do_filp_open+0x42/0xa0
do_sys_open+0xfe/0x1e0
compat_sys_open+0x1b/0x20
cstar_dispatch+0x7/0x1f

Non-debug kernels will not crash immediately because referencing the
dangling mpol will not cause a fault. Instead the filesystem will
reference a freed mempolicy object, which will cause unpredictable
behavior.

The problem boils down to a dropped mpol reference below if
shmem_parse_options() does not allocate a new mpol:

config = *sbinfo
shmem_parse_options(data, &config, true)
mpol_put(sbinfo->mpol)
sbinfo->mpol = config.mpol /* BUG: saves unreferenced mpol */

This patch avoids the crash by not releasing the mempolicy if
shmem_parse_options() doesn't create a new mpol.
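
The dropped reference can be modeled in user space. This is only a sketch: struct mpol, struct sb_info, mpol_put_model() and both remount functions are hypothetical stand-ins, and the "parsed" argument plays the role of a policy that option parsing may or may not have allocated.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for the mempolicy and the tmpfs superblock info. */
struct mpol    { int refcnt; };
struct sb_info { struct mpol *mpol; };

/* Drop one reference; a real kernel would free the object at zero. */
static void mpol_put_model(struct mpol *p)
{
    if (p) {
        assert(p->refcnt > 0);
        p->refcnt--;
    }
}

/* Buggy flow: the old reference is dropped unconditionally. */
void remount_buggy(struct sb_info *sb, struct mpol *parsed)
{
    struct sb_info config = *sb;    /* config.mpol aliases sb->mpol */

    if (parsed)
        config.mpol = parsed;       /* mpol= was given on remount */
    mpol_put_model(sb->mpol);
    sb->mpol = config.mpol;         /* may store the object just released */
}

/* Fixed flow: only replace the policy when a new one was parsed. */
void remount_fixed(struct sb_info *sb, struct mpol *parsed)
{
    if (parsed) {
        mpol_put_model(sb->mpol);
        sb->mpol = parsed;
    }
}
```

Calling remount_buggy() with parsed == NULL leaves sb->mpol pointing at an object whose count has reached zero, which corresponds to the freed mempolicy described above; remount_fixed() leaves the count untouched in that case.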

How far back does this issue go? I see it in both 2.6.36 and 3.3. I did
not look back further.

Signed-off-by: Greg Thelen
Acked-by: Hugh Dickins
Cc:
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Greg Thelen
2013-02-24 09:50:23 +0800
67d46b296 mm/fadvise.c: drain all pagevecs if POSIX_FADV_DONTNEED fails to discard all pages

Rob van der Heij reported the following (paraphrased) on private mail.

The scenario is that I want to keep backups from filling up the page
cache and purging data that is more likely to be used again (this is
with s390x Linux on z/VM, so I don't give it so much memory that
we stop caring). So I have something with LD_PRELOAD that
intercepts the close() call (from tar, in this case) and issues
a posix_fadvise() just before closing the file.

This mostly works, except for small files (less than 14 pages)
that remain in page cache after the fact.

Unfortunately Rob has not had a chance to test this exact patch but the
test program below should be reproducing the problem he described.

The issue is the per-cpu pagevecs for LRU additions. If the pages are
added by one CPU but fadvise() is called on another then the pages
remain resident as the invalidate_mapping_pages() only drains the local
pagevecs via its call to pagevec_release(). The user-visible effect is
that a program that uses fadvise() properly is not obeyed.

A possible fix for this is to put the necessary smarts into
invalidate_mapping_pages() to globally drain the LRU pagevecs if a
pagevec page could not be discarded. The downside with this is that an
inode cache shrink would send a global IPI and memory pressure
potentially causing global IPI storms is very undesirable.

Instead, this patch adds a check during fadvise(POSIX_FADV_DONTNEED) to
check if invalidate_mapping_pages() discarded all the requested pages.
If only a subset of the pages was discarded, it drains the LRU pagevecs and tries
again. If the second attempt fails, it assumes it is due to the pages
being mapped, locked or dirty and does not care. With this patch, an
application using fadvise() correctly will be obeyed but there is a
downside that a malicious application can force the kernel to send
global IPIs and increase overhead.
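
The retry described above can be sketched as a user-space model. Everything here is an assumption for illustration: struct cache_model and the *_model() functions are hypothetical, and "pinned" simply counts pages that sit in another CPU's pagevec and therefore cannot be discarded yet.

```c
#include <assert.h>

/* Hypothetical model of the page cache state for one file range. */
struct cache_model {
    unsigned long resident; /* pages still in the page cache */
    unsigned long pinned;   /* of those, pages held in remote pagevecs */
    unsigned int  drains;   /* number of global drains issued */
};

/* Discard every resident page that is not held in a pagevec. */
static unsigned long invalidate_model(struct cache_model *c)
{
    unsigned long discarded;

    assert(c->pinned <= c->resident);
    discarded = c->resident - c->pinned;
    c->resident = c->pinned;
    return discarded;
}

/* Flush all pagevecs; this is the step that costs a global IPI. */
static void drain_all_model(struct cache_model *c)
{
    c->pinned = 0;
    c->drains++;
}

/* Try once, and drain plus retry only if some pages were left behind. */
unsigned long dontneed_model(struct cache_model *c, unsigned long nr_requested)
{
    unsigned long count = invalidate_model(c);

    if (count < nr_requested) {
        drain_all_model(c);
        count += invalidate_model(c);
    }
    return count;
}
```

In this model the drain is issued only when the first pass falls short, so a file whose pages are already on the LRU never triggers it.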

If accepted, I would like this to be considered as a -stable candidate.
It's not an urgent issue but it's a system call that is not working as
advertised which is weak.

The following test program demonstrates the problem. It should never
report that pages are still resident but will without this patch. It
assumes that CPU 0 and 1 exist.

#define _GNU_SOURCE /* cpu_set_t, CPU_ZERO(), CPU_SET(), sched_setaffinity() */
#include <fcntl.h>
#include <sched.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

/*
 * Assumed value: the definition is missing from this copy of the program.
 * It is kept below 14 pages to match the small-file case reported above.
 */
#define FILESIZE_PAGES 5

int main(void) {
    int fd;
    int pagesize = getpagesize();
    ssize_t written = 0, expected;
    char *buf;
    unsigned char *vec;
    int resident, i;
    cpu_set_t set;

    /* Prepare a buffer for writing */
    expected = FILESIZE_PAGES * pagesize;
    buf = malloc(expected + 1);
    if (buf == NULL) {
        printf("ENOMEM\n");
        exit(EXIT_FAILURE);
    }
    buf[expected] = 0;
    memset(buf, 'a', expected);

    /* Prepare the mincore vec */
    vec = malloc(FILESIZE_PAGES);
    if (vec == NULL) {
        printf("ENOMEM\n");
        exit(EXIT_FAILURE);
    }

    /* Bind ourselves to CPU 0 */
    CPU_ZERO(&set);
    CPU_SET(0, &set);
    if (sched_setaffinity(getpid(), sizeof(set), &set) == -1) {
        perror("sched_setaffinity");
        exit(EXIT_FAILURE);
    }

    /* open file, unlink and write buffer (O_CREAT requires a mode) */
    fd = open("fadvise-test-file", O_CREAT|O_EXCL|O_RDWR, 0600);
    if (fd == -1) {
        perror("open");
        exit(EXIT_FAILURE);
    }
    unlink("fadvise-test-file");
    while (written < expected) {
        ssize_t this_write;
        this_write = write(fd, buf + written, expected - written);

        if (this_write == -1) {
            perror("write");
            exit(EXIT_FAILURE);
        }

        written += this_write;
    }
    free(buf);

    /*
     * Force ourselves to another CPU. If fadvise only flushes the local
     * CPUs pagevecs then the fadvise will fail to discard all file pages
     */
    CPU_ZERO(&set);
    CPU_SET(1, &set);
    if (sched_setaffinity(getpid(), sizeof(set), &set) == -1) {
        perror("sched_setaffinity");
        exit(EXIT_FAILURE);
    }

    /* sync and fadvise to discard the page cache */
    fsync(fd);
    if (posix_fadvise(fd, 0, expected, POSIX_FADV_DONTNEED) == -1) {
        perror("posix_fadvise");
        exit(EXIT_FAILURE);
    }

    /* map the file and use mincore to see which parts of it are resident */
    buf = mmap(NULL, expected, PROT_READ, MAP_SHARED, fd, 0);
    if (buf == MAP_FAILED) {    /* mmap() reports failure as MAP_FAILED, not NULL */
        perror("mmap");
        exit(EXIT_FAILURE);
    }
    if (mincore(buf, expected, vec) == -1) {
        perror("mincore");
        exit(EXIT_FAILURE);
    }

    /* Check residency: only the low bit of each vec entry is defined */
    for (i = 0, resident = 0; i < FILESIZE_PAGES; i++) {
        if (vec[i] & 1)
            resident++;
    }
    if (resident != 0) {
        printf("Nr unexpected pages resident: %d\n", resident);
        exit(EXIT_FAILURE);
    }

    munmap(buf, expected);
    close(fd);
    free(vec);
    exit(EXIT_SUCCESS);
}

Signed-off-by: Mel Gorman
Reported-by: Rob van der Heij
Tested-by: Rob van der Heij
Cc:
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Mel Gorman
2013-02-24 09:50:23 +0800
fa794199e mm: export mmu notifier invalidates

We at SGI have a need to address some very high physical address ranges
with our GRU (global reference unit), sometimes across partitioned
machine boundaries and sometimes with larger addresses than the cpu
supports. We do this with the aid of our own 'extended vma' module
which mimics the vma. When something (either unmap or exit) frees an
'extended vma' we use the mmu notifiers to clean them up.

We had been able to mimic the functions
__mmu_notifier_invalidate_range_start() and
__mmu_notifier_invalidate_range_end() by locking the per-mm lock and
walking the per-mm notifier list. But with the change to a global srcu
lock (static in mmu_notifier.c) we can no longer do that. Our module has
no access to that lock.

So we request that these two functions be exported.

Signed-off-by: Cliff Wickman
Acked-by: Robin Holt
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Cliff Wickman
2013-02-24 09:50:23 +0800
240aadeed mm: accelerate mm_populate() treatment of THP pages

This change adds a follow_page_mask function which is equivalent to
follow_page, but with an extra page_mask argument.

follow_page_mask sets *page_mask to HPAGE_PMD_NR - 1 when it encounters
a THP page, and to 0 in other cases.

__get_user_pages() makes use of this in order to accelerate populating
THP ranges - that is, when both the pages and vmas arrays are NULL, we
don't need to iterate HPAGE_PMD_NR times to cover a single THP page (and
we also avoid taking mm->page_table_lock that many times).
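
How the mask turns into a larger step can be sketched in user space. The constants are assumptions (4 KiB base pages, 512 base pages per huge page) and both function names are hypothetical; the sketch only shows the arithmetic of skipping to the end of the current huge page.

```c
#include <assert.h>

#define PAGE_SHIFT   12     /* assumed: 4 KiB base pages */
#define HPAGE_PMD_NR 512UL  /* assumed: 2 MiB huge pages */

/*
 * Number of base pages to advance from addr. With page_mask == 0 this is
 * always 1; with page_mask == HPAGE_PMD_NR - 1 it is the distance to the
 * end of the huge page containing addr, capped at nr_pages.
 */
unsigned long page_increment(unsigned long addr, unsigned long page_mask,
                             unsigned long nr_pages)
{
    unsigned long incr = 1 + (~(addr >> PAGE_SHIFT) & page_mask);

    assert(nr_pages > 0);
    return incr > nr_pages ? nr_pages : incr;
}

/* Count how many lookups a populate loop needs to cover nr_pages. */
unsigned long count_lookups(unsigned long start, unsigned long nr_pages,
                            unsigned long page_mask)
{
    unsigned long lookups = 0;

    while (nr_pages) {
        unsigned long incr = page_increment(start, page_mask, nr_pages);

        start += incr << PAGE_SHIFT;
        nr_pages -= incr;
        lookups++;
    }
    return lookups;
}
```

Under these assumptions, covering two aligned huge pages takes 2 lookups with the mask set and 1024 lookups with a mask of 0.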

Signed-off-by: Michel Lespinasse
Cc: Andrea Arcangeli
Cc: Rik van Riel
Cc: Mel Gorman
Cc: Hugh Dickins
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Michel Lespinasse
2013-02-24 09:50:23 +0800
28a35716d mm: use long type for page counts in mm_populate() and get_user_pages()

Use long type for page counts in mm_populate() so as to avoid integer
overflow when running the following test code:

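#include <stdio.h>
#include <sys/mman.h>

int main(void) {
    /* Map 2^44 bytes so that the page count does not fit in an int */
    void *p = mmap(NULL, 0x100000000000, PROT_READ,
                   MAP_PRIVATE | MAP_ANON, -1, 0);
    printf("p: %p\n", p);
    mlockall(MCL_CURRENT);
    printf("done\n");
    return 0;
}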
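
The overflow follows from simple arithmetic, sketched here with hypothetical helper names and an assumed 4 KiB page size. The mapping is 2^44 bytes, which is 2^32 pages, one more than a 32-bit unsigned count can hold and far beyond INT_MAX.

```c
#include <assert.h>
#include <limits.h>

#define PAGE_SHIFT 12   /* assumed: 4 KiB pages */

/* The sketch assumes the usual 32-bit int. */
static_assert(INT_MAX == 2147483647, "sketch assumes a 32-bit int");

/* Number of pages needed to cover a mapping of the given size. */
unsigned long long pages_in(unsigned long long bytes)
{
    return bytes >> PAGE_SHIFT;
}

/* Whether a page count can be stored in an int without overflow. */
int fits_in_int(unsigned long long nr_pages)
{
    return nr_pages <= (unsigned long long)INT_MAX;
}
```

For the test mapping, pages_in(0x100000000000ULL) is 4294967296 and fits_in_int() rejects it, whereas a long on a 64-bit kernel holds the value comfortably.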

Signed-off-by: Michel Lespinasse
Cc: Andrea Arcangeli
Cc: Rik van Riel
Cc: Mel Gorman
Cc: Hugh Dickins
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Michel Lespinasse
2013-02-24 09:50:22 +0800
e0fb58152 mm: accurately document nr_free_*_pages functions with code comments

nr_free_zone_pages(), nr_free_buffer_pages() and nr_free_pagecache_pages()
are horribly badly named, so accurately document them with code comments
to guard against misuse.

[akpm@linux-foundation.org: tweak comments]
Reviewed-by: Randy Dunlap
Signed-off-by: Zhang Yanfei
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Zhang Yanfei
2013-02-24 09:50:22 +0800
5f4b9fc5c HWPOISON: change order of error_states[]'s elements

error_states[] has two separate states, "unevictable LRU page" and
"mlocked LRU page", and the former currently has the higher priority. As
a result the latter is rarely chosen, because pages with PageMlocked are
highly likely to have PG_unevictable set as well. On the other hand,
PG_unevictable without PageMlocked is common for ramfs or SHM_LOCKed
shared memory, so reversing the priority of these two states helps us
clearly distinguish them.
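
Why the order matters can be sketched with a first-match table. The flag bits, the enum and both tables are hypothetical and much smaller than the real error_states[]; the point is only that the first matching entry wins.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical flag bits for the sketch. */
#define F_UNEVICTABLE (1UL << 0)
#define F_MLOCKED     (1UL << 1)

enum page_kind { KIND_UNEVICTABLE, KIND_MLOCKED, KIND_UNKNOWN };

struct state {
    unsigned long mask; /* which bits to look at */
    unsigned long res;  /* required value of those bits */
    enum page_kind kind;
};

/* Old order: "unevictable" is tested first and shadows "mlocked". */
const struct state old_order[] = {
    { F_UNEVICTABLE, F_UNEVICTABLE, KIND_UNEVICTABLE },
    { F_MLOCKED,     F_MLOCKED,     KIND_MLOCKED },
    { 0,             0,             KIND_UNKNOWN },  /* catch-all */
};

/* New order: the more specific "mlocked" entry comes first. */
const struct state new_order[] = {
    { F_MLOCKED,     F_MLOCKED,     KIND_MLOCKED },
    { F_UNEVICTABLE, F_UNEVICTABLE, KIND_UNEVICTABLE },
    { 0,             0,             KIND_UNKNOWN },  /* catch-all */
};

/* Return the kind of the first entry whose masked flags match. */
enum page_kind classify(const struct state *tbl, unsigned long flags)
{
    const struct state *ps;

    assert(tbl != NULL);
    for (ps = tbl; ; ps++)
        if ((flags & ps->mask) == ps->res)
            return ps->kind;    /* the catch-all entry always matches */
}
```

A page with both bits set is classified as unevictable by old_order and as mlocked by new_order, while a page with only the unevictable bit is classified the same way by both.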

Signed-off-by: Naoya Horiguchi
Cc: Andi Kleen
Cc: Chen Gong
Cc: Tony Luck
Cc: Wu Fengguang
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Naoya Horiguchi
2013-02-24 09:50:22 +0800
524fca1e7 HWPOISON: fix misjudgement of page_action() for errors on mlocked pages

memory_failure() can't handle memory errors on mlocked pages correctly,
because page_action() judges such errors as ones on "unknown pages"
instead of ones on "unevictable LRU page" or "mlocked LRU page". To
determine the page state, page_action() checks page flags at the time of
the judgement, but those flags are not the same as the ones seen just
after memory_failure() is called, because memory_failure() unmaps the
error page before calling page_action(). This unmapping changes the page
state; in particular page_remove_rmap() (called from
try_to_unmap_one()) clears PG_mlocked, so page_action() can't catch
mlocked pages after that.

With this patch, we store the page flags of the error page before
unmapping, and (only) if the first check, using the flags at the time of
the judgement, decides the error page is unknown, we do a second check
with the stored page flags. This implementation doesn't change error
handling for the page types for which the first check can determine the
page state correctly.
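
The two-stage check can be modeled as follows. The flag bits, enum and function names are hypothetical, and unmap_model() stands in for the unmapping step only in the sense that it clears the mlocked bit.

```c
#include <assert.h>

/* Hypothetical flag bits for the sketch. */
#define F_DIRTY   (1UL << 0)
#define F_MLOCKED (1UL << 1)

static_assert(F_DIRTY != F_MLOCKED, "flag bits must be distinct");

enum page_kind { KIND_DIRTY, KIND_MLOCKED, KIND_UNKNOWN };

/* Single lookup: classify a page from one snapshot of its flags. */
enum page_kind lookup(unsigned long flags)
{
    if (flags & F_DIRTY)
        return KIND_DIRTY;
    if (flags & F_MLOCKED)
        return KIND_MLOCKED;
    return KIND_UNKNOWN;
}

/* Model of the unmap step: it clears the mlocked bit. */
unsigned long unmap_model(unsigned long flags)
{
    return flags & ~F_MLOCKED;
}

/*
 * First check with the current flags; fall back to the flags saved
 * before unmapping only if the first check says "unknown".
 */
enum page_kind judge(unsigned long saved_flags, unsigned long current_flags)
{
    enum page_kind kind = lookup(current_flags);

    if (kind == KIND_UNKNOWN)
        kind = lookup(saved_flags);
    return kind;
}
```

In this model an mlocked page looks unknown after unmap_model(), but judge() recovers the mlocked state from the saved flags; a page that the first check already classifies is not affected by the fallback.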

[akpm@linux-foundation.org: tweak comments]
Signed-off-by: Naoya Horiguchi
Cc: Andi Kleen
Cc: Tony Luck
Cc: Chen Gong
Cc: Wu Fengguang
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Naoya Horiguchi
2013-02-24 09:50:22 +0800
6d0439904 memcg: stop warning on memcg_propagate_kmem

Whilst I run the risk of a flogging for disloyalty to the Lord of Sealand,
I do have CONFIG_MEMCG=y CONFIG_MEMCG_KMEM not set, and grow tired of the
"mm/memcontrol.c:4972:12: warning: `memcg_propagate_kmem' defined but not
used [-Wunused-function]" seen in 3.8-rc: move the #ifdef outwards.

Signed-off-by: Hugh Dickins
Acked-by: Michal Hocko
Cc: Glauber Costa
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Hugh Dickins
2013-02-24 09:50:22 +0800