30 Mar, 2011

1 commit

  • * 'frv' of git://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-2.6-frv:
    FRV: Use generic show_interrupts()
    FRV: Convert genirq namespace
    frv: Select GENERIC_HARDIRQS_NO_DEPRECATED
    frv: Convert cpu irq_chip to new functions
    frv: Convert mb93493 irq_chip to new functions
    frv: Convert mb93093 irq_chip to new functions
    frv: Convert mb93091 irq_chip to new functions
    frv: Fix typo from __do_IRQ overhaul
    frv: Remove stale irq_chip.end
    FRV: Do some cleanups
    FRV: Missing node arg in alloc_thread_info_node() macro
    NOMMU: implement access_remote_vm
    NOMMU: support SMP dynamic percpu_alloc
    NOMMU: percpu should use is_vmalloc_addr().

    Linus Torvalds
     

29 Mar, 2011

1 commit

  • Recent vm changes brought in a new function which the core procfs code
    utilizes. So implement it for nommu systems too to avoid link failures.
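
    A minimal sketch of the NOMMU side, assuming it simply delegates to the
    __access_remote_vm() helper factored out earlier in this series:

        /* mm/nommu.c (sketch, not the verbatim patch) */
        int access_remote_vm(struct mm_struct *mm, unsigned long addr,
                             void *buf, int len, int write)
        {
                return __access_remote_vm(NULL, mm, addr, buf, len, write);
        }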

    Signed-off-by: Mike Frysinger
    Signed-off-by: David Howells
    Tested-by: Simon Horman
    Tested-by: Ithamar Adema
    Acked-by: Greg Ungerer

    Mike Frysinger
     

28 Mar, 2011

2 commits

  • per_cpu_ptr_to_phys() uses VMALLOC_START and VMALLOC_END to determine if an
    address is in the vmalloc() region or not. This is incorrect on NOMMU as
    there is no real vmalloc() capability (vmalloc() is emulated by kmalloc()).

    The correct way to do this is to use is_vmalloc_addr(). This encapsulates the
    vmalloc() region test in MMU mode and just returns 0 in NOMMU mode.

    On FRV in NOMMU mode, the percpu compilation fails without this patch:

    mm/percpu.c: In function 'per_cpu_ptr_to_phys':
    mm/percpu.c:1011: error: 'VMALLOC_START' undeclared (first use in this function)
    mm/percpu.c:1011: error: (Each undeclared identifier is reported only once
    mm/percpu.c:1011: error: for each function it appears in.)
    mm/percpu.c:1012: error: 'VMALLOC_END' undeclared (first use in this function)
    mm/percpu.c:1018: warning: control reaches end of non-void function
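
    A sketch of the pattern (not the exact mm/percpu.c diff): the open-coded
    range test only compiles where VMALLOC_START/VMALLOC_END exist, while
    is_vmalloc_addr() works in both modes:

        /* Before: MMU-only, breaks the NOMMU build. */
        if (addr >= (void *)VMALLOC_START && addr < (void *)VMALLOC_END)
                return page_to_phys(vmalloc_to_page(addr));

        /* After: is_vmalloc_addr() simply returns 0 on NOMMU. */
        if (is_vmalloc_addr(addr))
                return page_to_phys(vmalloc_to_page(addr));
        return virt_to_phys(addr);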

    Signed-off-by: David Howells

    David Howells
     
  • Fix mm/memory.c incorrect kernel-doc function notation:

    Warning(mm/memory.c:3718): Cannot understand * @access_remote_vm - access another process' address space
    on line 3718 - I thought it was a doc line
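
    The kernel-doc header had an '@' before the function name, which the
    kernel-doc parser cannot understand. A sketch of the corrected comment
    (parameter list abbreviated):

        /**
         * access_remote_vm - access another process' address space
         * @mm:         the mm_struct of the target address space
         * @addr:       start address to access
         * ...
         */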

    Signed-off-by: Randy Dunlap
    Signed-off-by: Linus Torvalds

    Randy Dunlap
     

25 Mar, 2011

8 commits

  • * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs-2.6:
    fs: simplify iget & friends
    fs: pull inode->i_lock up out of writeback_single_inode
    fs: rename inode_lock to inode_hash_lock
    fs: move i_wb_list out from under inode_lock
    fs: move i_sb_list out from under inode_lock
    fs: remove inode_lock from iput_final and prune_icache
    fs: Lock the inode LRU list separately
    fs: factor inode disposal
    fs: protect inode->i_state with inode->i_lock
    autofs4: Do not potentially dereference NULL pointer returned by fget() in autofs_dev_ioctl_setpipefd()
    autofs4 - remove autofs4_lock
    autofs4 - fix d_manage() return on rcu-walk
    autofs4 - fix autofs4_expire_indirect() traversal
    autofs4 - fix dentry leak in autofs4_expire_direct()
    autofs4 - reinstate last used update on access
    vfs - check non-mountpoint dentry might block in __follow_mount_rcu()

    Linus Torvalds
     
  • Protect the inode writeback list with a new global lock
    inode_wb_list_lock and use it to protect the list manipulations and
    traversals. This lock replaces the inode_lock as the inodes on the
    list can be validity checked while holding the inode->i_lock and
    hence the inode_lock is no longer needed to protect the list.
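
    A minimal sketch of the new pattern, using the lock and list names from
    the commit message:

        static DEFINE_SPINLOCK(inode_wb_list_lock);

        /* List manipulation now takes the dedicated lock, not inode_lock. */
        static void inode_wb_list_del(struct inode *inode)
        {
                spin_lock(&inode_wb_list_lock);
                list_del_init(&inode->i_wb_list);
                spin_unlock(&inode_wb_list_lock);
        }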

    Signed-off-by: Dave Chinner
    Signed-off-by: Al Viro

    Dave Chinner
     
  • Protect inode state transitions and validity checks with the
    inode->i_lock. This enables us to make inode state transitions
    independently of the inode_lock and is the first step to peeling
    away the inode_lock from the code.

    This requires that __iget() is done atomically with i_state checks
    during list traversals so that we don't race with another thread
    marking the inode I_FREEING between the state check and grabbing the
    reference.

    Also remove the unlock_new_inode() memory barrier optimisation
    required to avoid taking the inode_lock when clearing I_NEW.
    Simplify the code by simply taking the inode->i_lock around the
    state change and wakeup. Because the wakeup is no longer tricky,
    remove the wake_up_inode() function and open code the wakeup where
    necessary.
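
    A sketch of the traversal pattern the message describes, inside e.g. a
    hash-list walk (simplified):

        /* Check i_state and take the reference atomically under i_lock
         * so another thread cannot mark the inode I_FREEING between the
         * check and __iget(). */
        spin_lock(&inode->i_lock);
        if (inode->i_state & (I_FREEING | I_WILL_FREE)) {
                spin_unlock(&inode->i_lock);
                continue;
        }
        __iget(inode);
        spin_unlock(&inode->i_lock);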

    Signed-off-by: Dave Chinner
    Signed-off-by: Al Viro

    Dave Chinner
     
  • * 'slab/urgent' of git://git.kernel.org/pub/scm/linux/kernel/git/penberg/slab-2.6:
    SLUB: Write to per cpu data when allocating it
    slub: Fix debugobjects with lockless fastpath

    Linus Torvalds
     
  • Commit ddd588b5dd55 ("oom: suppress nodes that are not allowed from
    meminfo on oom kill") moved lib/show_mem.o out of lib/lib.a, which
    resulted in build warnings on all architectures that implement their own
    versions of show_mem():

    lib/lib.a(show_mem.o): In function `show_mem':
    show_mem.c:(.text+0x1f4): multiple definition of `show_mem'
    arch/sparc/mm/built-in.o:(.text+0xd70): first defined here

    The fix is to remove __show_mem() and add its argument to show_mem() in
    all implementations to prevent this breakage.

    Architectures that implement their own show_mem() actually don't do
    anything with the argument yet, but they could be made to filter nodes
    that aren't allowed in the current context in the future just like the
    generic implementation.
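
    A sketch of the resulting interface, assuming SHOW_MEM_FILTER_NODES is
    the filter bit the generic version honours:

        /* Every architecture now implements this signature; arch
         * versions may simply ignore `filter` for now. */
        void show_mem(unsigned int filter);

        /* Generic callers, e.g. the oom killer, can then request: */
        show_mem(SHOW_MEM_FILTER_NODES);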

    Reported-by: Stephen Rothwell
    Reported-by: James Bottomley
    Suggested-by: Andrew Morton
    Signed-off-by: David Rientjes
    Signed-off-by: Linus Torvalds

    David Rientjes
     
  • It turns out that the cmpxchg16b emulation has to access vmalloced
    percpu memory with interrupts disabled. If the memory has never
    been touched before then the fault necessary to establish the
    mapping will not occur and the kernel will fail on boot.

    Fix that by reusing the CONFIG_PREEMPT code that writes the
    cpu number into a field on every cpu. Writing to the per cpu
    area beforehand causes the mapping to be established before we get
    to the cmpxchg16b emulation.
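
    Roughly what the fix does, reusing the CONFIG_PREEMPT tid initialisation
    (names as in mm/slub.c of that era, simplified):

        static void init_kmem_cache_cpus(struct kmem_cache *s)
        {
                int cpu;

                /* The write faults in the vmalloc'ed per cpu mapping
                 * now, while interrupts are still enabled. */
                for_each_possible_cpu(cpu)
                        per_cpu_ptr(s->cpu_slab, cpu)->tid = init_tid(cpu);
        }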

    Tested-by: Ingo Molnar
    Signed-off-by: Christoph Lameter
    Signed-off-by: Pekka Enberg

    Christoph Lameter
     
  • On Thu, 24 Mar 2011, Ingo Molnar wrote:
    > RIP: 0010:[] [] get_next_timer_interrupt+0x119/0x260

    That's a typical timer crash, but you were unable to debug it with
    debugobjects because commit d3f661d6 broke those.

    Cc: Christoph Lameter
    Tested-by: Ingo Molnar
    Signed-off-by: Thomas Gleixner
    Signed-off-by: Pekka Enberg

    Thomas Gleixner
     
  • * 'for-2.6.39/core' of git://git.kernel.dk/linux-2.6-block: (65 commits)
    Documentation/iostats.txt: bit-size reference etc.
    cfq-iosched: removing unnecessary think time checking
    cfq-iosched: Don't clear queue stats when preempt.
    blk-throttle: Reset group slice when limits are changed
    blk-cgroup: Only give unaccounted_time under debug
    cfq-iosched: Don't set active queue in preempt
    block: fix non-atomic access to genhd inflight structures
    block: attempt to merge with existing requests on plug flush
    block: NULL dereference on error path in __blkdev_get()
    cfq-iosched: Don't update group weights when on service tree
    fs: assign sb->s_bdi to default_backing_dev_info if the bdi is going away
    block: Require subsystems to explicitly allocate bio_set integrity mempool
    jbd2: finish conversion from WRITE_SYNC_PLUG to WRITE_SYNC and explicit plugging
    jbd: finish conversion from WRITE_SYNC_PLUG to WRITE_SYNC and explicit plugging
    fs: make fsync_buffers_list() plug
    mm: make generic_writepages() use plugging
    blk-cgroup: Add unaccounted time to timeslice_used.
    block: fixup plugging stubs for !CONFIG_BLOCK
    block: remove obsolete comments for blkdev_issue_zeroout.
    blktrace: Use rq->cmd_flags directly in blk_add_trace_rq.
    ...

    Fix up conflicts in fs/{aio.c,super.c}

    Linus Torvalds
     

24 Mar, 2011

28 commits

  • * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs-2.6:
    deal with races in /proc/*/{syscall,stack,personality}
    proc: enable writing to /proc/pid/mem
    proc: make check_mem_permission() return an mm_struct on success
    proc: hold cred_guard_mutex in check_mem_permission()
    proc: disable mem_write after exec
    mm: implement access_remote_vm
    mm: factor out main logic of access_process_vm
    mm: use mm_struct to resolve gate vma's in __get_user_pages
    mm: arch: rename in_gate_area_no_task to in_gate_area_no_mm
    mm: arch: make in_gate_area take an mm_struct instead of a task_struct
    mm: arch: make get_gate_vma take an mm_struct instead of a task_struct
    x86: mark associated mm when running a task in 32 bit compatibility mode
    x86: add context tag to mark mm when running a task in 32-bit compatibility mode
    auxv: require the target to be tracable (or yourself)
    close race in /proc/*/environ
    report errors in /proc/*/*map* sanely
    pagemap: close races with suid execve
    make sessionid permissions in /proc/*/task/* match those in /proc/*
    fix leaks in path_lookupat()

    Fix up trivial conflicts in fs/proc/base.c

    Linus Torvalds
     
  • …p_elfcorehdr and saved_max_pfn

    The Xen PV drivers in a crashed HVM guest cannot connect to the dom0
    backend drivers because both frontend and backend drivers are still in
    the connected state. To run the connection reset function only in case
    of a crashdump, the is_kdump_kernel() function needs to be available
    for the PV driver modules.

    Consolidate elfcorehdr_addr, setup_elfcorehdr and saved_max_pfn into
    kernel/crash_dump.c. Also export elfcorehdr_addr to make is_kdump_kernel()
    usable for modules.

    Leave 'elfcorehdr' as an early_param(). This changes powerpc from
    __setup() to early_param(). It also adds x86's address range check on
    ia64 and powerpc.
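
    A sketch of what the export enables for a PV driver module (the reset
    hook name is hypothetical):

        #include <linux/crash_dump.h>

        static int __init xen_pv_frontend_init(void)
        {
                /* Only reset stale frontend/backend state when actually
                 * running in a crashdump kernel. */
                if (is_kdump_kernel())
                        xen_reset_frontend_state();  /* hypothetical hook */
                return 0;
        }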

    [akpm@linux-foundation.org: additional #includes]
    [akpm@linux-foundation.org: remove elfcorehdr_addr export]
    [akpm@linux-foundation.org: fix for Tejun's mm/nobootmem.c changes]
    Signed-off-by: Olaf Hering <olaf@aepfle.de>
    Cc: Russell King <rmk@arm.linux.org.uk>
    Cc: "Luck, Tony" <tony.luck@intel.com>
    Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
    Cc: Paul Mundt <lethal@linux-sh.org>
    Cc: Ingo Molnar <mingo@elte.hu>
    Cc: Thomas Gleixner <tglx@linutronix.de>
    Cc: "H. Peter Anvin" <hpa@zytor.com>
    Cc: Johannes Weiner <hannes@cmpxchg.org>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

    Olaf Hering
     
  • When a memcg is oom and current has already received a SIGKILL, then give
    it access to memory reserves with a higher scheduling priority so that it
    may quickly exit and free its memory.

    This is identical to the global oom killer and is done even before
    checking for panic_on_oom: a pending SIGKILL here while panic_on_oom is
    selected is guaranteed to have come from userspace; the thread only needs
    access to memory reserves to exit and thus we don't unnecessarily panic
    the machine until the kernel has no last resort to free memory.
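
    A sketch of the check, mirroring the global oom killer (TIF_MEMDIE is
    what grants access to memory reserves):

        /* If current is already dying, let it use reserves and exit
         * quickly instead of selecting a victim or panicking. */
        if (fatal_signal_pending(current)) {
                set_thread_flag(TIF_MEMDIE);
                return;
        }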

    Signed-off-by: David Rientjes
    Cc: Balbir Singh
    Cc: Daisuke Nishimura
    Acked-by: KAMEZAWA Hiroyuki
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    David Rientjes
     
  • fs/fuse/dev.c::fuse_try_move_page() does

    (1) remove a page by ->steal()
    (2) re-add the page to the page cache
    (3) link the page to the LRU if it was not on the LRU at (1)

    This implies the page is _on_ the LRU when it's added to the radix-tree,
    so the page is added to the memory cgroup while it's on the LRU, because
    the LRU is lazy and no one flushes it.

    This is the same behavior as SwapCache and needs special care:
    - remove the page from the LRU before overwriting pc->mem_cgroup.
    - add the page to the LRU after overwriting pc->mem_cgroup.

    We also need to take care of the pagevec.

    If PageLRU(page) is set before we add the PCG_USED bit, the page will not
    be added to the memcg's LRU (for a short period). So, regardless of the
    PageLRU(page) value before commit_charge(), we need to check PageLRU(page)
    after commit_charge().

    Addresses https://bugzilla.kernel.org/show_bug.cgi?id=30432

    Signed-off-by: KAMEZAWA Hiroyuki
    Reviewed-by: Johannes Weiner
    Acked-by: Daisuke Nishimura
    Cc: Miklos Szeredi
    Cc: Balbir Singh
    Reported-by: Daniel Poelzleithner
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    KAMEZAWA Hiroyuki
     
  • KAMEZAWA Hiroyuki noted that free_pages_cgroup doesn't have to check for
    PageReserved because we never store the array on reserved pages (neither
    alloc_pages_exact nor vmalloc uses those pages).

    So we can replace the check with a BUG_ON().
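
    The resulting check, sketched:

        /* The array is never stored on reserved pages, so treat one
         * showing up here as a bug rather than silently skipping it. */
        BUG_ON(PageReserved(page));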

    Signed-off-by: Michal Hocko
    Acked-by: KAMEZAWA Hiroyuki
    Cc: Balbir Singh
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Michal Hocko
     
  • Currently we are allocating a single page_cgroup array per memory section
    (stored in mem_section->base) when CONFIG_SPARSEMEM is selected. This is
    correct but a memory-inefficient solution, because the allocated memory
    (unless we fall back to vmalloc) is not kmalloc-friendly:

    - 32b - 16384 entries (20B per entry) fit into 327680B, so the
    524288B slab cache is used
    - 32b with PAE - 131072 entries with 2621440B fit into the 4194304B cache
    - 64b - 32768 entries (40B per entry) fit into the 2097152B cache

    This is ~37% wasted space per memory section, and it sums up over the
    whole memory. On an x86_64 machine it is something like 6MB per 1GB of
    RAM.

    We can reduce the internal fragmentation by using alloc_pages_exact(),
    which allocates PAGE_SIZE-aligned blocks, so we will get down to …

    Cc: Dave Hansen
    Acked-by: KAMEZAWA Hiroyuki
    Cc: Balbir Singh
    Signed-off-by: Johannes Weiner
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Michal Hocko
     
  • mm/memcontrol.c: In function 'mem_cgroup_force_empty':
    mm/memcontrol.c:2280: warning: 'flags' may be used uninitialized in this function

    It's a false positive.

    Cc: Balbir Singh
    Cc: Daisuke Nishimura
    Cc: Greg Thelen
    Cc: Johannes Weiner
    Cc: KAMEZAWA Hiroyuki
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Andrew Morton
     
  • The statistic counters are in units of pages, there is no reason to make
    them 64-bit wide on 32-bit machines.

    Make them native words. Since they are signed, this leaves 31 bits on
    32-bit machines, which can represent roughly 8TB assuming a page size
    of 4k.

    [akpm@linux-foundation.org: coding-style fixes]
    Signed-off-by: Johannes Weiner
    Signed-off-by: Greg Thelen
    Acked-by: KAMEZAWA Hiroyuki
    Acked-by: Balbir Singh
    Cc: Daisuke Nishimura
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Johannes Weiner
     
  • For increasing and decreasing per-cpu cgroup usage counters it makes sense
    to use signed types, as single per-cpu values might go negative during
    updates. But this is not the case for only-ever-increasing event
    counters.

    All the counters have been signed 64-bit so far, which was enough to count
    events even with the sign bit wasted.

    This patch:
    - divides s64 counters into signed usage counters and unsigned
    monotonically increasing event counters.
    - converts unsigned event counters into 'unsigned long' rather than
    'u64'. This matches the type used by the /proc/vmstat event counters.

    The next patch narrows the signed usage counters type (on 32-bit CPUs,
    that is).
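
    A sketch of the resulting split; the structure and enum names are
    assumptions based on the memcg code of that era:

        struct mem_cgroup_stat_cpu {
                s64 count[MEM_CGROUP_STAT_NSTATS];      /* signed usage */
                unsigned long events[MEM_CGROUP_EVENTS_NSTATS]; /* monotonic */
        };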

    Signed-off-by: Johannes Weiner
    Signed-off-by: Greg Thelen
    Acked-by: KAMEZAWA Hiroyuki
    Acked-by: Balbir Singh
    Cc: Daisuke Nishimura
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Johannes Weiner
     
  • There is no clear pattern for when we pass a page count and when we pass
    a byte count that is a multiple of PAGE_SIZE.

    We never charge or uncharge subpage quantities, so convert it all to page
    counts.
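
    An illustrative shape of the conversion (exact signatures in the series
    may differ):

        /* Before: a byte count that is always a PAGE_SIZE multiple.
         * After: a plain page count. */
        static int __mem_cgroup_try_charge(struct mm_struct *mm,
                                           gfp_t gfp_mask,
                                           unsigned int nr_pages,
                                           struct mem_cgroup **memcg,
                                           bool oom);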

    Signed-off-by: Johannes Weiner
    Acked-by: KAMEZAWA Hiroyuki
    Cc: Daisuke Nishimura
    Cc: Balbir Singh
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Johannes Weiner
     
  • We never uncharge subpage quantities.

    Signed-off-by: Johannes Weiner
    Acked-by: KAMEZAWA Hiroyuki
    Cc: Daisuke Nishimura
    Cc: Balbir Singh
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Johannes Weiner
     
  • We never keep subpage quantities in the per-cpu stock.

    Signed-off-by: Johannes Weiner
    Acked-by: KAMEZAWA Hiroyuki
    Cc: Daisuke Nishimura
    Cc: Balbir Singh
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Johannes Weiner
     
  • We have two charge cancelling functions: one takes a page count, the other
    a page size. The second one just divides the parameter by PAGE_SIZE and
    then calls the first one. This is trivial, no need for an extra function.

    Signed-off-by: Johannes Weiner
    Acked-by: KAMEZAWA Hiroyuki
    Cc: Daisuke Nishimura
    Cc: Balbir Singh
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Johannes Weiner
     
  • The reclaim_param_lock is only taken around single reads and writes to
    integer variables and is thus superfluous. Drop it.

    Signed-off-by: Johannes Weiner
    Acked-by: KAMEZAWA Hiroyuki
    Cc: Daisuke Nishimura
    Cc: Balbir Singh
    Reviewed-by: Minchan Kim
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Johannes Weiner
     
  • page_cgroup_zoneinfo() will never return NULL for a charged page, remove
    the check for it in mem_cgroup_get_reclaim_stat_from_page().

    Signed-off-by: Johannes Weiner
    Acked-by: KAMEZAWA Hiroyuki
    Cc: Daisuke Nishimura
    Cc: Balbir Singh
    Reviewed-by: Minchan Kim
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Johannes Weiner
     
  • In struct page_cgroup, we have a full word for flags but only a few are
    reserved. Use the remaining upper bits to encode, depending on
    configuration, the node or the section, to enable page_cgroup-to-page
    lookups without a direct pointer.

    This saves a full word for every page in a system with memory cgroups
    enabled.
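
    A minimal sketch of the idea with hypothetical shift and helper names;
    the real code encodes either the node id or the section number in the
    upper, otherwise unused bits of pc->flags:

        #define PCG_ID_SHIFT    16      /* hypothetical: above the flag bits */

        static void set_page_cgroup_id(struct page_cgroup *pc,
                                       unsigned long id)
        {
                pc->flags |= id << PCG_ID_SHIFT;  /* node id or section nr */
        }

        static unsigned long page_cgroup_id(struct page_cgroup *pc)
        {
                return pc->flags >> PCG_ID_SHIFT;
        }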

    Signed-off-by: Johannes Weiner
    Acked-by: KAMEZAWA Hiroyuki
    Cc: Daisuke Nishimura
    Cc: Balbir Singh
    Cc: Minchan Kim
    Cc: Randy Dunlap
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Johannes Weiner
     
  • The per-cgroup LRU lists string up 'struct page_cgroup's. To get from
    those structures to the page they represent, a lookup is required.
    Currently, the lookup is done through a direct pointer in struct
    page_cgroup, so a lot of functions down the callchain do this lookup by
    themselves instead of receiving the page pointer from their callers.

    The next patch removes this pointer, however, and the lookup is no longer
    that straightforward. In preparation for that, this patch only leaves
    the non-optional lookups when coming directly from the LRU list and passes
    the page down the stack.

    Signed-off-by: Johannes Weiner
    Acked-by: KAMEZAWA Hiroyuki
    Cc: Daisuke Nishimura
    Cc: Balbir Singh
    Cc: Minchan Kim
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Johannes Weiner
     
  • It is one logical function, no need to have it split up.

    Also, get rid of some checks from the inner function that ensured the
    sanity of the outer function.

    Signed-off-by: Johannes Weiner
    Acked-by: KAMEZAWA Hiroyuki
    Acked-by: Daisuke Nishimura
    Cc: Balbir Singh
    Cc: Minchan Kim
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Johannes Weiner
     
  • Instead of passing a whole struct page_cgroup to this function, let it
    take only what it really needs from it: the struct mem_cgroup and the
    page.

    This has the advantage that reading pc->mem_cgroup is now done at the same
    place where the ordering rules for this pointer are enforced and
    explained.

    It is also in preparation for removing the pc->page backpointer.

    Signed-off-by: Johannes Weiner
    Acked-by: KAMEZAWA Hiroyuki
    Cc: Daisuke Nishimura
    Cc: Balbir Singh
    Cc: Minchan Kim
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Johannes Weiner
     
  • This patch series removes the direct page pointer from struct page_cgroup,
    which saves 20% of per-page memcg memory overhead (Fedora and Ubuntu
    enable memcg by default, openSUSE apparently too).

    The node id or section number is encoded in the remaining free bits of
    pc->flags which allows calculating the corresponding page without the
    extra pointer.

    I ran what I think is a worst-case microbenchmark that just cats a large
    sparse file to /dev/null, because it means that walking the LRU list on
    behalf of per-cgroup reclaim and looking up pages from page_cgroups is
    happening constantly and at a high rate. But it made no measurable
    difference. A profile reported a 0.11% share of the new
    lookup_cgroup_page() function in this benchmark.

    This patch:

    All callsites check PCG_USED before passing pc->mem_cgroup, so the latter
    is never NULL.

    Signed-off-by: Johannes Weiner
    Acked-by: KAMEZAWA Hiroyuki
    Cc: Daisuke Nishimura
    Acked-by: Balbir Singh
    Cc: Minchan Kim
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Johannes Weiner
     
  • Add checks at page allocation and freeing time for whether the page is
    used (i.e., charged) from the memcg point of view.

    This check may be useful in debugging a problem, and we did similar
    checks before commit 52d4b9ac ("memcg: allocate all page_cgroup at
    boot").

    This patch adds some overhead to allocating and freeing memory, so it's
    enabled only when CONFIG_DEBUG_VM is enabled.
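
    A sketch of the kind of check added; PageCgroupUsed() tests the
    PCG_USED bit, and VM_BUG_ON() compiles away without CONFIG_DEBUG_VM:

        /* A page entering the allocator's free path must no longer be
         * charged to any memcg. */
        VM_BUG_ON(PageCgroupUsed(lookup_page_cgroup(page)));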

    Signed-off-by: Daisuke Nishimura
    Signed-off-by: Johannes Weiner
    Acked-by: KAMEZAWA Hiroyuki
    Cc: Balbir Singh
    Cc: Minchan Kim
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Daisuke Nishimura
     
  • The page_cgroup array is set up before even fork is initialized. I
    seriously doubt that this code executes before the array is alloc'd.

    Signed-off-by: Johannes Weiner
    Acked-by: KAMEZAWA Hiroyuki
    Cc: Daisuke Nishimura
    Cc: Balbir Singh
    Cc: Minchan Kim
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Johannes Weiner
     
  • No callsite ever passes a NULL pointer for a struct mem_cgroup * to the
    committing function. There is no need to check for it.

    Signed-off-by: Johannes Weiner
    Acked-by: KAMEZAWA Hiroyuki
    Cc: Daisuke Nishimura
    Cc: Balbir Singh
    Cc: Minchan Kim
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Johannes Weiner
     
  • These definitions have been unused since commit 4b3bde4 ("memcg: remove
    the overhead associated with the root cgroup").

    Signed-off-by: Johannes Weiner
    Acked-by: KAMEZAWA Hiroyuki
    Cc: Daisuke Nishimura
    Cc: Balbir Singh
    Cc: Minchan Kim
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Johannes Weiner
     
  • Since the introduction of transparent huge pages, checking whether
    memory cgroups are below their limits is no longer enough; the actual
    amount of chargeable space is what matters.

    To not have more than one limit-checking interface, replace
    memory_cgroup_check_under_limit() and memory_cgroup_check_margin() with a
    single memory_cgroup_margin() that returns the chargeable space and leaves
    the comparison to the callsite.

    Soft limits are now checked the other way round, by using the already
    existing function that returns the amount by which soft limits are
    exceeded: res_counter_soft_limit_excess().

    Also remove all the corresponding functions on the res_counter side that
    are now no longer used.
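
    A sketch of the unified helper, assuming a res_counter_margin()
    primitive on the res_counter side:

        /* Returns the chargeable space; callers compare it against the
         * charge size instead of asking "under limit?". */
        static unsigned long long mem_cgroup_margin(struct mem_cgroup *mem)
        {
                unsigned long long margin;

                margin = res_counter_margin(&mem->res);
                if (do_swap_account)
                        margin = min(margin, res_counter_margin(&mem->memsw));
                return margin;
        }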

    Signed-off-by: Johannes Weiner
    Acked-by: KAMEZAWA Hiroyuki
    Cc: Daisuke Nishimura
    Acked-by: Balbir Singh
    Cc: Minchan Kim
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Johannes Weiner
     
  • Soft limit reclaim continues until the usage is below the current soft
    limit, but the documented semantics are actually that soft limit reclaim
    will push usage back until the soft limits are met again.

    Signed-off-by: Johannes Weiner
    Acked-by: KAMEZAWA Hiroyuki
    Cc: Daisuke Nishimura
    Acked-by: Balbir Singh
    Cc: Minchan Kim
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Johannes Weiner
     
  • Remove the initialization of a variable in callers of memory cgroup
    functions. It is actually the return value of the memcg function, but
    it was initialized in the caller.

    Some memory cgroup code uses the following style to carry the result of
    a start function to the matching end function, to avoid races:

    mem_cgroup_start_A(&(*ptr))
    /* Something very complicated can happen here. */
    mem_cgroup_end_A(*ptr)

    In some calls, *ptr had to be initialized to NULL by the caller, which
    is ugly. This patch makes the _start function initialize *ptr instead.
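
    A sketch using the placeholder names above:

        /* The _start function now initializes *ptr itself, so callers
         * no longer need the `*ptr = NULL;` boilerplate. */
        void mem_cgroup_start_A(struct mem_cgroup **ptr)
        {
                *ptr = NULL;
                /* ... take the reference that mem_cgroup_end_A() drops ... */
        }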

    Signed-off-by: KAMEZAWA Hiroyuki
    Acked-by: Johannes Weiner
    Acked-by: Daisuke Nishimura
    Cc: Balbir Singh
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    KAMEZAWA Hiroyuki
     
  • Provide an alternative to access_process_vm that allows the caller to obtain a
    reference to the supplied mm_struct.
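
    A sketch of the intended calling pattern, using the existing
    get_task_mm()/mmput() helpers:

        struct mm_struct *mm = get_task_mm(task);
        if (mm) {
                /* The caller holds its own mm reference, so the target
                 * task may exit while the access proceeds. */
                access_remote_vm(mm, addr, buf, len, 0);
                mmput(mm);
        }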

    Signed-off-by: Stephen Wilson
    Signed-off-by: Al Viro

    Stephen Wilson